Bulk-importing data into HBase

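Test data (file name: datas). The mapper splits each line on a tab character, so the fields in the actual file must be tab-separated, or the delimiter in the code must be changed to match.

1001    lilei   17  13800001111
1002    lily    16  13800001112
1003    lucy    16  13800001113
1004    meimei  16  13800001114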

The bulk import uses MapReduce: a job first generates HFiles, and the completebulkload tool then loads them into the table.

1. First create the table in HBase:

hbase> create 'student', {NAME => 'info'}

The dependencies in the Maven pom.xml are as follows:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
</dependency>
<!-- hbase -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.0.0</version>
</dependency>

The MapReduce code is as follows:

package cn.bd.batch.mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Generates HFiles from tab-separated text with a MapReduce job,
 * ready to be bulk-loaded into an HBase table.
 *
 * @version Created: 2016-03-02 16:15:57
 */

public class CreateHfileByMapReduce {

    /** Turns each tab-separated input line into three KeyValues in the info column family. */
    public static class MyBulkMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

        private static final byte[] FAMILY = Bytes.toBytes("info");
        private static final byte[] NAME = Bytes.toBytes("name");
        private static final byte[] AGE = Bytes.toBytes("age");
        private static final byte[] PHONE = Bytes.toBytes("phone");

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Fields: id, name, age, phone. Adjust the delimiter to match the input.
            String[] split = value.toString().split("\t");
            // Lines that do not have exactly four fields are skipped.
            if (split.length == 4) {
                // Bytes.toBytes always encodes as UTF-8, unlike String.getBytes(),
                // which depends on the platform default charset.
                byte[] rowkey = Bytes.toBytes(split[0]);
                ImmutableBytesWritable imrowkey = new ImmutableBytesWritable(rowkey);
                context.write(imrowkey, new KeyValue(rowkey, FAMILY, NAME, Bytes.toBytes(split[1])));
                context.write(imrowkey, new KeyValue(rowkey, FAMILY, AGE, Bytes.toBytes(split[2])));
                context.write(imrowkey, new KeyValue(rowkey, FAMILY, PHONE, Bytes.toBytes(split[3])));
            }
        }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        // Exactly three arguments are expected: table name, input path, HFile output path.
        if (args.length != 3) {
            System.err.println("Usage: CreateHfileByMapReduce <table_name> <data_input_path> <hfile_output_path>");
            System.exit(2);
        }
        String tableName = args[0];
        String inputPath = args[1];
        String outputPath = args[2];
        /* Example values:
        String tableName = "student";
        String inputPath  = "hdfs://node2:9000/datas";
        String outputPath = "hdfs://node2:9000/user/output"; */

        Configuration conf = HBaseConfiguration.create();
        boolean success = false;
        // try-with-resources closes the table handle once the job has finished.
        try (HTable hTable = new HTable(conf, tableName)) {
            Job job = Job.getInstance(conf, "CreateHfileByMapReduce");
            job.setJarByClass(CreateHfileByMapReduce.class);
            job.setMapperClass(MyBulkMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);
            job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.TextInputFormat.class);
            // Configures HFile output for the target table: output format, sorting
            // reducer, total-order partitioner, and one reducer per region.
            HFileOutputFormat.configureIncrementalLoad(job, hTable);
            FileInputFormat.addInputPath(job, new Path(inputPath));
            FileOutputFormat.setOutputPath(job, new Path(outputPath));
            success = job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Exit non-zero both when the job fails and when setup throws.
        System.exit(success ? 0 : 1);
    }
}
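To make the mapper's record handling concrete, the sketch below reproduces its split-and-validate step in plain Java, with no Hadoop or HBase on the classpath. The class StudentLineSketch and its parse method are hypothetical names introduced only for illustration, and the "info:name" style keys are just a display convention for family and qualifier, not an HBase API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; it is not part of the job above.
final class StudentLineSketch {

    // Splits one input line on tabs and maps the four fields to the row key
    // and the three columns that the mapper writes. Returns null for a line
    // that does not have exactly four fields, mirroring the mapper's skip.
    static Map<String, String> parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 4) {
            return null;
        }
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("rowkey", fields[0]);
        cells.put("info:name", fields[1]);
        cells.put("info:age", fields[2]);
        cells.put("info:phone", fields[3]);
        return cells;
    }

    public static void main(String[] args) {
        // A well-formed, tab-separated record.
        System.out.println(parse("1001\tlilei\t17\t13800001111"));
        // The same record separated by spaces is rejected.
        System.out.println(parse("1001    lilei   17  13800001111"));
    }
}
```

A line whose fields are separated by spaces instead of tabs yields null here, which corresponds to the mapper silently dropping that record. If the job finishes but the table stays empty, the delimiter is worth checking first.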

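Note: use Maven's assembly plugin to build a fat jar, that is, bundle the ZooKeeper and HBase jars that the job depends on into the MapReduce jar itself. Otherwise you have to configure the cluster statically by adding the ZooKeeper and HBase configuration files and jars to Hadoop's classpath.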

The final jar is bulk.jar and the main class is cn.bd.batch.mr.CreateHfileByMapReduce. First generate the HFiles, then load them incrementally into the live HBase table. The general form of the two commands is:

sudo -u hdfs hadoop jar <jar_name>.jar <MainClass> <table_name> <data_input_path> <hfile_output_path>
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hfile_output_path> <table_name>

For this example:

hadoop jar bulk.jar cn.bd.batch.mr.CreateHfileByMapReduce student /datas /user/output
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/output student
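HBase orders row keys as unsigned bytes, and HFiles store their cells in that order, which is why the job sorts its output before the files are written and loaded. The sketch below, assuming UTF-8 row keys and using only the JDK, shows what that ordering means for keys like the student IDs in the test data. RowKeyOrderSketch and sortKeys are hypothetical names, not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration class; it imitates byte-wise row key ordering.
final class RowKeyOrderSketch {

    // Compares two keys byte by byte, treating each byte as unsigned.
    // A key that is a prefix of another sorts first.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Encodes the keys as UTF-8, sorts them byte-wise, and decodes them again.
    static String[] sortKeys(String... keys) {
        byte[][] raw = new byte[keys.length][];
        for (int i = 0; i < keys.length; i++) {
            raw[i] = keys[i].getBytes(StandardCharsets.UTF_8);
        }
        Arrays.sort(raw, BYTE_ORDER);
        String[] sorted = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            sorted[i] = new String(raw[i], StandardCharsets.UTF_8);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // prints [1001, 1002, 1003, 999]
        System.out.println(Arrays.toString(sortKeys("1003", "1001", "999", "1002")));
    }
}
```

Because the comparison is byte-wise rather than numeric, "999" sorts after "1003". Fixed-width IDs such as those in the test data avoid that surprise.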

Reference: http://www.cnblogs.com/mumuxinfei/p/3823367.html
