I. Integrating HBase with MapReduce

1. List the jars HBase needs for MapReduce integration

[root@hadoop-senior hbase-0.98.6-hadoop2]# bin/hbase mapredcp
2019-05-22 16:23:46,814 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-common-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/protobuf-java-2.5.0.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-client-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-hadoop-compat-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-server-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-protocol-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/high-scale-lib-1.1.1.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/zookeeper-3.4.5.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/guava-12.0.1.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/htrace-core-2.04.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/netty-3.6.6.Final.jar

2. Start YARN and run the MapReduce programs bundled with HBase

## Start YARN
[root@hadoop-senior hadoop-2.5.0]# sbin/yarn-daemon.sh start nodemanager
[root@hadoop-senior hadoop-2.5.0]# sbin/mr-jobhistory-daemon.sh start historyserver

## The MapReduce programs that ship with HBase all live in hbase-server-0.98.6-hadoop2.jar; several of them are quite useful
[root@hadoop-senior hbase-0.98.6-hadoop2]# export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
[root@hadoop-senior hbase-0.98.6-hadoop2]# export HADOOP_HOME=/opt/modules/hadoop-2.5.0
[root@hadoop-senior hbase-0.98.6-hadoop2]# HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HBASE_HOME/lib/hbase-server-0.98.6-hadoop2.jar

An example program must be given as the first argument.
Valid program names are:
CellCounter: Count cells in HBase table
completebulkload: Complete a bulk data load.
copytable: Export a table from local cluster to peer cluster
export: Write table data to HDFS.
import: Import data written by Export.
importtsv: Import data in TSV format.
rowcounter: Count rows in HBase table
verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log.
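Any of these programs runs by passing its name as the first argument. For example, counting the rows of the user table that appears later in this post:

[root@hadoop-senior hbase-0.98.6-hadoop2]# HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HBASE_HOME/lib/hbase-server-0.98.6-hadoop2.jar rowcounter user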
TSV (tab-separated):
>>student.tsv
1001	zhangsan	26	shanghai

CSV (comma-separated):
>>student.csv
1001,zhangsan,26,shanghai
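The importtsv program consumes files in the TSV format above. A minimal sketch of an invocation, assuming a pre-created student table with an info family and that student.tsv has already been uploaded to HDFS (the table, column mapping, and path here are illustrative):

[root@hadoop-senior hbase-0.98.6-hadoop2]# HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HBASE_HOME/lib/hbase-server-0.98.6-hadoop2.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:address student /user/root/student.tsv

For a comma-separated file like student.csv, additionally pass -Dimporttsv.separator=,.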

II. Writing a MapReduce program that reads from and writes to HBase tables

1. Prepare the data

## Prepare two tables: user (contains data) and basic (empty)
hbase(main):004:0> create 'basic', 'info'
0 row(s) in 0.4290 seconds
=> Hbase::Table - basic

hbase(main):005:0> list
TABLE
basic
user
2 row(s) in 0.0290 seconds
=> ["basic", "user"]

hbase(main):003:0> scan 'user'
ROW COLUMN+CELL
10002 column=info:age, timestamp=1558343570256, value=30
10002 column=info:name, timestamp=1558343559457, value=wangwu
10002 column=info:qq, timestamp=1558343612746, value=231294737
10002 column=info:tel, timestamp=1558343607851, value=231294737
10003 column=info:age, timestamp=1558577830484, value=35
10003 column=info:name, timestamp=1558345826709, value=zhaoliu
10004 column=info:address, timestamp=1558505387829, value=shanghai
10004 column=info:age, timestamp=1558505387829, value=25
10004 column=info:name, timestamp=1558505387829, value=zhaoliu
3 row(s) in 0.0190 seconds

hbase(main):006:0> scan 'basic'
ROW COLUMN+CELL
0 row(s) in 0.0100 seconds
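For reference, rows like the ones in user can be created with ordinary shell puts; a sketch that reproduces row 10002 (values taken from the scan above):

hbase(main):007:0> put 'user', '10002', 'info:name', 'wangwu'
hbase(main):008:0> put 'user', '10002', 'info:age', '30'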

2. Write a MapReduce job that copies the data in the user table into the basic table

package com.beifeng.senior.hadoop.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class User2BasicMapReduce extends Configured implements Tool {

	// Mapper Class
	public static class ReadUserMapper extends TableMapper<Text, Put> {

		private Text mapOutputKey = new Text();

		@Override
		public void map(ImmutableBytesWritable key, Result value,
				Mapper<ImmutableBytesWritable, Result, Text, Put>.Context context)
				throws IOException, InterruptedException {
			// get rowkey
			String rowkey = Bytes.toString(key.get());

			// set map output key
			mapOutputKey.set(rowkey);

			Put put = new Put(key.get());

			// iterate over the cells of this row
			for (Cell cell : value.rawCells()) {
				// keep only family: info
				if ("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))) {
					// keep column: name
					if ("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {
						put.add(cell);
					}
					// keep column: age
					if ("age".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {
						put.add(cell);
					}
				}
			}

			// context write
			context.write(mapOutputKey, put);
		}
	}

	// Reducer Class
	public static class WriteBasicReducer extends TableReducer<Text, Put, ImmutableBytesWritable> {

		@Override
		public void reduce(Text key, Iterable<Put> values,
				Reducer<Text, Put, ImmutableBytesWritable, Mutation>.Context context)
				throws IOException, InterruptedException {
			for (Put put : values) {
				// TableOutputFormat ignores the output key, so null is acceptable here
				context.write(null, put);
			}
		}
	}

	// Driver
	public int run(String[] args) throws Exception {
		// create job
		Job job = Job.getInstance(this.getConf(), this.getClass().getSimpleName());

		// set run job class
		job.setJarByClass(this.getClass());

		// set job
		Scan scan = new Scan();
		scan.setCaching(500);       // 1 is the default in Scan, which will be bad for MapReduce jobs
		scan.setCacheBlocks(false); // don't set to true for MR jobs
		// set other scan attrs

		// set input and set mapper
		TableMapReduceUtil.initTableMapperJob(
				"user",               // input table
				scan,                 // Scan instance to control CF and attribute selection
				ReadUserMapper.class, // mapper class
				Text.class,           // mapper output key
				Put.class,            // mapper output value
				job);

		// set reducer and output
		TableMapReduceUtil.initTableReducerJob(
				"basic",                 // output table
				WriteBasicReducer.class, // reducer class
				job);

		job.setNumReduceTasks(1); // at least one, adjust as required

		// submit job
		boolean isSuccess = job.waitForCompletion(true);
		return isSuccess ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		// get configuration
		Configuration configuration = HBaseConfiguration.create();

		// submit job
		int status = ToolRunner.run(configuration, new User2BasicMapReduce(), args);

		// exit program
		System.exit(status);
	}
}
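A possible refinement, not part of the job above: since only info:name and info:age are ever copied, the Scan in run() could be narrowed so that only those cells are shipped to the mappers (Scan.addColumn is part of the 0.98 client API):

scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"));

With the scan restricted this way, the family and qualifier checks in ReadUserMapper become redundant, though they are harmless to keep.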

3. Run the job

## Build the jar and upload it to $HADOOP_HOME/jars/

## Run
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/hadoop-2.5.0
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HADOOP_HOME/jars/hbase-mr-user2basic.jar

## Check the result
hbase(main):004:0> scan 'basic'
ROW COLUMN+CELL
10002 column=info:age, timestamp=1558343570256, value=30
10002 column=info:name, timestamp=1558343559457, value=wangwu
10003 column=info:age, timestamp=1558577830484, value=35
10003 column=info:name, timestamp=1558345826709, value=zhaoliu
10004 column=info:age, timestamp=1558505387829, value=25
10004 column=info:name, timestamp=1558505387829, value=zhaoliu
3 row(s) in 0.0300 seconds
