2.8-2.10 Integrating HBase with MapReduce

I. Integrating HBase with MapReduce

1. List the jar files HBase needs for MapReduce integration

[root@hadoop-senior hbase-0.98.6-hadoop2]# bin/hbase mapredcp

2019-05-22 16:23:46,814 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-common-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/protobuf-java-2.5.0.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-client-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-hadoop-compat-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-server-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-protocol-0.98.6-hadoop2.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/high-scale-lib-1.1.1.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/zookeeper-3.4.5.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/guava-12.0.1.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/htrace-core-2.04.jar:

/opt/modules/hbase-0.98.6-hadoop2/lib/netty-3.6.6.Final.jar
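The command prints a colon-separated classpath, which is later assigned to HADOOP_CLASSPATH as a single string. As a minimal sketch of how that string breaks down into individual jars, the following plain-Java helper splits it and keeps only the file names. The class and method names are hypothetical and not part of HBase; the sample paths are copied from the output above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a classpath string such as
// the one printed by `bin/hbase mapredcp` into individual jar file names.
class MapredcpClasspath {

    static List<String> jarNames(String classpath) {
        List<String> names = new ArrayList<String>();
        for (String entry : classpath.split(":")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray separators or line breaks
            }
            // keep only the file name after the last '/'
            names.add(trimmed.substring(trimmed.lastIndexOf('/') + 1));
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-common-0.98.6-hadoop2.jar:"
                + "/opt/modules/hbase-0.98.6-hadoop2/lib/zookeeper-3.4.5.jar";
        for (String name : jarNames(sample)) {
            System.out.println(name);
        }
    }
}
```

Listing the names this way is only a convenience for checking that the expected jars (hbase-server, hbase-client, zookeeper and so on) are present before a job is submitted.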

2. Run the MapReduce programs bundled with HBase

## Start YARN

[root@hadoop-senior hadoop-2.5.0]# sbin/yarn-daemon.sh start nodemanager

[root@hadoop-senior hadoop-2.5.0]# sbin/mr-jobhistory-daemon.sh start historyserver

## The MapReduce programs that ship with HBase are all in hbase-server-0.98.6-hadoop2.jar, and they are quite useful

[root@hadoop-senior hbase-0.98.6-hadoop2]# export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2

[root@hadoop-senior hbase-0.98.6-hadoop2]# export HADOOP_HOME=/opt/modules/hadoop-2.5.0

[root@hadoop-senior hbase-0.98.6-hadoop2]# HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HBASE_HOME/lib/hbase-server-0.98.6-hadoop2.jar

An example program must be given as the first argument.

Valid program names are:

  CellCounter: Count cells in HBase table

  completebulkload: Complete a bulk data load.

  copytable: Export a table from local cluster to peer cluster

  export: Write table data to HDFS.

  import: Import data written by Export.

  importtsv: Import data in TSV format.

  rowcounter: Count rows in HBase table

  verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log.

#####

TSV

    fields separated by tabs

    >>student.tsv

    1001 zhangsan 26 shanghai

CSV

    fields separated by commas

    >>student.csv

    1001,zhangsan,26,shanghai
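The difference between the two formats is only the delimiter. The sketch below, in plain Java with no HBase classes, splits one student line on either delimiter and pairs each field with a column name. HBASE_ROW_KEY is the token importtsv uses to mark the row key column; the info:name, info:age and info:address mapping and the class name are assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: the class name and the column layout are assumptions.
class DelimitedLineDemo {

    // Assumed layout for the student sample; the first field is the row key.
    static final String[] COLUMNS = {"HBASE_ROW_KEY", "info:name", "info:age", "info:address"};

    // Splits one line on the given delimiter and pairs each field with a column name.
    static Map<String, String> parse(String line, String delimiter) {
        String[] fields = line.split(Pattern.quote(delimiter));
        if (fields.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                    "expected " + COLUMNS.length + " fields but found " + fields.length);
        }
        Map<String, String> row = new LinkedHashMap<String, String>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], fields[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(parse("1001\tzhangsan\t26\tshanghai", "\t")); // TSV line
        System.out.println(parse("1001,zhangsan,26,shanghai", ","));     // CSV line
    }
}
```

Note that the delimiter must be the ASCII comma or tab character. A line typed with full-width commas does not split at all and is rejected by the field-count check.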

II. Write a MapReduce program that reads from and writes to HBase tables

1. Prepare the data

## Prepare two tables: user contains data, basic is empty

hbase(main):004:0> create 'basic', 'info'

0 row(s) in 0.4290 seconds

=> Hbase::Table - basic

hbase(main):005:0> list

TABLE

basic

user

2 row(s) in 0.0290 seconds

=> ["basic", "user"]

hbase(main):003:0> scan 'user'

ROW                                          COLUMN+CELL

 10002                                       column=info:age, timestamp=1558343570256, value=30

 10002                                       column=info:name, timestamp=1558343559457, value=wangwu

 10002                                       column=info:qq, timestamp=1558343612746, value=231294737

 10002                                       column=info:tel, timestamp=1558343607851, value=231294737

 10003                                       column=info:age, timestamp=1558577830484, value=35

 10003                                       column=info:name, timestamp=1558345826709, value=zhaoliu

 10004                                       column=info:address, timestamp=1558505387829, value=shanghai

 10004                                       column=info:age, timestamp=1558505387829, value=25

 10004                                       column=info:name, timestamp=1558505387829, value=zhaoliu

3 row(s) in 0.0190 seconds

hbase(main):006:0> scan 'basic'

ROW                                          COLUMN+CELL

0 row(s) in 0.0100 seconds

2. Write a MapReduce job that imports the data in the user table into the basic table

package com.beifeng.senior.hadoop.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.conf.Configured;

import org.apache.hadoop.hbase.Cell;

import org.apache.hadoop.hbase.CellUtil;

import org.apache.hadoop.hbase.HBaseConfiguration;

import org.apache.hadoop.hbase.client.Mutation;

import org.apache.hadoop.hbase.client.Put;

import org.apache.hadoop.hbase.client.Result;

import org.apache.hadoop.hbase.client.Scan;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;

import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

import org.apache.hadoop.hbase.mapreduce.TableMapper;

import org.apache.hadoop.hbase.mapreduce.TableReducer;

import org.apache.hadoop.hbase.util.Bytes;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.util.Tool;

import org.apache.hadoop.util.ToolRunner;

public class User2BasicMapReduce extends Configured implements Tool {

    // Mapper Class

    public static class ReadUserMapper extends TableMapper<Text, Put> {

        private Text mapOutputKey = new Text();

        @Override

        public void map(ImmutableBytesWritable key, Result value,

                Mapper<ImmutableBytesWritable, Result, Text, Put>.Context context)

                        throws IOException, InterruptedException {

            // get rowkey

            String rowkey = Bytes.toString(key.get());

            // set

            mapOutputKey.set(rowkey);

            // --------------------------------------------------------

            Put put = new Put(key.get());

            // iterator

            for (Cell cell : value.rawCells()) {

                // add family : info

                if ("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))) {

                    // add column: name

                    if ("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {

                        put.add(cell);

                    }

                    // add column : age

                    if ("age".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {

                        put.add(cell);

                    }

                }

            }

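            // context write; skip rows that have neither name nor age,

            // because an empty Put is rejected when it is written to the table

            if (!put.isEmpty()) {

                context.write(mapOutputKey, put);

            }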

        }

    }

    // Reducer Class

    public static class WriteBasicReducer extends TableReducer<Text, Put, //

    ImmutableBytesWritable> {

        @Override

        public void reduce(Text key, Iterable<Put> values,

                Reducer<Text, Put, ImmutableBytesWritable, Mutation>.Context context)

                        throws IOException, InterruptedException {

            for(Put put: values){

                context.write(null, put);

            }

        }

    }

    // Driver

    public int run(String[] args) throws Exception {

        // create job

        Job job = Job.getInstance(this.getConf(), this.getClass().getSimpleName());

        // set run job class

        job.setJarByClass(this.getClass());

        // set job

        Scan scan = new Scan();

        scan.setCaching(500);        // rows fetched per RPC; a small caching value is bad for MapReduce jobs

        scan.setCacheBlocks(false);  // don't set to true for MR jobs

        // set other scan attrs

        // set input and set mapper

        TableMapReduceUtil.initTableMapperJob(

          "user",        // input table

          scan,               // Scan instance to control CF and attribute selection

          ReadUserMapper.class,     // mapper class

          Text.class,         // mapper output key

          Put.class,  // mapper output value

          job //

         );

        // set reducer and output

        TableMapReduceUtil.initTableReducerJob(

          "basic",        // output table

          WriteBasicReducer.class,    // reducer class

          job//

         );

        job.setNumReduceTasks(1);   // at least one, adjust as required

        // submit job

        boolean isSuccess = job.waitForCompletion(true) ;

        return isSuccess ? 0 : 1;

    }

    public static void main(String[] args) throws Exception {

        // get configuration

        Configuration configuration = HBaseConfiguration.create();

        // submit job

        int status = ToolRunner.run(configuration,new User2BasicMapReduce(),args) ;

        // exit program

        System.exit(status);

    }

}
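To make the column filter in ReadUserMapper easier to follow, here is a plain-Java simulation of the same selection logic, run against row 10002 from the user table. It uses no HBase classes; the class name, the method name, and the representation of a row as a map from "family:qualifier" to value are assumptions made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java simulation of the column filter in ReadUserMapper; no HBase classes are used.
class ColumnFilterDemo {

    // Keeps only info:name and info:age from a row given as "family:qualifier" -> value.
    static Map<String, String> keepNameAndAge(Map<String, String> row) {
        Map<String, String> kept = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            String column = cell.getKey();
            if ("info:name".equals(column) || "info:age".equals(column)) {
                kept.put(column, cell.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Row 10002 as shown by scan 'user'
        Map<String, String> row10002 = new LinkedHashMap<String, String>();
        row10002.put("info:age", "30");
        row10002.put("info:name", "wangwu");
        row10002.put("info:qq", "231294737");
        row10002.put("info:tel", "231294737");
        System.out.println(keepNameAndAge(row10002));
    }
}
```

The qq and tel cells are dropped and only age and name survive, which is the shape the basic table is expected to have after the job runs.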

3. Run the job

## Build the jar and upload it to $HADOOP_HOME/jars/

## Run

export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2

export HADOOP_HOME=/opt/modules/hadoop-2.5.0

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HADOOP_HOME/jars/hbase-mr-user2basic.jar

## Check the result

hbase(main):004:0> scan 'basic'

ROW                                          COLUMN+CELL

 10002                                       column=info:age, timestamp=1558343570256, value=30

 10002                                       column=info:name, timestamp=1558343559457, value=wangwu

 10003                                       column=info:age, timestamp=1558577830484, value=35

 10003                                       column=info:name, timestamp=1558345826709, value=zhaoliu

 10004                                       column=info:age, timestamp=1558505387829, value=25

 10004                                       column=info:name, timestamp=1558505387829, value=zhaoliu

3 row(s) in 0.0300 seconds
