一、HBase集成MapReduce

1、查看HBase集成MapReduce需要的jar包

[root@hadoop-senior hbase-0.98.6-hadoop2]# bin/hbase mapredcp
2019-05-22 16:23:46,814 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-common-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/protobuf-java-2.5.0.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-client-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-hadoop-compat-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-server-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/hbase-protocol-0.98.6-hadoop2.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/high-scale-lib-1.1.1.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/zookeeper-3.4.5.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/guava-12.0.1.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/htrace-core-2.04.jar:
/opt/modules/hbase-0.98.6-hadoop2/lib/netty-3.6.6.Final.jar

2、

##开启yarn
[root@hadoop-senior hadoop-2.5.0]# sbin/yarn-daemon.sh start nodemanager
[root@hadoop-senior hadoop-2.5.0]# sbin/mr-jobhistory-daemon.sh start histryserver
[root@hadoop-senior hadoop-2.5.0]# sbin/mr-jobhistory-daemon.sh start historyserver ##HBase默认带的MapReduce程序都在hbase-server-0.98.6-hadoop2.jar里面,比较有用 [root@hadoop-senior hbase-0.98.6-hadoop2]# export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
[root@hadoop-senior hbase-0.98.6-hadoop2]# export HADOOP_HOME=/opt/modules/hadoop-2.5.0
[root@hadoop-senior hbase-0.98.6-hadoop2]# HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HBASE_HOME/lib/hbase-server-0.98.6-hadoop2.jar An example program must be given as the first argument.
Valid program names are:
CellCounter: Count cells in HBase table
completebulkload: Complete a bulk data load.
copytable: Export a table from local cluster to peer cluster
export: Write table data to HDFS.
import: Import data written by Export.
importtsv: Import data in TSV format.
rowcounter: Count rows in HBase table
verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log. #####
TSV
tab分割
>>student.tsv
1001 zhangsan 26 shanghai CSV
逗号分割
>>student.csv
1001,zhangsan,26,shanghai

二、编写MapReduce程序,集成HBase对表进行读取和写入数据

1、准备数据

##准备两张表,user:里面有数据,basic:没有数据
hbase(main):004:0> create 'basic', 'info'
0 row(s) in 0.4290 seconds
=> Hbase::Table – basic hbase(main):005:0> list
TABLE
basic
user
2 row(s) in 0.0290 seconds
=> ["basic", "user"] hbase(main):003:0> scan 'user'
ROW COLUMN+CELL
10002 column=info:age, timestamp=1558343570256, value=30
10002 column=info:name, timestamp=1558343559457, value=wangwu
10002 column=info:qq, timestamp=1558343612746, value=231294737
10002 column=info:tel, timestamp=1558343607851, value=231294737
10003 column=info:age, timestamp=1558577830484, value=35
10003 column=info:name, timestamp=1558345826709, value=zhaoliu
10004 column=info:address, timestamp=1558505387829, value=shanghai
10004 column=info:age, timestamp=1558505387829, value=25
10004 column=info:name, timestamp=1558505387829, value=zhaoliu
3 row(s) in 0.0190 seconds hbase(main):006:0> scan 'basic'
ROW COLUMN+CELL
0 row(s) in 0.0100 seconds

2、编写MapReduce,将user表中的数据导入到basic表中

package com.beifeng.senior.hadoop.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner; public class User2BasicMapReduce extends Configured implements Tool { // Mapper Class
public static class ReadUserMapper extends TableMapper<Text, Put> { private Text mapOutputKey = new Text(); @Override
public void map(ImmutableBytesWritable key, Result value,
Mapper<ImmutableBytesWritable, Result, Text, Put>.Context context)
throws IOException, InterruptedException {
// get rowkey
String rowkey = Bytes.toString(key.get()); // set
mapOutputKey.set(rowkey); // --------------------------------------------------------
Put put = new Put(key.get()); // iterator
for (Cell cell : value.rawCells()) {
// add family : info
if ("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))) {
// add column: name
if ("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {
put.add(cell);
}
// add column : age
if ("age".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {
put.add(cell);
}
}
} // context write
context.write(mapOutputKey, put);
} } // Reducer Class
public static class WriteBasicReducer extends TableReducer<Text, Put, //
ImmutableBytesWritable> { @Override
public void reduce(Text key, Iterable<Put> values,
Reducer<Text, Put, ImmutableBytesWritable, Mutation>.Context context)
throws IOException, InterruptedException {
for(Put put: values){
context.write(null, put);
}
} } // Driver
public int run(String[] args) throws Exception { // create job
Job job = Job.getInstance(this.getConf(), this.getClass().getSimpleName()); // set run job class
job.setJarByClass(this.getClass()); // set job
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs // set input and set mapper
TableMapReduceUtil.initTableMapperJob(
"user", // input table
scan, // Scan instance to control CF and attribute selection
ReadUserMapper.class, // mapper class
Text.class, // mapper output key
Put.class, // mapper output value
job //
); // set reducer and output
TableMapReduceUtil.initTableReducerJob(
"basic", // output table
WriteBasicReducer.class, // reducer class
job//
); job.setNumReduceTasks(1); // at least one, adjust as required // submit job
boolean isSuccess = job.waitForCompletion(true) ; return isSuccess ? 0 : 1;
} public static void main(String[] args) throws Exception {
// get configuration
Configuration configuration = HBaseConfiguration.create(); // submit job
int status = ToolRunner.run(configuration,new User2BasicMapReduce(),args) ; // exit program
System.exit(status);
} }

3、执行

##打jar包,并上传到$HADOOP_HOME/jars/

##执行
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/hadoop-2.5.0
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp` $HADOOP_HOME/bin/yarn jar $HADOOP_HOME/jars/hbase-mr-user2basic.jar ##查看执行结果
hbase(main):004:0> scan 'basic'
ROW COLUMN+CELL
10002 column=info:age, timestamp=1558343570256, value=30
10002 column=info:name, timestamp=1558343559457, value=wangwu
10003 column=info:age, timestamp=1558577830484, value=35
10003 column=info:name, timestamp=1558345826709, value=zhaoliu
10004 column=info:age, timestamp=1558505387829, value=25
10004 column=info:name, timestamp=1558505387829, value=zhaoliu
3 row(s) in 0.0300 seconds

2.8-2.10 HBase集成MapReduce的更多相关文章

  1. HBase概念学习(七)HBase与Mapreduce集成

    这篇文章是看了HBase权威指南之后,依据上面的解说搬下来的样例,可是略微有些不一样. HBase与mapreduce的集成无非就是mapreduce作业以HBase表作为输入,或者作为输出,也或者作 ...

  2. HBase 与 MapReduce 集成

    6. HBase 与 MapReduce 集成 6.1 官方 HBase 与 MapReduce 集成 查看 HBase 的 MapReduce 任务的执行:bin/hbase mapredcp; 环 ...

  3. hbase运行mapreduce设置及基本数据加载方法

    hbase与mapreduce集成后,运行mapreduce程序,同时需要mapreduce jar和hbase jar文件的支持,这时我们需要通过特殊设置使任务可以同时读取到hadoop jar和h ...

  4. hive与hbase集成

    http://blog.csdn.net/vah101/article/details/22597341 这篇文章最初是基于介绍HIVE-705.这个功能允许Hive QL命令访问HBase表,进行读 ...

  5. Hbase框架原理及相关的知识点理解、Hbase访问MapReduce、Hbase访问Java API、Hbase shell及Hbase性能优化总结

    转自:http://blog.csdn.net/zhongwen7710/article/details/39577431 本blog的内容包含: 第一部分:Hbase框架原理理解 第二部分:Hbas ...

  6. 《HBase in Action》 第三章节的学习总结 ---- 如何编写和运行基于HBase的MapReduce程序

    HBase之所以与Hadoop是最好的伙伴,我理解就因为两点:1.HADOOP的HDFS,为HBase提供了分布式的存储方式:2.HADOOP的MR为HBase提供的分布式的计算方法.u 其中第一点, ...

  7. 3.12-3.16 Hbase集成hive、sqoop、hue

    一.Hbase集成hive https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration 1.说明 Hive与HBase整合在一起 ...

  8. 新闻实时分析系统Hive与HBase集成进行数据分析 Cloudera HUE大数据可视化分析

    1.Hue 概述及版本下载 1)概述 Hue是一个开源的Apache Hadoop UI系统,最早是由Cloudera Desktop演化而来,由Cloudera贡献给开源社区,它是基于Python ...

  9. 新闻实时分析系统Hive与HBase集成进行数据分析

    (一)Hive 概述 (二)Hive在Hadoop生态圈中的位置 (三)Hive 架构设计 (四)Hive 的优点及应用场景 (五)Hive 的下载和安装部署 1.Hive 下载 Apache版本的H ...

随机推荐

  1. start-dfs.sh 和 start-all.sh的区别

    start-dfs.sh 只启动namenode 和datanode, start-all.sh还包括yarn的resourcemanager 和nodemanager 之前就所以因为只启动了star ...

  2. adb的经常使用命令(android debud bridge)

    android调试桥: adb命令使用须要在系统环境遍历中path中追加adb.exe的完整路径D:\IDE\adt-bundle-windows-x86-20130729\sdk\platform- ...

  3. xpath 轴,节点之间的关系

    http://www.w3school.com.cn/xpath/xpath_axes.asp http://www.freeformatter.com/xpath-tester.html 测试 轴可 ...

  4. Struts2 (三) (转载)

    前面一直在说Action可以是一个普通的Java类,与Servlet API完全分离,但是为了实现业务逻辑,Action需要使用HttpServletRequest内容.Struts 2设计的精巧之处 ...

  5. 用JS写九九乘法表

    本来JS部分觉得就不是很好,结果经过一个寒假,在家的日子过的太舒适,基本把学的都快忘干净了,今天老师一说九九乘法表,除了脑子里浮现出要满足的条件,其他的都不记得了,赶快整理了一下: <scrip ...

  6. live555中fDurationInMicroseconds的计算

    live555中fDurationInMicroseconds表示单个视频或者音频帧所占用的时间间隔,也表示在fDurationInMicroseconds微秒时间后再次向Source进行getNex ...

  7. c# vs2010 连接access数据库

    第一次在博客园写博文,由于文采不怎么好,即使是自己很熟悉的东西,写起来也会感觉到不知从何讲起,我想写的多了就好了. 这篇文章主要是介绍怎么用c# 语言 vs2010连接access数据库的,连接字符串 ...

  8. 20181125 test

    1.对一些不确定性的东西,还是需要一些勇气进行尝试 2.多阅读blog上的大咖的文章肯定会有提高的

  9. 支付宝cookie 是支付密码 不是登录密码

    开发文档/ 手机网站支付 / 产品介绍 开放平台文档中心 https://docs.open.alipay.com/203/105288

  10. Asynchronous programming with async and await (C#)

    Asynchronous Programming with async and await (C#) | Microsoft Docs https://docs.microsoft.com/en-us ...