HBASE--MapReduce

1. Print the classpath that HBase MapReduce jobs need

$ bin/hbase mapredcp

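2. Export the environment variables

$ export HBASE_HOME=~/hadoop_home/hbase-1.2.6
$ export HADOOP_HOME=~/hadoop_home
$ export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`
The HADOOP_CLASSPATH line does not have to be added to ~/.profile.

If it is not, it has to be run in the same shell session that later launches the jar. Note that there must be no space after the = sign in an export.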

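3. Run the MapReduce jobs bundled with HBase
Case 1: count the rows in the student table
cd /home/hadoop/hadoop_home/hbase-1.2.6

(The HBase directory is only referenced through the environment variable and is not on the PATH, so change into it before running the command.)

$ yarn jar lib/hbase-server-1.2.6.jar rowcounter student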

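Case 2: import local data into HBase with MapReduce
(1) Create a local file in tsv format: fruit.tsv

1001	Apple	Red
1002	Pear	Yellow
1003	Pineapple	Yellow

Note: do not copy the data above straight from this page. The fields must be separated by single tab characters, and copying from a web page usually turns the tabs into spaces.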
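Since pasted text tends to lose its tab characters, one option is to generate the file with a few lines of code instead. The following is a minimal sketch using only the JDK; the class name FruitTsvWriter and its two helper methods are hypothetical and are not part of HBase. It writes fruit.tsv into the current working directory and then re-reads it to report any line that does not have exactly three tab-separated fields.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: writes fruit.tsv with real tab separators.
class FruitTsvWriter {

    // Joins the three fields of one record with single tab characters.
    public static String toTsvLine(String rowKey, String name, String color) {
        return String.join("\t", rowKey, name, color);
    }

    // Returns the 1-based numbers of lines that do not have exactly three tab-separated fields.
    public static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).split("\t", -1).length != 3) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList(
                toTsvLine("1001", "Apple", "Red"),
                toTsvLine("1002", "Pear", "Yellow"),
                toTsvLine("1003", "Pineapple", "Yellow"));
        Path out = Paths.get("fruit.tsv"); // written to the current working directory
        Files.write(out, lines, StandardCharsets.UTF_8);
        // Re-read the file and report any line that does not split into three columns.
        List<Integer> bad = badLines(Files.readAllLines(out, StandardCharsets.UTF_8));
        System.out.println("wrote " + out.toAbsolutePath() + ", malformed lines: " + bad);
    }
}
```

The badLines check can also be pointed at a hand-edited file before uploading it to HDFS; a line whose tabs were replaced by spaces shows up in the returned list.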
(2) Create the HBase table

hbase(main):001:0> create 'fruit','info'
(3) Create the input_fruit directory in HDFS and upload fruit.tsv

$ hdfs dfs -mkdir /input_fruit/
$ hdfs dfs -put fruit.tsv /input_fruit/
(4) Run the MapReduce import into the fruit table
cd /home/hadoop/hadoop_home/hbase-1.2.6

$ yarn jar lib/hbase-server-1.2.6.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://master:9000/input_fruit
(5) Check the imported data with the scan command

hbase(main):001:0> scan 'fruit'
2.5.2 Custom HBase MapReduce job 1
Goal: copy part of the data in the fruit table into the fruit_mr table with MapReduce.
Steps:
1) Write the ReadFruitMapper class, which reads the data from the fruit table
package com.yjsj.hbase_mr;

import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadFruitMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // Copy the name and color columns of each fruit row into a Put with the same row key.
        Put put = new Put(key.get());
        // Walk over every cell of the row
        for (Cell cell : value.rawCells()) {
            // Only cells in the info column family are copied
            if ("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))) {
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
                // Keep the name and color columns, drop everything else
                if ("name".equals(qualifier) || "color".equals(qualifier)) {
                    put.add(cell);
                }
            }
        }

        // Emit the row as map output; a Put without any cells cannot be written, so skip it
        if (!put.isEmpty()) {
            context.write(key, put);
        }
    }
}

2) Write the WriteFruitMRReducer class, which writes the rows read from the fruit table into the fruit_mr table

package com.yjsj.hbase_mr;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class WriteFruitMRReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        // Write every row that was read into the fruit_mr table
        for (Put put : values) {
            context.write(NullWritable.get(), put);
        }
    }
}

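3) Write Fruit2FruitMRRunner extends Configured implements Tool, which assembles and runs the job

// Assumed to live in the same package as ReadFruitMapper and WriteFruitMRReducer
package com.yjsj.hbase_mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Fruit2FruitMRRunner extends Configured implements Tool {

    // Assemble the job
    public int run(String[] args) throws Exception {

        // Get the Configuration
        Configuration conf = this.getConf();
        // Create the job
        Job job = Job.getInstance(conf, this.getClass().getSimpleName());
        job.setJarByClass(Fruit2FruitMRRunner.class);
        // Configure the scan used to read the source table
        Scan scan = new Scan();
        scan.setCacheBlocks(false);
        scan.setCaching(500);
        // Set the Mapper. Import TableMapReduceUtil from the mapreduce package,
        // not from the mapred package, which is the old API.
        TableMapReduceUtil.initTableMapperJob(
            "fruit",                      // name of the source table
            scan,                         // scan that controls what is read
            ReadFruitMapper.class,        // Mapper class
            ImmutableBytesWritable.class, // Mapper output key type
            Put.class,                    // Mapper output value type
            job                           // job to configure
        );
        // Set the Reducer
        TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRReducer.class, job);
        // Number of reduce tasks, at least 1
        job.setNumReduceTasks(1);
        boolean isSuccess = job.waitForCompletion(true);

        if (!isSuccess) {
            throw new IOException("Job running with error");
        }
        return isSuccess ? 0 : 1;
    }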

4) Run the job from the main method (main is a member of the runner class from step 3)

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Point the client at the cluster's ZooKeeper quorum and HBase master
        conf.set("hbase.zookeeper.quorum", "master,node1,node2");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.master", "master:60000");

        int status = ToolRunner.run(conf, new Fruit2FruitMRRunner(), args);
        System.exit(status);
    }
} // closes the runner class

Note: the conf.set calls in main are what add the ZooKeeper connection settings. Replace master, node1 and node2 with the host names of your own cluster.

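5) Package and run the job (build the jar as for any Java project; the archive also has to be processed with zip)

$ ~/hadoop_home/bin/yarn jar ~/hadoop_home/TestHbase.jar \
com.yjsj.hbase.mr1.Fruit2FruitMRRunner

Note: the class name passed to yarn jar must be the fully qualified name of the runner, that is, the package declared in its source file plus the class name.
Note: the target table must exist before the job runs. If it does not, create it first.
Note: Maven packaging commands: -P local clean package, or -P dev clean package install (bundling third-party jars into the artifact requires the maven-shade-plugin).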

2.5.3 Custom HBase MapReduce job 2
Goal: write data stored in HDFS into an HBase table.
Steps:
1) Write ReadFruitFromHDFSMapper, which reads the file data from HDFS

package com.yjsj.hbase_mr2;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReadFruitFromHDFSMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One line read from the file in HDFS
        String lineValue = value.toString();
        // Split the line on tab characters into a String array
        String[] values = lineValue.split("\t");
        // Skip blank or malformed lines instead of failing with ArrayIndexOutOfBoundsException
        if (values.length < 3) {
            return;
        }
        // Pick the fields by position
        String rowKey = values[0];
        String name = values[1];
        String color = values[2];
        // Build the row key
        ImmutableBytesWritable rowKeyWritable = new ImmutableBytesWritable(Bytes.toBytes(rowKey));
        // Build the Put
        Put put = new Put(Bytes.toBytes(rowKey));
        // Arguments: column family, column qualifier, value.
        // addColumn replaces the deprecated Put.add(family, qualifier, value).
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(name));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("color"), Bytes.toBytes(color));
        context.write(rowKeyWritable, put);
    }
}
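The mapper splits each line on tabs and reads the fields by position, so a single line with missing tabs is enough to break the job. The sketch below pulls that parsing step into a standalone helper that can be tried out without a cluster. It uses only the JDK; the class name FruitLineParser and its parse method are hypothetical and do not appear in the original project.

```java
import java.util.Optional;

// Hypothetical helper: the line-parsing step of the mapper, isolated for testing.
class FruitLineParser {

    // Returns {rowKey, name, color} for a well-formed line, or empty for a malformed one.
    public static Optional<String[]> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        // The -1 limit keeps trailing empty fields, so "1001\tApple\t" still yields three parts.
        String[] fields = line.split("\t", -1);
        if (fields.length != 3) {
            return Optional.empty();
        }
        for (String field : fields) {
            if (field.trim().isEmpty()) {
                return Optional.empty(); // an empty row key or value is treated as malformed
            }
        }
        return Optional.of(fields);
    }

    public static void main(String[] args) {
        String[] samples = {"1001\tApple\tRed", "1003 PineappleYellow", "1002\t\tYellow"};
        for (String sample : samples) {
            System.out.println(parse(sample).isPresent() ? "ok:  " + sample : "bad: " + sample);
        }
    }
}
```

Treating empty fields as malformed is a design choice made for this sketch; a job that should store empty values would drop that check.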

2) Write the WriteFruitMRFromTxtReducer class

package com.yjsj.hbase_mr2;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class WriteFruitMRFromTxtReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        // Write every row that was read into the fruit_hdfs table
        for (Put put : values) {
            context.write(NullWritable.get(), put);
        }
    }
}

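3) Write Txt2FruitRunner, which assembles the job

// Assumed to live in the same package as the mapper and reducer above
package com.yjsj.hbase_mr2;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Txt2FruitRunner extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // Get the Configuration
        Configuration conf = this.getConf();
        // Create the job
        Job job = Job.getInstance(conf, this.getClass().getSimpleName());
        job.setJarByClass(Txt2FruitRunner.class);
        // Input file in HDFS
        Path inPath = new Path("hdfs://master:9000/input_fruit/fruit.tsv");
        FileInputFormat.addInputPath(job, inPath);
        // Set the Mapper
        job.setMapperClass(ReadFruitFromHDFSMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        // Set the Reducer
        TableMapReduceUtil.initTableReducerJob("fruit_hdfs", WriteFruitMRFromTxtReducer.class, job);
        // Number of reduce tasks, at least 1
        job.setNumReduceTasks(1);
        boolean isSuccess = job.waitForCompletion(true);
        if (!isSuccess) {
            throw new IOException("Job running with error");
        }
        return isSuccess ? 0 : 1;
    }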

4) Run the job (main is a member of the runner class from step 3)

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Point the client at the cluster's ZooKeeper quorum and HBase master
        conf.set("hbase.zookeeper.quorum", "master,node1,node2");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.master", "master:60000");

        int status = ToolRunner.run(conf, new Txt2FruitRunner(), args);
        System.exit(status);
    }
} // closes the runner class

Note: as in the previous job, the conf.set calls in main supply the ZooKeeper connection settings and must match your own cluster.

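5) Package and run (build the jar as for any Java project; the archive also has to be processed with zip)

$ ~/hadoop_home/bin/yarn jar ~/softwares/jars/hbase-0.0.1-SNAPSHOT.jar \
com.z.hbase.mr2.Txt2FruitRunner

Note: the class name passed to yarn jar must be the fully qualified name of the runner as declared in its source file, and the fruit_hdfs table must exist before the job runs.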

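2.6 Integration with Hive
2.6.1 HBase compared with Hive
1) Hive
(1) Data warehouse
In essence, Hive keeps a mapping in MySQL (used as its metadata store) between tables and the files already stored in HDFS, so that the data can be managed and queried with HQL.
(2) Used for data analysis and cleaning
Hive is suited to offline data analysis and cleaning; its latency is high.
(3) Built on HDFS and MapReduce
The data Hive stores still lives on the DataNodes, and HQL statements are ultimately translated into MapReduce jobs for execution.
2) HBase
(1) Database
HBase is a column-oriented, non-relational database.
(2) Stores structured and unstructured data
It is suited to storing non-relational data in single tables and is not suited to join queries.
(3) Built on HDFS
Persisted data takes the form of HFiles, which are stored on the DataNodes and managed by RegionServers as regions.
(4) Low latency, suited to online services
Faced with large volumes of enterprise data, HBase can store very large single tables while still providing efficient data access.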
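2.6.2 Using HBase together with Hive
Note: the two latest releases of HBase and Hive are not compatible with each other out of the box, so hive-hbase-handler-1.2.2.jar has to be recompiled.
Environment preparation
Operations in Hive may later affect HBase as well, so Hive needs access to the jars used to operate HBase. The next step is therefore to copy the jars Hive depends on into place (or to create symbolic links to them).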
