Integrating HBase with MapReduce

6. Integrating HBase with MapReduce

6.1 Official HBase-MapReduce integration

To list the jars that HBase MapReduce jobs need on the classpath, run: bin/hbase mapredcp
Setting the environment variables
1. Temporary (current shell only); run on the command line:
  - export HBASE_HOME=/opt/module/hbase-1.3.4
  - export HADOOP_HOME=/opt/module/hadoop-2.8.5
  - export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`
2. Permanent; add to /etc/profile (then run source /etc/profile):
  - export HBASE_HOME=/opt/module/hbase-1.3.4
  - export HADOOP_HOME=/opt/module/hadoop-2.8.5
  - and add to hadoop-env.sh: export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase/lib/*
Running the official MapReduce jobs

// ===== Case 1: count the rows in the student table (run from /opt/module/hbase-1.3.4/)

/opt/module/hadoop-2.8.5/bin/yarn jar ./lib/hbase-server-1.3.4.jar rowcounter student
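RowCounter counts rows, not cells: a row that stores several columns is counted once. The plain-Java sketch below models only that counting rule over an in-memory, row-sorted list of cells. The `SimpleCell` type and the sample values are hypothetical, and this is not the RowCounter source code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, in-memory model of the rowcounter counting rule:
// one count per distinct row key, however many cells the row has.
// This is NOT HBase's RowCounter implementation.
final class RowCountSketch {

    // One cell: row key, "family:qualifier", value (sample data is hypothetical)
    static final class SimpleCell {
        final String row;
        final String column;
        final String value;

        SimpleCell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Cells arrive sorted by row key, so a new row starts wherever the key changes
    static long countRows(List<SimpleCell> sortedCells) {
        long rows = 0;
        String previousRow = null;
        for (SimpleCell cell : sortedCells) {
            if (!cell.row.equals(previousRow)) {
                rows++;
                previousRow = cell.row;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<SimpleCell> cells = Arrays.asList(
                new SimpleCell("1001", "info:name", "Tom"),
                new SimpleCell("1001", "info:age", "18"),
                new SimpleCell("1002", "info:name", "Jerry"),
                new SimpleCell("1003", "info:name", "Lucy"));
        // 4 cells spread over 3 rows
        System.out.println("rows = " + countRows(cells)); // prints "rows = 3"
    }
}
```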

// ===== Case 2: import local data into HBase with MapReduce

// 1. Create a local fruit.tsv file (fields separated by tabs)

1001	Apple	Red
1002	Pear	Yellow
1003	Pineapple	Yellow
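importtsv splits each line on a tab by default (a different separator can be set with `-Dimporttsv.separator`), and copying the sample above from a web page often turns the tabs into spaces. A minimal sketch that writes the three sample rows with real tab characters; the class name `FruitTsvWriter` is only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Writes the sample rows to ./fruit.tsv with real tab separators.
final class FruitTsvWriter {

    // Sample rows from the tutorial: row key, name, color
    static final String[][] ROWS = {
            {"1001", "Apple", "Red"},
            {"1002", "Pear", "Yellow"},
            {"1003", "Pineapple", "Yellow"}
    };

    // Fields joined by '\t', one row per line, no blank lines in between
    static String render() {
        StringBuilder sb = new StringBuilder();
        for (String[] row : ROWS) {
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("fruit.tsv");
        Files.write(out, render().getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + out.toAbsolutePath());
    }
}
```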

// 2. Create the HBase table (in the HBase shell)

create 'fruit','info'

// 3. Create the input_fruit directory in HDFS and upload fruit.tsv

/opt/module/hadoop-2.8.5/bin/hdfs dfs -mkdir /input_fruit

/opt/module/hadoop-2.8.5/bin/hdfs dfs -put fruit.tsv /input_fruit/

// 4. Run MapReduce to import fruit.tsv into the HBase fruit table

/opt/module/hadoop-2.8.5/bin/yarn jar ./lib/hbase-server-1.3.4.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit hdfs://<NameNode address>/input_fruit
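`-Dimporttsv.columns` is positional: the i-th entry names the i-th field of each line, and `HBASE_ROW_KEY` marks the field used as the row key. The sketch below is a simplified model of that mapping, not ImportTsv's parser (it ignores timestamps, bad-line handling and other options, and it rejects any line whose field count differs from the spec); `ImportTsvColumnSketch` and `mapLine` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified model of how -Dimporttsv.columns lines up with the fields of one line.
// Not ImportTsv's real parser: timestamps, bad-line handling and other options are ignored.
final class ImportTsvColumnSketch {

    static final String ROW_KEY = "HBASE_ROW_KEY";

    // Returns column spec -> field value; the HBASE_ROW_KEY entry holds the row key.
    static Map<String, String> mapLine(String columnsSpec, String line, String separator) {
        String[] columns = columnsSpec.split(",");
        // limit -1 keeps trailing empty fields so the count check below is honest
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                    "expected " + columns.length + " fields but got " + fields.length + ": " + line);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            result.put(columns[i], fields[i]); // the i-th spec entry names the i-th field
        }
        if (!result.containsKey(ROW_KEY)) {
            throw new IllegalArgumentException("the column spec must contain " + ROW_KEY);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> m = mapLine("HBASE_ROW_KEY,info:name,info:color", "1001\tApple\tRed", "\t");
        System.out.println(m); // prints {HBASE_ROW_KEY=1001, info:name=Apple, info:color=Red}
    }
}
```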

6.2 Custom HBase-MapReduce

Requirement: use MapReduce to copy part of the data in the fruit table (the cells whose qualifier is name) into the fruit_mr table.

// 1. Create the FruitMapper class, which reads data from the fruit table

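package com.noodles.mr1;

import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

// Reads each row of the fruit table and keeps only the cells whose qualifier is "name"
public class FruitMapper extends TableMapper<ImmutableBytesWritable, Put> {

	@Override
	protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
		// Create a Put for the same row key
		Put put = new Put(key.get());
		// Copy over only the "name" cells
		for (Cell cell : value.rawCells()) {
			if ("name".equals(Bytes.toString(CellUtil.cloneQualifier(cell)))) {
				put.add(cell);
			}
		}
		// Skip rows without a name cell: HBase rejects a Put that has no columns
		if (!put.isEmpty()) {
			context.write(key, put);
		}
	}
}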
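FruitMapper's per-row logic can be modeled without a cluster: keep the cells whose qualifier is `name`, and emit nothing if none survive, since HBase rejects a Put with no columns when it is written. A plain-Java sketch under that assumption; `SimpleCell` is a hypothetical stand-in for HBase's `Cell`, and no HBase classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of FruitMapper's per-row logic (no HBase classes involved).
final class NameColumnFilterSketch {

    // Hypothetical stand-in for an HBase Cell
    static final class SimpleCell {
        final String family;
        final String qualifier;
        final String value;

        SimpleCell(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return family + ":" + qualifier + "=" + value;
        }
    }

    // Keeps only the cells whose qualifier is "name" (the cells that would go into the Put)
    static List<SimpleCell> selectNameCells(List<SimpleCell> rowCells) {
        List<SimpleCell> kept = new ArrayList<>();
        for (SimpleCell cell : rowCells) {
            if ("name".equals(cell.qualifier)) {
                kept.add(cell);
            }
        }
        return kept;
    }

    // Mirrors "write only if the Put is not empty"
    static boolean shouldEmit(List<SimpleCell> kept) {
        return !kept.isEmpty();
    }

    public static void main(String[] args) {
        List<SimpleCell> row1001 = Arrays.asList(
                new SimpleCell("info", "name", "Apple"),
                new SimpleCell("info", "color", "Red"));
        List<SimpleCell> kept = selectNameCells(row1001);
        System.out.println(kept + " emit=" + shouldEmit(kept)); // prints [info:name=Apple] emit=true
    }
}
```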

// 2. Create the FruitReducer class, which writes to the target table

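package com.noodles.mr1;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

// Writes every Put it receives to the output table (fruit_mr, set in the Driver)
public class FruitReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {

	@Override
	protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
		for (Put value : values) {
			context.write(NullWritable.get(), value);
		}
	}
}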

// 3. Create the FruitDriver class, which configures and runs the mapper and reducer

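package com.noodles.mr1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Configured already provides getConf()/setConf(), so no extra fields are needed for Tool
public class FruitDriver extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
		// Create the job from the configuration injected by ToolRunner
		Job job = Job.getInstance(getConf());
		// Locate the jar through the Driver class
		job.setJarByClass(FruitDriver.class);
		// Scan settings commonly used for MapReduce over HBase
		Scan scan = new Scan();
		scan.setCaching(500);       // rows fetched per scanner RPC
		scan.setCacheBlocks(false); // don't fill the block cache with a one-off full scan
		// Mapper: read the fruit table
		TableMapReduceUtil.initTableMapperJob("fruit", scan, FruitMapper.class, ImmutableBytesWritable.class, Put.class, job);
		// Reducer: write to the fruit_mr table
		TableMapReduceUtil.initTableReducerJob("fruit_mr", FruitReducer.class, job);
		// Submit and wait
		boolean result = job.waitForCompletion(true);
		return result ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		Configuration configuration = HBaseConfiguration.create();
		// Use the job's result as the process exit code
		System.exit(ToolRunner.run(configuration, new FruitDriver(), args));
	}
}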

// 4. Package the classes into fruit.jar

// 5. Create the fruit_mr table in HBase

create 'fruit_mr','info'

// 6. From /opt/module/hbase, run the job (the last argument is the fully qualified Driver class name):

/opt/module/hadoop-2.8.5/bin/yarn jar ./fruit.jar com.noodles.mr1.FruitDriver

6.3 Custom HBase-MapReduce 2

Requirement: write data stored in HDFS into an HBase table (fruit2).

// 1. Create the Mapper, which reads the file on HDFS

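package com.noodles.mr2;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HDFSMapper extends Mapper<LongWritable, Text, NullWritable, Put> {

	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		// Read one line
		String line = value.toString();
		// Split on tabs: row key, name, color
		String[] split = line.split("\t");
		// Skip blank or malformed lines instead of failing with ArrayIndexOutOfBoundsException
		if (split.length < 3) {
			return;
		}
		// Build the Put
		Put put = new Put(Bytes.toBytes(split[0]));
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(split[1]));
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("color"), Bytes.toBytes(split[2]));
		// Emit it
		context.write(NullWritable.get(), put);
	}
}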
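HDFSMapper reads split[1] and split[2] directly, so without a length check a blank or malformed line (for example one whose tabs were turned into spaces) throws ArrayIndexOutOfBoundsException and fails the task. A minimal plain-Java sketch of the parse step with such a check; `FruitLineParser` is a hypothetical helper, not part of Hadoop or HBase.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper modeling HDFSMapper's parse step with a field-count check.
final class FruitLineParser {

    // Returns {rowKey, name, color}, or null when the line should be skipped
    static String[] parse(String line) {
        String[] split = line.split("\t");
        if (split.length < 3) {
            return null; // blank or malformed line: skip it instead of throwing
        }
        return new String[] {split[0], split[1], split[2]};
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1001\tApple\tRed", "", "1003 Pineapple Yellow");
        for (String line : lines) {
            String[] fields = parse(line);
            System.out.println(fields == null ? "skipped: [" + line + "]" : Arrays.toString(fields));
        }
    }
}
```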

// 2. Create the Reducer, which writes to HBase

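package com.noodles.mr2;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class HDFSReducer extends TableReducer<NullWritable, Put, NullWritable> {

	@Override
	protected void reduce(NullWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
		// Write every Put to the output table (fruit2, set in the Driver)
		for (Put value : values) {
			context.write(NullWritable.get(), value);
		}
	}
}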
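Because HDFSMapper emits the same key (NullWritable) for every line, the shuffle puts all Puts into one group and a single reduce() call receives them all. That is fine for a small file like fruit.tsv but funnels every write through one reducer. The sketch models only the grouping step in plain Java, using the string "NULL" as a hypothetical stand-in for NullWritable; it is not Hadoop's shuffle implementation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the shuffle: map output is grouped by key, one reduce() call per group.
// "NULL" is a hypothetical stand-in for NullWritable.get().
final class ShuffleGroupingSketch {

    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> mapOutput) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (Map.Entry<K, V> record : mapOutput) {
            groups.computeIfAbsent(record.getKey(), k -> new ArrayList<>()).add(record.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> mapOutput = new ArrayList<>();
        for (String rowKey : new String[] {"1001", "1002", "1003"}) {
            // HDFSMapper emits the same key for every line
            mapOutput.add(new AbstractMap.SimpleEntry<>("NULL", "Put(" + rowKey + ")"));
        }
        Map<String, List<String>> groups = groupByKey(mapOutput);
        // One group, so one reduce() call receives all three Puts
        System.out.println(groups); // prints {NULL=[Put(1001), Put(1002), Put(1003)]}
    }
}
```

By contrast, FruitMapper in 6.2 uses the row key as the map output key, so each row forms its own group.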

// 3. Create the Driver

package com.noodles.mr2;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HDFSDriver extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
		// Get the Job object
		Job job = Job.getInstance(getConf());
		// Set the main class
		job.setJarByClass(HDFSDriver.class);
		// Set the Mapper and its output types
		job.setMapperClass(HDFSMapper.class);
		job.setMapOutputKeyClass(NullWritable.class);
		job.setMapOutputValueClass(Put.class);
		// Set the Reducer: write to the fruit2 table
		TableMapReduceUtil.initTableReducerJob("fruit2", HDFSReducer.class, job);
		// Set the input path (args[0], e.g. /input_fruit/fruit.tsv)
		FileInputFormat.setInputPaths(job, args[0]);
		// Submit and wait
		boolean result = job.waitForCompletion(true);
		return result ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		Configuration configuration = HBaseConfiguration.create();
		// Use the job's result as the process exit code
		System.exit(ToolRunner.run(configuration, new HDFSDriver(), args));
	}
}
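HDFSDriver reads the input path from args[0]. This still works when Hadoop options are passed on the command line, because ToolRunner (through GenericOptionsParser) moves generic options such as `-D key=value` into the Configuration before calling run(). The sketch below models only the `-D` handling; it stops at the first non-option argument, matching the usual convention of putting generic options before the job's own arguments. `GenericOptionsSketch` is a hypothetical name, not a Hadoop class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of ToolRunner's generic-option handling:
// leading "-Dkey=value" or "-D key=value" options go into the configuration,
// and everything from the first other argument onward is passed to run(args).
// Only -D is modeled; the real parser also handles -conf, -fs, -files, -libjars, etc.
final class GenericOptionsSketch {
    final Map<String, String> conf = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    GenericOptionsSketch(String[] args) {
        int i = 0;
        while (i < args.length && args[i].startsWith("-D")) {
            String property;
            if (args[i].length() > 2) {
                property = args[i].substring(2);                 // "-Dkey=value"
            } else {
                property = i + 1 < args.length ? args[++i] : ""; // "-D key=value"
            }
            int eq = property.indexOf('=');
            if (eq > 0) {
                conf.put(property.substring(0, eq), property.substring(eq + 1));
            }
            i++;
        }
        remaining.addAll(Arrays.asList(args).subList(i, args.length));
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(
                new String[] {"-Dmapreduce.job.queuename=default", "/input_fruit/fruit.tsv"});
        System.out.println(parsed.conf);      // prints {mapreduce.job.queuename=default}
        System.out.println(parsed.remaining); // prints [/input_fruit/fruit.tsv]
    }
}
```

For example, `yarn jar ./fruit.jar com.noodles.mr2.HDFSDriver -Dmapreduce.job.queuename=default /input_fruit/fruit.tsv` would still leave the file path in args[0].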

// 4. Package the classes into fruit.jar

// 5. Create the fruit2 table in HBase

create 'fruit2','info'

// 6. From /opt/module/hbase, run the job (arguments: the fully qualified Driver class name, then the input file path):

/opt/module/hadoop-2.8.5/bin/yarn jar ./fruit.jar com.noodles.mr2.HDFSDriver /input_fruit/fruit.tsv
