Hadoop: Batch-Importing Data into HBase with MapReduce

Please credit the source when reposting: http://blog.csdn.net/l1028386804/article/details/46463889

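Without further ado, here is the code. The job reads tab-separated log lines from HDFS. The mapper builds a row key of the form msisdn:yyyyMMddHHmmss, and the reducer writes each record to the HBase table wlan_log through TableOutputFormat.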

package hbase;

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

/**
 * Batch import into HBase using MapReduce
 * @author liuyazhuang
 */
public class BatchImport {

	static class BatchImportMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

		// Formats the epoch-millisecond timestamp in field 0 (JVM default time zone)
		SimpleDateFormat dateformat1 = new SimpleDateFormat("yyyyMMddHHmmss");
		Text v2 = new Text();

		@Override
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			final String[] splited = value.toString().split("\t");
			try {
				final Date date = new Date(Long.parseLong(splited[0].trim()));
				final String dateFormat = dateformat1.format(date);
				// Row key: <field 1 (msisdn)>:<yyyyMMddHHmmss>
				String rowKey = splited[1] + ":" + dateFormat;
				// Prepend the row key to the original line so the reducer can recover both
				v2.set(rowKey + "\t" + value.toString());
				context.write(key, v2);
			} catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
				// Count malformed lines instead of failing the task
				final Counter counter = context.getCounter("BatchImport", "ErrorFormat");
				counter.increment(1L);
				System.out.println("Malformed record: " + splited[0] + " " + e.getMessage());
			}
		}
	}

	static class BatchImportReducer extends TableReducer<LongWritable, Text, NullWritable> {

		@Override
		protected void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
			for (Text text : values) {
				// splited[0] = row key, splited[1..] = fields of the original line
				final String[] splited = text.toString().split("\t");
				final Put put = new Put(Bytes.toBytes(splited[0]));
				// Put.addColumn replaces the deprecated Put.add(family, qualifier, value) as of HBase 1.0
				put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("date"), Bytes.toBytes(splited[1]));
				put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("msisdn"), Bytes.toBytes(splited[2]));
				// Other fields omitted; add them with further put.addColumn(...) calls
				context.write(NullWritable.get(), put);
			}
		}
	}

	public static void main(String[] args) throws Exception {
		// HBaseConfiguration.create() also loads hbase-default.xml and hbase-site.xml
		final Configuration configuration = HBaseConfiguration.create();
		// Set the ZooKeeper quorum
		configuration.set("hbase.zookeeper.quorum", "hadoop0");
		// Set the HBase table name
		configuration.set(TableOutputFormat.OUTPUT_TABLE, "wlan_log");
		// Raise this value to keep HBase from timing out and exiting
		configuration.set("dfs.socket.timeout", "180000");

		final Job job = Job.getInstance(configuration, "HBaseBatchImport");
		// Lets Hadoop locate the jar containing this job on the cluster
		job.setJarByClass(BatchImport.class);
		job.setMapperClass(BatchImportMapper.class);
		job.setReducerClass(BatchImportReducer.class);
		// Set the map output types; the reduce output types are not set
		job.setMapOutputKeyClass(LongWritable.class);
		job.setMapOutputValueClass(Text.class);
		job.setInputFormatClass(TextInputFormat.class);
		// Instead of an output path, set the output format class
		job.setOutputFormatClass(TableOutputFormat.class);
		FileInputFormat.setInputPaths(job, "hdfs://hadoop0:9000/input");

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
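Because the row-key logic is plain string and date handling, it can be checked without a cluster. Below is a minimal, Hadoop-free sketch that mirrors what BatchImportMapper emits and how BatchImportReducer splits that value back into cells. The class RowKeySketch, its methods mapValue and cells, and the sample input line are hypothetical. The sketch pins the time zone to UTC so the output is reproducible, whereas the job above formats dates in the JVM's default time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical, Hadoop-free sketch of the record transformation in BatchImport.
class RowKeySketch {

    // Mirrors BatchImportMapper: returns "<rowKey>\t<original line>",
    // where rowKey = field[1] + ":" + field[0] formatted as yyyyMMddHHmmss.
    // Returns null for lines the job would count under BatchImport/ErrorFormat.
    static String mapValue(String line, TimeZone zone) {
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            return null;
        }
        try {
            long millis = Long.parseLong(fields[0].trim());
            SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
            // Assumption: explicit zone for reproducibility; the job uses the JVM default
            fmt.setTimeZone(zone);
            String rowKey = fields[1] + ":" + fmt.format(new Date(millis));
            return rowKey + "\t" + line;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Mirrors BatchImportReducer: splits a mapper value into the row key
    // and the cf:date / cf:msisdn cells.
    static Map<String, String> cells(String mapValue) {
        String[] parts = mapValue.split("\t");
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("rowkey", parts[0]);
        cells.put("cf:date", parts[1]);
        cells.put("cf:msisdn", parts[2]);
        return cells;
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        // Hypothetical sample line: epoch millis, msisdn, one further field
        String value = mapValue("1363157985066\t13726230503\tfield3", utc);
        // prints {rowkey=13726230503:20130313065945, cf:date=1363157985066, cf:msisdn=13726230503}
        System.out.println(cells(value));
        // prints null (non-numeric timestamp field)
        System.out.println(mapValue("bad\t13726230503", utc));
    }
}
```

Running main prints the row key and cells for the sample line, then null for a line whose timestamp field is not a number, which is the case the job counts instead of writing. Putting the msisdn first in the row key keeps all records for one number adjacent in HBase, ordered by time.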
