Using MapReduce to load HDFS data into HBase

From HDFS to HBase: the process

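The goal is to import the data stored in files on HDFS into HBase.

There are two ways to do this: write a custom MapReduce job, or use the import tool that ships with HBase.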
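For the second option, the post does not name the tool. Assuming it means HBase's `ImportTsv` (an assumption, not something the original states), the sketch below only assembles the arguments such a run would take: the `importtsv.columns` mapping, the table name, and the HDFS input directory. The class and method names are hypothetical; the table, column family, qualifiers, and path are the ones used by the custom job in this post.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: assembles the arguments for HBase's ImportTsv tool (assumed to be the "import tool"). */
final class ImportTsvArgsSketch {

    /** Builds the argument list: -Dimporttsv.columns=..., table name, HDFS input directory. */
    static List<String> buildArgs(String table, String family, String[] qualifiers, String inputDir) {
        // HBASE_ROW_KEY marks which input field becomes the row key (here: the first one).
        StringBuilder columns = new StringBuilder("HBASE_ROW_KEY");
        for (String q : qualifiers) {
            columns.append(',').append(family).append(':').append(q);
        }
        List<String> args = new ArrayList<>();
        args.add("-Dimporttsv.columns=" + columns);
        args.add(table);
        args.add(inputDir);
        return args;
    }

    public static void main(String[] args) {
        List<String> toolArgs = buildArgs("NNTB", "info",
                new String[] {"NodeCode", "NodeType", "NodeName", "IsWarehouse"},
                "/user/hadoop/gznt/gznt_bmda");
        // Print the equivalent command line for the hbase launcher script.
        System.out.println("hbase org.apache.hadoop.hbase.mapreduce.ImportTsv "
                + String.join(" ", toolArgs));
    }
}
```

ImportTsv splits fields on tabs by default, which matches the input format used here. It would store a field containing the literal string NULL as-is, whereas the custom job skips such fields, so the two approaches are not exact equivalents.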

First create the table in HBase: create 'NNTB','info'

The custom MapReduce implementation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import java.io.IOException;

/**
 * Reads tab-separated records from HDFS and writes them to HBase.
 * The target table must already exist in HBase: create 'NNTB','info'
 */
public class HdfsToHBase {

    public static void main(String[] args) throws Exception {
        // Only needed for my local run: tells Hadoop where its home directory is.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.7.6");
        Configuration conf = HBaseConfiguration.create();
        // The IP is made up; 2181 is the default ZooKeeper client port.
        conf.set("hbase.zookeeper.quorum", "202.168.27.196:2181");
        conf.set(TableOutputFormat.OUTPUT_TABLE, "NNTB");

        Job job = Job.getInstance(conf, HdfsToHBase.class.getSimpleName());
        TableMapReduceUtil.addDependencyJars(job);
        job.setJarByClass(HdfsToHBase.class);

        job.setMapperClass(HdfsToHBaseMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(HdfsToHBaseReducer.class);

        FileInputFormat.addInputPath(job, new Path("hdfs://202.168.27.196:9000/user/hadoop/gznt/gznt_bmda/*"));
        job.setOutputFormatClass(TableOutputFormat.class);

        // Exit with a non-zero status if the job fails so callers can detect it.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class HdfsToHBaseMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text outKey = new Text();
        private final Text outValue = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Each line: rowkey \t NodeCode \t NodeType \t NodeName \t IsWarehouse
            // The -1 limit keeps trailing empty fields.
            String[] splits = value.toString().split("\t", -1);
            if (splits.length < 5) {
                return; // skip malformed lines instead of failing the task
            }
            outKey.set(splits[0]);
            outValue.set(splits[1] + "\t" + splits[2] + "\t" + splits[3] + "\t" + splits[4]);
            context.write(outKey, outValue);
        }
    }

    // Target table: create 'NNTB','info'
    public static class HdfsToHBaseReducer extends TableReducer<Text, Text, NullWritable> {
        // "info" is the column family of the HBase table.
        private static final byte[] FAMILY = Bytes.toBytes("info");
        // Qualifiers in the same order as the fields emitted by the mapper.
        private static final String[] QUALIFIERS = {"NodeCode", "NodeType", "NodeName", "IsWarehouse"};

        @Override
        protected void reduce(Text k2, Iterable<Text> v2s, Context context) throws IOException, InterruptedException {
            // Text.getBytes() returns the backing array, which can be longer than the
            // valid data, so go through toString() to get exactly the row key bytes.
            Put put = new Put(Bytes.toBytes(k2.toString()));
            for (Text v2 : v2s) {
                String[] splis = v2.toString().split("\t", -1);
                for (int i = 0; i < QUALIFIERS.length && i < splis.length; i++) {
                    // The literal string "NULL" marks a missing value; do not store it.
                    if (!"NULL".equals(splis[i])) {
                        put.addColumn(FAMILY, Bytes.toBytes(QUALIFIERS[i]), Bytes.toBytes(splis[i]));
                    }
                }
            }
            // A Put without any column is rejected by HBase, so only write non-empty ones.
            if (!put.isEmpty()) {
                context.write(NullWritable.get(), put);
            }
        }
    }
}
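The parsing done by the mapper and reducer can be tried out without a cluster. The following is a minimal stand-alone sketch, not part of the job itself: it takes one tab-separated line, uses the first field as the row key, and maps the remaining four fields to the qualifiers used above, leaving out any field whose value is the string NULL. The class name, method names, and sample record are hypothetical, and returning nothing for lines with fewer than five fields is an assumption about how malformed input should be handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-alone sketch of the record parsing performed by the mapper and reducer. */
final class TsvRecordSketch {
    // Column qualifiers, in the order the fields follow the row key.
    static final String[] QUALIFIERS = {"NodeCode", "NodeType", "NodeName", "IsWarehouse"};

    /** Returns the row key (first field), or null if the line has fewer than 5 fields. */
    static String rowKey(String line) {
        String[] fields = line.split("\t", -1);
        return fields.length < 5 ? null : fields[0];
    }

    /** Maps qualifier to value for one line, leaving out fields whose value is "NULL". */
    static Map<String, String> columns(String line) {
        Map<String, String> cols = new LinkedHashMap<>();
        String[] fields = line.split("\t", -1);
        if (fields.length < 5) {
            return cols; // malformed line: nothing to write
        }
        for (int i = 0; i < QUALIFIERS.length; i++) {
            String value = fields[i + 1];
            if (!"NULL".equals(value)) {
                cols.put(QUALIFIERS[i], value);
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        // Hypothetical sample record; the NodeType field is missing.
        String line = "row1\tN001\tNULL\tMain depot\t1";
        System.out.println(rowKey(line) + " -> " + columns(line));
    }
}
```

Each entry of the returned map corresponds to one `put.addColumn` call on the `info` column family in the reducer.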
