Hadoop MapReduce Program: SingletonTableJoin

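Requirement: a single-table join (self-join). From a file of child-parent relationships, derive the grandchild-grandparent relationships.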

Sample input: child-parent.txt

xiaoming daxiong

daxiong alice

daxiong jack

Output:

xiaoming alice

xiaoming jack

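Analysis and design:

Mapper design:

1. <k1,v1>: k1 is the position of a line in the input (with the default TextInputFormat, its byte offset), and v1 is the line itself.

2. Left table <k2,v2>: k2 is the parent's name, and v2 is (1, child's name), where 1 marks a left-table record.

3. Right table <k3,v3>: k3 is the child's name, and v3 is (2, parent's name), where 2 marks a right-table record.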
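The mapper design above can be tried out without a cluster. The following is a minimal in-memory sketch, not the Hadoop job itself: the class name MapShuffleSketch and the method mapAndGroup are hypothetical, and a TreeMap stands in for the shuffle that groups mapper output by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the map and shuffle phases (no Hadoop involved).
// MapShuffleSketch and mapAndGroup are hypothetical names used for illustration.
class MapShuffleSketch {

    // Emits both tagged records for each "child parent" line and groups them by key.
    static Map<String, List<String>> mapAndGroup(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 2 || "child".equals(fields[0])) {
                continue; // skip blank, malformed, and header lines
            }
            String child = fields[0];
            String parent = fields[1];
            // Left table: key = parent, value = "1 child"
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add("1 " + child);
            // Right table: key = child, value = "2 parent"
            groups.computeIfAbsent(child, k -> new ArrayList<>()).add("2 " + parent);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("xiaoming daxiong", "daxiong alice", "daxiong jack");
        // Print each key with the tagged values collected for it.
        mapAndGroup(input).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

For the sample file, the key daxiong collects the values 1 xiaoming, 2 alice and 2 jack. It is the only key that receives records from both tables, so it is the only one that can produce output.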

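Reducer design:

4. <k4,v4>: k4 is a key shared by the left and right tables (a person's name), and v4 is the list<String> of tagged values collected for that key.

5. Take the Cartesian product <k5,v5>: k5 is the grandchild's name, and v5 is the grandparent's name. For a given key, the values flagged 1 are that person's children and the values flagged 2 are that person's parents, so every (child, parent) pair is a (grandchild, grandparent) pair.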
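The reduce step for a single key can be sketched in plain Java as follows. This is an illustration under assumptions rather than the job's actual reducer: ReduceStepSketch and reduceOneKey are hypothetical names, and the input list stands for the tagged values that the shuffle delivers for one key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the reduce step for one key (no Hadoop involved).
// ReduceStepSketch and reduceOneKey are hypothetical names used for illustration.
class ReduceStepSketch {

    // Splits the tagged values into children (flag 1) and parents (flag 2),
    // then returns every "grandchild grandparent" pair.
    static List<String> reduceOneKey(List<String> taggedValues) {
        List<String> grandChildren = new ArrayList<>();
        List<String> grandParents = new ArrayList<>();
        for (String value : taggedValues) {
            String[] record = value.split(" ");
            if (record.length != 2) {
                continue; // ignore values that are not "flag name"
            }
            if ("1".equals(record[0])) {
                grandChildren.add(record[1]);
            } else if ("2".equals(record[0])) {
                grandParents.add(record[1]);
            }
        }
        // Cartesian product; empty when either side has no entries.
        List<String> pairs = new ArrayList<>();
        for (String grandChild : grandChildren) {
            for (String grandParent : grandParents) {
                pairs.add(grandChild + " " + grandParent);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Values collected for the key "daxiong" in the sample file.
        System.out.println(reduceOneKey(Arrays.asList("1 xiaoming", "2 alice", "2 jack")));
        // prints [xiaoming alice, xiaoming jack]
    }
}
```

A key that has only flag-1 or only flag-2 values yields an empty list, which is why people without both a child and a parent in the file contribute nothing to the output.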

Program:

SingletonTableJoinMapper class:

package com.cn.singletonTableJoin;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SingletonTableJoinMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace; a valid record is exactly "child parent".
        StringTokenizer itr = new StringTokenizer(value.toString());
        if (itr.countTokens() != 2) {
            return; // skip blank or malformed lines instead of failing on them
        }
        String childName = itr.nextToken();
        String parentName = itr.nextToken();
        // Skip a header line whose first column is the literal "child".
        if ("child".equals(childName)) {
            return;
        }
        // Left table (flag 1): key = parent, value = "1 child".
        context.write(new Text(parentName), new Text("1 " + childName));
        // Right table (flag 2): key = child, value = "2 parent".
        context.write(new Text(childName), new Text("2 " + parentName));
    }
}

SingletonTableJoinReduce class:

package com.cn.singletonTableJoin;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SingletonTableJoinReduce extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // For the person named by key: flag-1 values are that person's children,
        // flag-2 values are that person's parents.
        List<String> grandChild = new ArrayList<String>();
        List<String> grandParent = new ArrayList<String>();
        for (Text value : values) {
            String[] record = value.toString().split(" ");
            if (record.length != 2) {
                continue; // ignore values that are not "flag name"
            }
            if ("1".equals(record[0])) {
                grandChild.add(record[1]);
            } else if ("2".equals(record[0])) {
                grandParent.add(record[1]);
            }
        }
        // Cartesian product: every child of key paired with every parent of key.
        // The loops emit nothing when either list is empty.
        for (String grandchild : grandChild) {
            for (String grandparent : grandParent) {
                context.write(new Text(grandchild), new Text(grandparent));
            }
        }
    }
}

SingletonTableJoin class:

package com.cn.singletonTableJoin;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Single-table join
 * @author root
 */
public class SingletonTableJoin {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: SingletonTableJoin <in> <out>");
            System.exit(2);
        }
        // Create the job (Job.getInstance replaces the deprecated Job constructor)
        Job job = Job.getInstance(conf, "SingletonTableJoin");
        job.setJarByClass(SingletonTableJoin.class);
        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Set the mapper and reducer classes
        job.setMapperClass(SingletonTableJoinMapper.class);
        job.setReducerClass(SingletonTableJoinReduce.class);
        // Set the output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Submit the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Make summarizing a habit.
