mapreduce-实现多表关联

//map

package hadoop3;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class duobiaomap extends Mapper<LongWritable,Text,Text,Text>{

protected void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException

{
String line=value.toString();
if (line.contains("factoryname") || line.contains("addressID"))
{
return;
}

String [] str=line.split("\t");
String flag=new String();
if (str[0].length()==1)

{ flag="2";
context.write(new Text(str[0]), new Text(flag+"+"+str[1]) );

}
else if (str[0].length()>1)

{

flag="1";
context.write(new Text(str[1]), new Text(flag+"+"+str[0]));
}
else {}
}
}

//reduce

package hadoop3;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class duobiaoreduce extends Reducer<Text,Text,Text,Text>

{ private static int num=0;
protected void reduce(Text key,Iterable<Text> values,Context context) throws IOException, InterruptedException
{
if(num==0)
{
context.write(new Text("factory"), new Text("address"));
num++;
}

Iterator<Text> itr=values.iterator();
String [] factory=new String[100];
int factorynum=0;
String [] address=new String[100];
int addressnum=0;

while(itr.hasNext())
{
String [] str1=itr.next().toString().split("\\+");
if (str1[0].compareTo("1")==0)
{
factory[factorynum]=str1[1];
factorynum++;

}
else if(str1[0].compareTo("2")==0)

{
address[addressnum]=str1[1];
addressnum++;

}
else {}
}

if(factorynum !=0 && addressnum !=0){
for (int i=0;i<address.length;i++)
{
for (int j=0;j<factory.length;j++)
{

context.write(new Text(factory[j]), new Text(address[i]));

}

}
}

}

package hadoop3;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

//import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;

public class duobiao extends Configured implements Tool{

public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
ToolRunner.run(new duobiao(), args);
}

@Override
public int run(String[] arg0) throws Exception {
// TODO Auto-generated method stub
Configuration conf=getConf();
Job job=new Job();
job.setJarByClass(getClass());
FileSystem fs=FileSystem.get(conf);
fs.delete(new Path("/outfile1105"),true);
FileInputFormat.addInputPath(job, new Path("/luo/duobiao.txt"));
FileInputFormat.addInputPath(job, new Path("/luo/duobiao2.txt"));
FileOutputFormat.setOutputPath(job, new Path("/outfile1105"));

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

job.setMapperClass(duobiaomap.class);
job.setReducerClass(duobiaoreduce.class);
job.waitForCompletion(true);

return 0;
}

}

mapreduce-实现多表关联的更多相关文章

MapReduce应用案例--单表关联
1. 实例描述单表关联这个实例要求从给出的数据中寻找出所关心的数据,它是对原始数据所包含信息的挖掘. 实例中给出child-parent 表, 求出grandchild-grandparent表. ...
Hadoop on Mac with IntelliJ IDEA - 8 单表关联NullPointerException
简化陆喜恒. Hadoop实战(第2版)5.4单表关联的代码时遇到空指向异常,经分析是逻辑问题,在此做个记录. 环境:Mac OS X 10.9.5, IntelliJ IDEA 13.1.5, Ha ...
hadoop实例---多表关联
多表关联和单表关联类似,它也是通过对原始数据进行一定的处理,从其中挖掘出关心的信息.如下输入的是两个文件,一个代表工厂表,包含工厂名列和地址编号列:另一个代表地址表,包含地址名列和地址编号列.要求从 ...
Hadoop 多表关联
一.实例描述多表关联和单表关联类似,它也是通过对原始数据进行一定的处理,从其中挖掘出关心的信息.下面进入这个实例. 输入是两个文件,一个代表工厂表,包含工厂名列和地址编号列:另一个代表地址列,包含地 ...
Hadoop 单表关联
前面的实例都是在数据上进行一些简单的处理,为进一步的操作打基础.单表关联这个实例要求从给出的数据中寻找到所关心的数据,它是对原始数据所包含信息的挖掘.下面进入这个实例. 1.实例描述实例中给出chi ...
Hive中小表与大表关联(join)的性能分析【转】
Hive中小表与大表关联(join)的性能分析 [转自:http://blog.sina.com.cn/s/blog_6ff05a2c01016j7n.html] 经常看到一些Hive优化的建议中说当 ...
MapRedece(多表关联)
多表关联: 准备数据 ******************************************** 工厂表: Factory Addressed BeijingRedStar 1 Shen ...
MapRedece(单表关联)
源数据:Child--Parent表 Tom Lucy Tom Jack Jone Lucy Jone Jack Lucy Marry Lucy Ben Jack Alice Jack Jesse T ...
Hadoop案例（七）MapReduce中多表合并
MapReduce中多表合并案例一.案例需求订单数据表t_order: id pid amount 1001 01 1 1002 02 2 1003 03 3 订单数据order.txt 商品信息 ...
MR案例：单表关联查询
"单表关联"这个实例要求从给出的数据中寻找所关心的数据,它是对原始数据所包含信息的挖掘. 需求:实例中给出 child-parent(孩子—父母)表,要求输出 grandchild ...

随机推荐

FTP下载
import java.io.BufferedInputStream;import java.io.BufferedOutputStream;import java.io.File;import ja ...
系统封装接口层 cmsis_os
在这个实时操作系统泛滥的年代,有这么一个系统封装接口层还是蛮有必要的.前些时间偶然间在STM32最新的固件库中就发现了这个系统封装接口,当时就把自己所用的系统进行封装.直到最近KEIL5.0发现其中所 ...
Python编程-函数进阶
一.函数对象函数是第一类对象,即函数可以当作数据传递 1 可以被引用 2 可以当作参数传递 3 返回值可以是函数 4 可以当作容器类型的元素 def foo(): print('from foo') ...
linux上安装程序出现的问题汇总
1.程序在编译过程中出现:variable set but not used [-Werror=unused-but-set-variable] 解决方法:将configure文件和Makefile文 ...
五、golang实现排序
实现排序: 1.实现一个冒泡排序 2.实现一个选择排序 3.实现一个插入排序 4.实现一个快速排序冒泡排序 package main import( "fmt" ) func b ...
mysql里的ibdata1文件
mysql大多数磁盘空间被 InnoDB 的共享表空间 ibdata1 使用.而你已经启用了 innodb_file_per_table,所以问题是: ibdata1存了什么? 当你启用了innodb ...
SG函数略解
由于笔者太懒,懒得把原来的markdown改成MCE,所以有很多奇怪的地方请谅解. 先说nim游戏. 大意:有n堆石子,两个人轮流取,每个人每次从任意一堆取任意个,直到一个人无法取了为止.问对于石子的 ...
CF703D Mishka and Interesting sum
题意:给定一个1e6长度的值域1e9的数组.每次给定询问,询问区间内出现偶数次的数的异或和. 题解:首先很显然,每一次询问的答案,等于这个区间所有不同元素异或和异或上区间异或和.(因为出现偶数次的对区 ...
【转】Android ImageView圆形头像
Android ImageView圆形头像图片完全解析我们在做项目的时候会用到圆形的图片,比如用户头像,类似QQ.用户在用QQ更换头像的时候,上传的图片都是矩形的,但显示的时候确是圆形的. 原理: ...
ML 感知机（Perceptrons）
感知机 Perceptrons 学习Hinton神经网络公开课的学习笔记 https://class.coursera.org/neuralnets-2012-001 1 感知机历史在19世纪60年 ...

mapreduce-实现多表关联

mapreduce-实现多表关联的更多相关文章

随机推荐

热门专题