mapreduce-实现多表关联

//map

package hadoop3;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class duobiaomap extends Mapper<LongWritable,Text,Text,Text>{

protected void map(LongWritable key,Text value,Context context) throws IOException, InterruptedException

{
String line=value.toString();
if (line.contains("factoryname") || line.contains("addressID"))
{
return;
}

String [] str=line.split("\t");
String flag=new String();
if (str[0].length()==1)

{ flag="2";
context.write(new Text(str[0]), new Text(flag+"+"+str[1]) );

}
else if (str[0].length()>1)

{

flag="1";
context.write(new Text(str[1]), new Text(flag+"+"+str[0]));
}
else {}
}
}

//reduce

package hadoop3;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class duobiaoreduce extends Reducer<Text,Text,Text,Text>

{ private static int num=0;
protected void reduce(Text key,Iterable<Text> values,Context context) throws IOException, InterruptedException
{
if(num==0)
{
context.write(new Text("factory"), new Text("address"));
num++;
}

Iterator<Text> itr=values.iterator();
String [] factory=new String[100];
int factorynum=0;
String [] address=new String[100];
int addressnum=0;

while(itr.hasNext())
{
String [] str1=itr.next().toString().split("\\+");
if (str1[0].compareTo("1")==0)
{
factory[factorynum]=str1[1];
factorynum++;

}
else if(str1[0].compareTo("2")==0)

{
address[addressnum]=str1[1];
addressnum++;

}
else {}
}

if(factorynum !=0 && addressnum !=0){
for (int i=0;i<address.length;i++)
{
for (int j=0;j<factory.length;j++)
{

context.write(new Text(factory[j]), new Text(address[i]));

}

}
}

}

package hadoop3;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

//import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;

public class duobiao extends Configured implements Tool{

public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
ToolRunner.run(new duobiao(), args);
}

@Override
public int run(String[] arg0) throws Exception {
// TODO Auto-generated method stub
Configuration conf=getConf();
Job job=new Job();
job.setJarByClass(getClass());
FileSystem fs=FileSystem.get(conf);
fs.delete(new Path("/outfile1105"),true);
FileInputFormat.addInputPath(job, new Path("/luo/duobiao.txt"));
FileInputFormat.addInputPath(job, new Path("/luo/duobiao2.txt"));
FileOutputFormat.setOutputPath(job, new Path("/outfile1105"));

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

job.setMapperClass(duobiaomap.class);
job.setReducerClass(duobiaoreduce.class);
job.waitForCompletion(true);

return 0;
}

}

mapreduce-实现多表关联的更多相关文章

MapReduce应用案例--单表关联
1. 实例描述单表关联这个实例要求从给出的数据中寻找出所关心的数据,它是对原始数据所包含信息的挖掘. 实例中给出child-parent 表, 求出grandchild-grandparent表. ...
Hadoop on Mac with IntelliJ IDEA - 8 单表关联NullPointerException
简化陆喜恒. Hadoop实战(第2版)5.4单表关联的代码时遇到空指向异常,经分析是逻辑问题,在此做个记录. 环境:Mac OS X 10.9.5, IntelliJ IDEA 13.1.5, Ha ...
hadoop实例---多表关联
多表关联和单表关联类似,它也是通过对原始数据进行一定的处理,从其中挖掘出关心的信息.如下输入的是两个文件,一个代表工厂表,包含工厂名列和地址编号列:另一个代表地址表,包含地址名列和地址编号列.要求从 ...
Hadoop 多表关联
一.实例描述多表关联和单表关联类似,它也是通过对原始数据进行一定的处理,从其中挖掘出关心的信息.下面进入这个实例. 输入是两个文件,一个代表工厂表,包含工厂名列和地址编号列:另一个代表地址列,包含地 ...
Hadoop 单表关联
前面的实例都是在数据上进行一些简单的处理,为进一步的操作打基础.单表关联这个实例要求从给出的数据中寻找到所关心的数据,它是对原始数据所包含信息的挖掘.下面进入这个实例. 1.实例描述实例中给出chi ...
Hive中小表与大表关联(join)的性能分析【转】
Hive中小表与大表关联(join)的性能分析 [转自:http://blog.sina.com.cn/s/blog_6ff05a2c01016j7n.html] 经常看到一些Hive优化的建议中说当 ...
MapRedece(多表关联)
多表关联: 准备数据 ******************************************** 工厂表: Factory Addressed BeijingRedStar 1 Shen ...
MapRedece(单表关联)
源数据:Child--Parent表 Tom Lucy Tom Jack Jone Lucy Jone Jack Lucy Marry Lucy Ben Jack Alice Jack Jesse T ...
Hadoop案例（七）MapReduce中多表合并
MapReduce中多表合并案例一.案例需求订单数据表t_order: id pid amount 1001 01 1 1002 02 2 1003 03 3 订单数据order.txt 商品信息 ...
MR案例：单表关联查询
"单表关联"这个实例要求从给出的数据中寻找所关心的数据,它是对原始数据所包含信息的挖掘. 需求:实例中给出 child-parent(孩子—父母)表,要求输出 grandchild ...

随机推荐

C# Invoke 使用异步委托
如果使用多线程,应该会遇到这样的一个问题,在子线程中想调用主线程中(Form1)控件,提示报错! 可以使用Invoke方法调用. this.Invoke(new MethodInvoker(() =& ...
每天一个Linux命令（63）scp命令
scp(secure copy)用于进行远程文件拷贝. (1)用法: 用法: scp [参数] [源文件] [目标文件] (2)功能: 功能: scp在主机 ...
$ 用python处理Excel文档（1）——用xlrd模块读取xls/xlsx文档
本文主要介绍xlrd模块读取Excel文档的基本用法,并以一个GDP数据的文档为例来进行操作. 1. 准备工作: 1. 安装xlrd:pip install xlrd 2. 准备数据集:从网上找到的1 ...
oracle 定时删除3天前的备份数据
不需要保留那么多,按公司要求只需要保留一个星期的即可. 1.那么有什么方法自动删除7天以前备份的*.log文件呢? 2.服务器过多,不可能一一手动创建,有没有自动完成这个创建计划任务的批处理呢? 首先 ...
Hearbeat 介绍
Hearbeat 介绍 Linux-HA的全称是High-Availability Linux,它是一个开源项目,这个开源项目的目标是:通过社区开发者的共同努力,提供一个增强linux可靠性(reli ...
JSON.parse和JSON.stringify的作用
//JSON.parse将字符串格式json转化为json对象 var str='{"name":"lingling","age":&quo ...
ANSI C和POSIX
简单的说 ANSI C:标准C API(对应fopen) POSIX:方便在Linux下运行的C API(对应open)
使用shell自动备份数据库
全备份 #!/bin/sh #mysql地址 #检测用户是否手动输入了密码 mysql_host="" #mysql用户 mysql_user="" #mysq ...
【P1886】滑动窗口（单调队列→线段树→LCT）
这个题很友好,我们可以分别进行简单难度,中等难度,恶心难度来做.然而智商没问题的话肯定是用单调队列来做... 板子题,直接裸的单调队列就能过. #include<iostream> #in ...
MapReduce-排序(全部排序、辅助排序)
排序排序是MapReduce的核心技术. 1.准备示例:按照气温字段对天气数据集排序.由于气温字段是有符号的整数,所以不能将该字段视为Text对象并以字典顺序排序.反之,用顺序文件存储数据,其In ...

mapreduce-实现多表关联

mapreduce-实现多表关联的更多相关文章

随机推荐

热门专题