As before, the task is to compute a matrix product; the expression to evaluate is:

S = F * [B + mu * (u + s + b + d)]

Here the matrices B, u, s, and d are each stored in a SequenceFile named after the matrix.

1) We want to read these files (each stored in its own directory) and multiply each of them by the matrix F. This calls for the MultipleInputs class, which means the job configuration in main() has to change. First, recall which map-side settings a job needed before MultipleInputs was involved:

job.setInputFormatClass(SequenceFileInputFormat.class);
job.setMapperClass(MyMapper.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(DoubleArrayWritable.class);
FileInputFormat.setInputPaths(job, new Path(uri));

When configuring the job there are five map-side settings: the input format, the mapper class, the map output key type, the map output value type, and the input path.

With MultipleInputs, the map-side configuration becomes:

 MultipleInputs.addInputPath(job, new Path(uri + "/b100"), SequenceFileInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, new Path(uri + "/u100"), SequenceFileInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, new Path(uri + "/s100"), SequenceFileInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, new Path(uri + "/d100"), SequenceFileInputFormat.class, MyMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(DoubleArrayWritable.class);

Each call to addInputPath() registers an input path together with its input format and mapper class on the job, so the only map-side settings left to configure are the map output key and value types.
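
As an aside (our addition, not part of the original configuration): since every path here uses the same MyMapper, a minimal alternative sketch is to register only the per-path input format and set the mapper once on the job, assuming the three-argument MultipleInputs.addInputPath() overload:

MultipleInputs.addInputPath(job, new Path(uri + "/b100"), SequenceFileInputFormat.class);
MultipleInputs.addInputPath(job, new Path(uri + "/u100"), SequenceFileInputFormat.class);
MultipleInputs.addInputPath(job, new Path(uri + "/s100"), SequenceFileInputFormat.class);
MultipleInputs.addInputPath(job, new Path(uri + "/d100"), SequenceFileInputFormat.class);
job.setMapperClass(MyMapper.class);           // one mapper shared by all input paths
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(DoubleArrayWritable.class);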

2) That completes the map-side configuration for MultipleInputs. However, with MultipleInputs the way to obtain the current input file name changes, so the code in map() that retrieves the file name has to be replaced with the following (see http://blog.csdn.net/cuilanbo/article/details/25722489):

InputSplit split = context.getInputSplit();
// Previously: String fileName = ((FileSplit) split).getPath().getName();
// With MultipleInputs the split is wrapped in a TaggedInputSplit, so unwrap it via reflection.
Class<? extends InputSplit> splitClass = split.getClass();
FileSplit fileSplit = null;
if (splitClass.equals(FileSplit.class)) {
    fileSplit = (FileSplit) split;
} else if (splitClass.getName().equals(
        "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
    // begin reflection hackery...
    try {
        Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");
        getInputSplitMethod.setAccessible(true);
        fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
    } catch (Exception e) {
        // wrap and re-throw the error
        throw new IOException(e);
    }
    // end reflection hackery
}
String fileName = fileSplit.getPath().getName();

That completes multi-path input on the map side. But what if we also want the results computed from these different files to be written to different output files? That is what MultipleOutputs is for.

3) Before turning to MultipleOutputs, recall how the reduce side used to be configured:

 job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setReducerClass(MyReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(DoubleArrayWritable.class);
FileOutputFormat.setOutputPath(job,new Path(outUri));

Likewise there are five reduce-side settings: the output format, the reducer class, the reduce output key type, the reduce output value type, and the output path. With MultipleOutputs, the reduce-side configuration becomes:

 MultipleOutputs.addNamedOutput(job, "Sb100", SequenceFileOutputFormat.class, IntWritable.class, DoubleArrayWritable.class);
MultipleOutputs.addNamedOutput(job,"Su100",SequenceFileOutputFormat.class,IntWritable.class,DoubleArrayWritable.class);
MultipleOutputs.addNamedOutput(job,"Ss100",SequenceFileOutputFormat.class,IntWritable.class,DoubleArrayWritable.class);
MultipleOutputs.addNamedOutput(job, "Sd100", SequenceFileOutputFormat.class, IntWritable.class, DoubleArrayWritable.class);
job.setReducerClass(MyReducer.class);
FileOutputFormat.setOutputPath(job,new Path(outUri));

The addNamedOutput() method registers each named output (its name, output format, output key type, and output value type) on the job; after that, only the reducer class and the output path still need to be set.
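
One side effect worth noting (our note, not from the original text): because every record is written through a named output, the job's default output will otherwise produce empty part-r-* files. A minimal sketch, assuming LazyOutputFormat (org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat) is on the classpath, defers creating the default output until something is actually written to it:

// Only create the default part-r-* files if context.write() is actually used.
LazyOutputFormat.setOutputFormatClass(job, SequenceFileOutputFormat.class);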

4) With MultipleOutputs, reduce() no longer writes its results through context.write(); it uses the write() method of the MultipleOutputs class instead. The implementations of reduce() and setup() therefore have to be modified as follows:

a. The setup() method

public void setup(Context context) {
    // Create the MultipleOutputs helper once per reduce task.
    mos = new MultipleOutputs(context);
    int leftMatrixColumnNum = context.getConfiguration().getInt("leftMatrixColumnNum", 100);
    sum = new DoubleWritable[leftMatrixColumnNum];
    for (int i = 0; i < leftMatrixColumnNum; ++i) {
        sum[i] = new DoubleWritable(0.0);
    }
}

b. The reduce() method

public void reduce(Text key, Iterable<DoubleArrayWritable> value, Context context)
        throws IOException, InterruptedException {
    int valueLength = 0;
    // Accumulate the element-wise sum of all value arrays for this key.
    for (DoubleArrayWritable doubleValue : value) {
        obValue = doubleValue.toArray();
        valueLength = Array.getLength(obValue);
        for (int i = 0; i < valueLength; ++i) {
            sum[i] = new DoubleWritable(Double.parseDouble(Array.get(obValue, i).toString()) + sum[i].get());
        }
    }
    valueArrayWritable = new DoubleArrayWritable();
    valueArrayWritable.set(sum);
    // The key has the form "rowIndex,fileName"; route the row to the matching named output.
    String[] xx = key.toString().split(",");
    IntWritable intKey = new IntWritable(Integer.parseInt(xx[0]));
    if (key.toString().endsWith("b100")) {
        mos.write("Sb100", intKey, valueArrayWritable);
    } else if (key.toString().endsWith("u100")) {
        mos.write("Su100", intKey, valueArrayWritable);
    } else if (key.toString().endsWith("s100")) {
        mos.write("Ss100", intKey, valueArrayWritable);
    } else if (key.toString().endsWith("d100")) {
        mos.write("Sd100", intKey, valueArrayWritable);
    }
    // Reset the accumulator for the next key.
    for (int i = 0; i < sum.length; ++i) {
        sum[i].set(0.0);
    }
}
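
In addition (our note, not part of the a/b list above): MultipleOutputs keeps its own record writers, so the reducer should also close it when the task finishes, otherwise the named outputs may not be flushed. A minimal sketch, also included in the complete listing below:

public void cleanup(Context context) throws IOException, InterruptedException {
    mos.close(); // flush and commit all named outputs
}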

mos.write("Sb100",key,value)中的文件名必须与使用addNamedOutput()方法配置job时使用的文件名相同,另外文件名中不能包含"-"、“_”字符。

5) The complete code for using MultipleInputs and MultipleOutputs in the same job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import java.io.IOException;
import java.lang.reflect.Array;
import java.lang.reflect.Method;
import java.net.URI;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.util.ReflectionUtils;

public class MutiDoubleInputMatrixProduct {
public static class MyMapper extends Mapper<IntWritable,DoubleArrayWritable,Text,DoubleArrayWritable>{
public DoubleArrayWritable map_value=new DoubleArrayWritable();
public double[][] leftMatrix=null; // the left matrix F, loaded from the distributed cache in setup()
public Object obValue=null;
public DoubleWritable[] arraySum=null;
public double sum=0;
public void setup(Context context) throws IOException {
Configuration conf=context.getConfiguration();
leftMatrix=new double[conf.getInt("leftMatrixRowNum",10)][conf.getInt("leftMatrixColumnNum",10)];
System.out.println("map setup() start!");
//URI[] cacheFiles=DistributedCache.getCacheFiles(context.getConfiguration());
Path[] cacheFiles=DistributedCache.getLocalCacheFiles(conf);
String localCacheFile="file://"+cacheFiles[0].toString();
//URI[] cacheFiles=DistributedCache.getCacheFiles(conf);
//DistributedCache.
System.out.println("local path is:"+cacheFiles[0].toString());
// URI[] cacheFiles=DistributedCache.getCacheFiles(context.getConfiguration());
FileSystem fs =FileSystem.get(URI.create(localCacheFile), conf);
SequenceFile.Reader reader=null;
reader=new SequenceFile.Reader(fs,new Path(localCacheFile),conf);
IntWritable key= (IntWritable)ReflectionUtils.newInstance(reader.getKeyClass(),conf);
DoubleArrayWritable value= (DoubleArrayWritable)ReflectionUtils.newInstance(reader.getValueClass(),conf);
int valueLength=0;
int rowIndex=0;
while (reader.next(key,value)){
obValue=value.toArray();
rowIndex=key.get();
if(rowIndex<1){
valueLength=Array.getLength(obValue);
}
leftMatrix[rowIndex]=new double[conf.getInt("leftMatrixColumnNum",10)];
//this.leftMatrix=new double[valueLength][Integer.parseInt(context.getConfiguration().get("leftMatrixColumnNum"))];
for (int i=0;i<valueLength;++i){
leftMatrix[rowIndex][i]=Double.parseDouble(Array.get(obValue, i).toString());
}
}
reader.close();
}
public void map(IntWritable key,DoubleArrayWritable value,Context context) throws IOException, InterruptedException {
obValue=value.toArray();
InputSplit split=context.getInputSplit();
//String fileName=((FileSplit)inputSplit).getPath().getName();
Class<? extends InputSplit> splitClass = split.getClass();
FileSplit fileSplit = null;
if (splitClass.equals(FileSplit.class)) {
fileSplit = (FileSplit) split;
} else if (splitClass.getName().equals(
"org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
// begin reflection hackery...
try {
Method getInputSplitMethod = splitClass
.getDeclaredMethod("getInputSplit");
getInputSplitMethod.setAccessible(true);
fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
} catch (Exception e) {
// wrap and re-throw error
throw new IOException(e);
}
// end reflection hackery
}
String fileName=fileSplit.getPath().getName();
if (fileName.startsWith("FB")) {
context.write(new Text(String.valueOf(key.get())+","+fileName),value);
}
else{
arraySum=new DoubleWritable[this.leftMatrix.length];
for (int i=0;i<this.leftMatrix.length;++i){
sum=0;
for (int j=0;j<this.leftMatrix[0].length;++j){
sum+= this.leftMatrix[i][j]*Double.parseDouble(Array.get(obValue,j).toString())*(double)(context.getConfiguration().getFloat("u",1f));
}
arraySum[i]=new DoubleWritable(sum);
//arraySum[i].set(sum);
}
map_value.set(arraySum);
context.write(new Text(String.valueOf(key.get())+","+fileName),map_value);
}
}
}
public static class MyReducer extends Reducer<Text,DoubleArrayWritable,IntWritable,DoubleArrayWritable>{
public DoubleWritable[] sum=null;
public Object obValue=null;
public DoubleArrayWritable valueArrayWritable=null;
private MultipleOutputs mos=null;
public void setup(Context context){
mos=new MultipleOutputs(context);
int leftMatrixColumnNum=context.getConfiguration().getInt("leftMatrixColumnNum",100);
sum=new DoubleWritable[leftMatrixColumnNum];
for (int i=0;i<leftMatrixColumnNum;++i){
sum[i]=new DoubleWritable(0.0);
}
}
public void reduce(Text key,Iterable<DoubleArrayWritable>value,Context context) throws IOException, InterruptedException {
int valueLength=0;
for(DoubleArrayWritable doubleValue:value){
obValue=doubleValue.toArray();
valueLength=Array.getLength(obValue);
for (int i=0;i<valueLength;++i){
sum[i]=new DoubleWritable(Double.parseDouble(Array.get(obValue,i).toString())+sum[i].get());
}
}
valueArrayWritable=new DoubleArrayWritable();
valueArrayWritable.set(sum);
String[] xx=key.toString().split(",");
IntWritable intKey=new IntWritable(Integer.parseInt(xx[0]));
if (key.toString().endsWith("b100")){
mos.write("Sb100",intKey,valueArrayWritable);
}
else if (key.toString().endsWith("u100")) {
mos.write("Su100",intKey,valueArrayWritable);
}
else if (key.toString().endsWith("s100")) {
mos.write("Ss100",intKey,valueArrayWritable);
}
else if (key.toString().endsWith("d100")) {
mos.write("Sd100",intKey,valueArrayWritable);
}
for (int i=0;i<sum.length;++i){
sum[i].set(0.0);
}
}
public void cleanup(Context context) throws IOException, InterruptedException {
mos.close(); // close the MultipleOutputs writers so the named outputs are flushed
}
}
public static void main(String[]args) throws IOException, ClassNotFoundException, InterruptedException {
String uri="data/input";
String outUri="sOutput";
String cachePath="data/F100";
HDFSOperator.deleteDir(outUri);
Configuration conf=new Configuration();
DistributedCache.addCacheFile(URI.create(cachePath),conf); // add the left matrix F to the distributed cache
/**************************************************/
//FileSystem fs=FileSystem.get(URI.create(uri),conf);
//fs.delete(new Path(outUri),true);
/*********************************************************/
conf.setInt("leftMatrixColumnNum",100);
conf.setInt("leftMatrixRowNum",100);
conf.setFloat("u",0.5f);
// conf.set("mapred.jar","MutiDoubleInputMatrixProduct.jar");
Job job=new Job(conf,"MultiMatrix2");
job.setJarByClass(MutiDoubleInputMatrixProduct.class);
//job.setOutputFormatClass(NullOutputFormat.class);
job.setReducerClass(MyReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(DoubleArrayWritable.class);
MultipleInputs.addInputPath(job, new Path(uri + "/b100"), SequenceFileInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, new Path(uri + "/u100"), SequenceFileInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, new Path(uri + "/s100"), SequenceFileInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, new Path(uri + "/d100"), SequenceFileInputFormat.class, MyMapper.class);
MultipleOutputs.addNamedOutput(job, "Sb100", SequenceFileOutputFormat.class, IntWritable.class, DoubleArrayWritable.class);
MultipleOutputs.addNamedOutput(job,"Su100",SequenceFileOutputFormat.class,IntWritable.class,DoubleArrayWritable.class);
MultipleOutputs.addNamedOutput(job,"Ss100",SequenceFileOutputFormat.class,IntWritable.class,DoubleArrayWritable.class);
MultipleOutputs.addNamedOutput(job, "Sd100", SequenceFileOutputFormat.class, IntWritable.class, DoubleArrayWritable.class);
FileOutputFormat.setOutputPath(job,new Path(outUri));
System.exit(job.waitForCompletion(true)?0:1);
}
}
class DoubleArrayWritable extends ArrayWritable {
public DoubleArrayWritable(){
super(DoubleWritable.class);
}
public String toString(){
StringBuilder sb=new StringBuilder();
for (Writable val:get()){
DoubleWritable doubleWritable=(DoubleWritable)val;
sb.append(doubleWritable.get());
sb.append(",");
}
sb.deleteCharAt(sb.length()-1);
return sb.toString();
}
}

class HDFSOperator{
public static boolean deleteDir(String dir)throws IOException{
Configuration conf=new Configuration();
FileSystem fs =FileSystem.get(conf);
boolean result=fs.delete(new Path(dir),true);
System.out.println("sOutput delete");
fs.close();
return result;
}
}
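
A possible variation (our sketch, not part of the original job): MultipleOutputs.write() also has an overload that takes a base output path, which lets each result land in its own subdirectory of the job output instead of sharing one directory. Inside reduce() it would look like this:

// Writes to <outUri>/Sb100/part-r-nnnnn instead of <outUri>/Sb100-r-nnnnn.
mos.write("Sb100", intKey, valueArrayWritable, "Sb100/part");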

6) The results of the run: each named output shows up in the job's output directory as its own set of part files (for example Sb100-r-00000, Su100-r-00000, and so on).
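
To inspect one of these result files, a small reader along the lines of the setup() code above will do. A sketch using the same imports as the listing in step 5 (the file name shown is an example and depends on the actual run):

Configuration conf = new Configuration();
Path resultPath = new Path("sOutput/Sb100-r-00000");   // example output file name
FileSystem fs = FileSystem.get(conf);
SequenceFile.Reader reader = new SequenceFile.Reader(fs, resultPath, conf);
IntWritable key = new IntWritable();
DoubleArrayWritable value = new DoubleArrayWritable();
while (reader.next(key, value)) {
    // Print each row index followed by the comma-separated row values.
    System.out.println(key.get() + "\t" + value.toString());
}
reader.close();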
