multipleOutputs Hadoop

package org.lukey.hadoop.muloutput;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.FileStatus;

import org.apache.hadoop.fs.FileSystem;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.LongWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

import org.apache.hadoop.util.GenericOptionsParser;

public class TestMultipleOutput {

    static String baseOutputPath = "/user/hadoop/test_out";

    // Mapper

    static class WordsOfClassCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);

        private Text className = new Text();

        // One MultipleOutputs per mapper instance (not a shared static),
        // created in setup() and closed in cleanup().
        private MultipleOutputs<Text, IntWritable> mos;

        @Override

        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)

                throws IOException, InterruptedException {

            // Build the output path from the input split's parent directory and file name.

            FileSplit fileSplit = (FileSplit) context.getInputSplit();

            // File name

            String fileName = fileSplit.getPath().getName();

            // Parent directory name

            String dirName = fileSplit.getPath().getParent().getName();

            className.set(dirName + "/" + fileName);

            // Country:ABDBI 1

            mos.write(value, one, className.toString());

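            // context.write(className, one);
            // Note: with context.write disabled, no records reach the reducer;
            // all output is written map-side through MultipleOutputs.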

        }

        @Override

        protected void cleanup(Mapper<LongWritable, Text, Text, IntWritable>.Context context)

                throws IOException, InterruptedException {

            // Close MultipleOutputs so buffered records are flushed; skipping this can leave output files empty.

            mos.close();

        }

        @Override

        protected void setup(Mapper<LongWritable, Text, Text, IntWritable>.Context context)

                throws IOException, InterruptedException {

            // Create the MultipleOutputs writer for this task.

            mos = new MultipleOutputs<Text, IntWritable>(context);

        }

    }

    // Reducer

    static class WordsOfClassCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        // result holds the number of words in each file

        IntWritable result = new IntWritable();

        @Override

        protected void reduce(Text key, Iterable<IntWritable> values,

                Reducer<Text, IntWritable, Text, IntWritable>.Context context)

                        throws IOException, InterruptedException {

            // Sum the counts emitted for this key.

            int sum = 0;

            for (IntWritable value : values) {

                sum += value.get();

            }

            result.set(sum);

            context.write(key, result);

        }

    }

    // Client

    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();

        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        if (otherArgs.length != 2) {

            System.err.println("Usage: TestMultipleOutput <in> <out>");

            System.exit(-1);

        }

        // Job.getInstance replaces the deprecated Job constructor (Hadoop 2.x and later).
        Job job = Job.getInstance(conf, "file count");

        job.setJarByClass(TestMultipleOutput.class);

        job.setMapperClass(WordsOfClassCountMapper.class);

        job.setReducerClass(WordsOfClassCountReducer.class);

        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(otherArgs[0]);

        FileStatus[] fileStatus = fileSystem.listStatus(path);

        // Both files and directories directly under the input path are added as input paths.
        for (FileStatus status : fileStatus) {

            FileInputFormat.addInputPath(job, status.getPath());

        }

        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        job.setOutputKeyClass(Text.class);

        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }

}
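
The mapper above passes dirName + "/" + fileName to mos.write as the base output path. As far as I understand the FileOutputFormat naming convention, that path is resolved relative to the job output directory and gets a task-type letter and a zero-padded partition number appended. The sketch below only illustrates that naming with plain Java; OutputNameSketch and partFileName are hypothetical names that do not exist in Hadoop, and "Country/ABDBI" is an assumed directory/file pair used purely as an example.

```java
import java.util.Locale;

// Hypothetical illustration, not part of Hadoop: reproduces the
// "<baseOutputPath>-<m|r>-<nnnnn>" naming pattern for MultipleOutputs files.
class OutputNameSketch {

    // taskType is 'm' for a map task and 'r' for a reduce task;
    // partition is the task's partition number, zero-padded to five digits.
    static String partFileName(String baseOutputPath, char taskType, int partition) {
        return String.format(Locale.ROOT, "%s-%c-%05d", baseOutputPath, taskType, partition);
    }

    public static void main(String[] args) {
        // Assumed input file "ABDBI" inside a directory "Country".
        System.out.println(partFileName("Country/ABDBI", 'm', 0)); // prints Country/ABDBI-m-00000
    }
}
```

Because the records are written from the mapper, the files would carry the -m- marker; the default part-r-* files produced by the reducer are expected to stay empty in this job.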
