Hadoop: A Few Basic MapReduce2 Examples

1) WordCount

Not much needs to be said about this one; it is everywhere, and a few articles online analyze WordCount in detail:

http://www.sxt.cn/u/235/blog/5809

http://www.cnblogs.com/zhanghuijunjava/archive/2013/04/27/3036549.html

Both are well written, and several of the diagrams are particularly clear.

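2) Deduplication (Distinct)

This works like select distinct(x) from table in a database, and it is even simpler than WordCount. Suppose we want to deduplicate the contents of the following file (note: the same file is also the input for the later examples).

There is almost nothing to do: in the map phase, emit each line's value as the key; in the reduce phase, write each key back out once.

Note: the code uses a helper class of my own, HDFSUtil, which can be found in the article "hadoop: hdfs API示例".

How it works: after the map phase and before reduce starts, the shuffle sorts the map output and groups identical keys, so reduce is called once per distinct key and the duplicates disappear naturally. The combiner set in the code below is an optional optimization that performs the same merge on the map side.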
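The grouping step can be imitated in plain Java without a cluster. This is only a minimal sketch under stated assumptions: the class name DistinctSketch, its method names, and the sample values are made up for illustration, and a TreeMap stands in for the framework's sort-and-group step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of map -> shuffle -> reduce for deduplication (no Hadoop needed).
// All names here are hypothetical and exist only for this illustration.
class DistinctSketch {

    // "Map": every input line becomes a key.
    // "Shuffle": a sorted map groups identical keys, as the framework does before reduce.
    static Map<String, Integer> shuffle(List<String> lines) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            grouped.merge(line, 1, Integer::sum); // value = how many times the key was emitted
        }
        return grouped;
    }

    // "Reduce": runs once per distinct key, so writing the key alone drops the duplicates.
    static List<String> distinct(List<String> lines) {
        return new ArrayList<>(shuffle(lines).keySet());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("2", "8", "8", "3", "2");
        System.out.println(distinct(input)); // prints [2, 3, 8]
    }
}
```

The reduce step never looks at the values, which is why the real job can use NullWritable as the value type.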

 package yjmyzz.mr;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.NullWritable;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.Reducer;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 import org.apache.hadoop.util.GenericOptionsParser;

 import yjmyzz.util.HDFSUtil;

 import java.io.IOException;

 public class RemoveDup {

     public static class RemoveDupMapper

             extends Mapper<Object, Text, Text, NullWritable> {

         public void map(Object key, Text value, Context context)

                 throws IOException, InterruptedException {

             context.write(value, NullWritable.get());

             //System.out.println("map: key=" + key + ",value=" + value);

         }

     }

     public static class RemoveDupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

         public void reduce(Text key, Iterable<NullWritable> values, Context context)

                 throws IOException, InterruptedException {

             context.write(key, NullWritable.get());

             //System.out.println("reduce: key=" + key);

         }

     }

     public static void main(String[] args) throws Exception {

         Configuration conf = new Configuration();

         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

         if (otherArgs.length < 2) {

             System.err.println("Usage: RemoveDup <in> [<in>...] <out>");

             System.exit(2);

         }

         // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)

         HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

         Job job = Job.getInstance(conf, "RemoveDup");

         job.setJarByClass(RemoveDup.class);

         job.setMapperClass(RemoveDupMapper.class);

         job.setCombinerClass(RemoveDupReducer.class);

         job.setReducerClass(RemoveDupReducer.class);

         job.setOutputKeyClass(Text.class);

         job.setOutputValueClass(NullWritable.class);

         for (int i = 0; i < otherArgs.length - 1; ++i) {

             FileInputFormat.addInputPath(job, new Path(otherArgs[i]));

         }

         FileOutputFormat.setOutputPath(job,

                 new Path(otherArgs[otherArgs.length - 1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);

     }

 }

Output:

3) Record count (Count)

This differs slightly from WordCount; the effect is like Select Count(*) from tables. The code is also very simple: just adapt WordCount a little.

 package yjmyzz.mr;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.IntWritable;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.Reducer;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 import org.apache.hadoop.util.GenericOptionsParser;

 import yjmyzz.util.HDFSUtil;

 import java.io.IOException;

 import java.util.StringTokenizer;

 public class RowCount {

     public static class RowCountMapper

             extends Mapper<Object, Text, Text, IntWritable> {

         private final static IntWritable one = new IntWritable(1);

         private final  static Text countKey = new Text("count");

         public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

                 context.write(countKey, one);

         }

     }

     public static class RowCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

         private IntWritable result = new IntWritable();

         public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

             int sum = 0;

             for (IntWritable val : values) {

                 sum += val.get();

             }

             result.set(sum);

             context.write(key, result);

         }

     }

     public static void main(String[] args) throws Exception {

         Configuration conf = new Configuration();

         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

         if (otherArgs.length < 2) {

             System.err.println("Usage: RowCount <in> [<in>...] <out>");

             System.exit(2);

         }

         // Delete the output directory (optional)

         HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

         Job job = Job.getInstance(conf, "RowCount");

         job.setJarByClass(RowCount.class);

         job.setMapperClass(RowCountMapper.class);

         job.setCombinerClass(RowCountReducer.class);

         job.setReducerClass(RowCountReducer.class);

         job.setOutputKeyClass(Text.class);

         job.setOutputValueClass(IntWritable.class);

         for (int i = 0; i < otherArgs.length - 1; ++i) {

             FileInputFormat.addInputPath(job, new Path(otherArgs[i]));

         }

         FileOutputFormat.setOutputPath(job,

                 new Path(otherArgs[otherArgs.length - 1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);

     }

 }

Output: count 11

Note: if you only want a single number, without the "count" key, the job can be improved as follows:

 package yjmyzz.mr;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.LongWritable;

 import org.apache.hadoop.io.NullWritable;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.Reducer;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 import org.apache.hadoop.util.GenericOptionsParser;

 import yjmyzz.util.HDFSUtil;

 import java.io.IOException;

 public class RowCount2 {

     public static class RowCount2Mapper

             extends Mapper<LongWritable, Text, LongWritable, NullWritable> {

         public long count = 0;

         public void map(LongWritable key, Text value, Context context)

                 throws IOException, InterruptedException {

             count += 1;

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new LongWritable(count), NullWritable.get());

         }

     }

     public static class RowCount2Reducer extends Reducer<LongWritable, NullWritable, LongWritable, NullWritable> {

         public long count = 0;

         public void reduce(LongWritable key, Iterable<NullWritable> values, Context context)

                 throws IOException, InterruptedException {

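             // Add the key once per value: equal partial counts from different map tasks share one reduce() call.

             for (NullWritable ignored : values) {

                 count += key.get();

             }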

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new LongWritable(count), NullWritable.get());

         }

     }

     public static void main(String[] args) throws Exception {

         Configuration conf = new Configuration();

         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

         if (otherArgs.length < 2) {

             System.err.println("Usage: RowCount2 <in> [<in>...] <out>");

             System.exit(2);

         }

         // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)

         HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

         Job job = Job.getInstance(conf, "RowCount2");

         job.setJarByClass(RowCount2.class);

         job.setMapperClass(RowCount2Mapper.class);

         job.setCombinerClass(RowCount2Reducer.class);

         job.setReducerClass(RowCount2Reducer.class);

         job.setOutputKeyClass(LongWritable.class);

         job.setOutputValueClass(NullWritable.class);

         for (int i = 0; i < otherArgs.length - 1; ++i) {

             FileInputFormat.addInputPath(job, new Path(otherArgs[i]));

         }

         FileOutputFormat.setOutputPath(job,

                 new Path(otherArgs[otherArgs.length - 1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);

     }

 }

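This way the output is just the single number 11.

Note: context.write(xxx) has to go in the cleanup method here. Both the Mapper and Reducer classes have this method, and the framework calls it once per task, after the last map or reduce call. Try moving context.write(xxx) into the map and reduce methods: the output will contain multiple records instead of the single number expected.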
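The difference between writing in map and writing in cleanup can be seen in a small plain-Java imitation of a task's lifecycle. This is a sketch only: the class CleanupSketch and its methods are hypothetical, and a List stands in for the job's output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java imitation of a task: map() runs once per record, cleanup() runs once at the end.
// All names here are hypothetical and exist only for this illustration.
class CleanupSketch {

    // Writes the running count inside map(): one output record per input record.
    static List<Long> writeInMap(List<String> records) {
        List<Long> output = new ArrayList<>();
        long count = 0;
        for (String record : records) {   // the framework calls map() for each record
            count += 1;
            output.add(count);            // stands in for context.write(...) inside map()
        }
        return output;
    }

    // Only accumulates inside map() and writes in cleanup(): a single output record.
    static List<Long> writeInCleanup(List<String> records) {
        List<Long> output = new ArrayList<>();
        long count = 0;
        for (String record : records) {
            count += 1;                   // map() only updates the field
        }
        output.add(count);                // stands in for context.write(...) inside cleanup()
        return output;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(writeInMap(records));     // prints [1, 2, 3]
        System.out.println(writeInCleanup(records)); // prints [3]
    }
}
```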

4) Maximum (Max)

 package yjmyzz.mr;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.LongWritable;

 import org.apache.hadoop.io.NullWritable;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.Reducer;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 import org.apache.hadoop.util.GenericOptionsParser;

 import yjmyzz.util.HDFSUtil;

 import java.io.IOException;

 public class Max {

     public static class MaxMapper

             extends Mapper<LongWritable, Text, LongWritable, NullWritable> {

         public long max = Long.MIN_VALUE;

         public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

             max = Math.max(Long.parseLong(value.toString()), max);

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new LongWritable(max), NullWritable.get());

         }

     }

     public static class MaxReducer extends Reducer<LongWritable, NullWritable, LongWritable, NullWritable> {

         public long max = Long.MIN_VALUE;

         public void reduce(LongWritable key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {

             max = Math.max(max, key.get());

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new LongWritable(max), NullWritable.get());

         }

     }

     public static void main(String[] args) throws Exception {

         Configuration conf = new Configuration();

         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

         if (otherArgs.length < 2) {

             System.err.println("Usage: Max <in> [<in>...] <out>");

             System.exit(2);

         }

         // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)

         HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

         Job job = Job.getInstance(conf, "Max");

         job.setJarByClass(Max.class);

         job.setMapperClass(MaxMapper.class);

         job.setCombinerClass(MaxReducer.class);

         job.setReducerClass(MaxReducer.class);

         job.setOutputKeyClass(LongWritable.class);

         job.setOutputValueClass(NullWritable.class);

         for (int i = 0; i < otherArgs.length - 1; ++i) {

             FileInputFormat.addInputPath(job, new Path(otherArgs[i]));

         }

         FileOutputFormat.setOutputPath(job,

                 new Path(otherArgs[otherArgs.length - 1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);

     }

 }

Output: 8

If you followed the RowCount2 code above, this one needs little explanation.

5) Sum

 package yjmyzz.mr;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.LongWritable;

 import org.apache.hadoop.io.NullWritable;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.Reducer;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 import org.apache.hadoop.util.GenericOptionsParser;

 import yjmyzz.util.HDFSUtil;

 import java.io.IOException;

 public class Sum {

     public static class SumMapper

             extends Mapper<LongWritable, Text, LongWritable, NullWritable> {

         public long sum = 0;

         public void map(LongWritable key, Text value, Context context)

                 throws IOException, InterruptedException {

             sum += Long.parseLong(value.toString());

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new LongWritable(sum), NullWritable.get());

         }

     }

     public static class SumReducer extends Reducer<LongWritable, NullWritable, LongWritable, NullWritable> {

         public long sum = 0;

         public void reduce(LongWritable key, Iterable<NullWritable> values, Context context)

                 throws IOException, InterruptedException {

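             // Add the key once per value: equal partial sums from different map tasks share one reduce() call.

             for (NullWritable ignored : values) {

                 sum += key.get();

             }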

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new LongWritable(sum), NullWritable.get());

         }

     }

     public static void main(String[] args) throws Exception {

         Configuration conf = new Configuration();

         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

         if (otherArgs.length < 2) {

             System.err.println("Usage: Sum <in> [<in>...] <out>");

             System.exit(2);

         }

         // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)

         HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

         Job job = Job.getInstance(conf, "Sum");

         job.setJarByClass(Sum.class);

         job.setMapperClass(SumMapper.class);

         job.setCombinerClass(SumReducer.class);

         job.setReducerClass(SumReducer.class);

         job.setOutputKeyClass(LongWritable.class);

         job.setOutputValueClass(NullWritable.class);

         for (int i = 0; i < otherArgs.length - 1; ++i) {

             FileInputFormat.addInputPath(job, new Path(otherArgs[i]));

         }

         FileOutputFormat.setOutputPath(job,

                 new Path(otherArgs[otherArgs.length - 1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);

     }

 }

Output: 43

Sum works on exactly the same principle as Max, so no further explanation is needed; it again relies on the cleanup method.
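One detail deserves care when partial results travel as keys: if two map tasks happen to emit the same partial sum, the shuffle hands both to a single reduce call, as one key with two values. For Max that is harmless, but for Sum and Count the key has to be added once per value. A hedged plain-Java sketch of the difference follows; the class PartialSumSketch, its methods, and the partial sums 10, 10 and 23 are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of reducing partial sums that are emitted as keys.
// All names and numbers here are hypothetical and exist only for this illustration.
class PartialSumSketch {

    // Shuffle: group equal keys and remember how many values each key carries.
    static Map<Long, Integer> shuffle(List<Long> partialSums) {
        Map<Long, Integer> grouped = new TreeMap<>();
        for (long partial : partialSums) {
            grouped.merge(partial, 1, Integer::sum);
        }
        return grouped;
    }

    // Adds each key once per reduce() call, ignoring how many values it carries.
    static long sumOncePerKey(List<Long> partialSums) {
        long sum = 0;
        for (long key : shuffle(partialSums).keySet()) {
            sum += key;
        }
        return sum;
    }

    // Adds each key once per value, so equal partial sums are all counted.
    static long sumOncePerValue(List<Long> partialSums) {
        long sum = 0;
        for (Map.Entry<Long, Integer> group : shuffle(partialSums).entrySet()) {
            sum += group.getKey() * group.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> partialSums = Arrays.asList(10L, 10L, 23L); // three map tasks
        System.out.println(sumOncePerKey(partialSums));   // prints 33
        System.out.println(sumOncePerValue(partialSums)); // prints 43
    }
}
```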

6) Average (Avg)

 package yjmyzz.mr;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.*;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.Reducer;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

 import org.apache.hadoop.util.GenericOptionsParser;

 import yjmyzz.util.HDFSUtil;

 import java.io.IOException;

 public class Average {

     public static class AvgMapper

             extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

         public long sum = 0;

         public long count = 0;

         public void map(LongWritable key, Text value, Context context)

                 throws IOException, InterruptedException {

             sum += Long.parseLong(value.toString());

             count += 1;

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new LongWritable(sum), new LongWritable(count));

         }

     }

     public static class AvgCombiner extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {

         public long sum = 0;

         public long count = 0;

         public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)

                 throws IOException, InterruptedException {

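             // Add the partial sum once per value: equal partial sums from different map tasks share one reduce() call.

             for (LongWritable v : values) {

                 sum += key.get();

                 count += v.get();

             }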

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new LongWritable(sum), new LongWritable(count));

         }

     }

     public static class AvgReducer extends Reducer<LongWritable, LongWritable, DoubleWritable, NullWritable> {

         public long sum = 0;

         public long count = 0;

         public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)

                 throws IOException, InterruptedException {

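             // Add the partial sum once per value: equal partial sums from different map tasks share one reduce() call.

             for (LongWritable v : values) {

                 sum += key.get();

                 count += v.get();

             }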

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             context.write(new DoubleWritable((double) sum / count), NullWritable.get());

         }

     }

     public static void main(String[] args) throws Exception {

         Configuration conf = new Configuration();

         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

         if (otherArgs.length < 2) {

             System.err.println("Usage: Avg <in> [<in>...] <out>");

             System.exit(2);

         }

         // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)

         HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

         Job job = Job.getInstance(conf, "Avg");

         job.setJarByClass(Average.class);

         job.setMapperClass(AvgMapper.class);

         job.setCombinerClass(AvgCombiner.class);

         job.setReducerClass(AvgReducer.class);

         // Note: the Mapper and Reducer output key/value types differ, so the map output types must be set separately

         job.setMapOutputKeyClass(LongWritable.class);

         job.setMapOutputValueClass(LongWritable.class);

         job.setOutputKeyClass(DoubleWritable.class);

         job.setOutputValueClass(NullWritable.class);

         for (int i = 0; i < otherArgs.length - 1; ++i) {

             FileInputFormat.addInputPath(job, new Path(otherArgs[i]));

         }

         FileOutputFormat.setOutputPath(job,

                 new Path(otherArgs[otherArgs.length - 1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);

     }

 }

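Output: 3.909090909090909

This one is slightly more involved. As everyone knows, average = Sum / Count, so this is really just Count and Sum combined. The idea is to use sum as the key and count as the value of each emitted key-value pair, producing {sum, count} output, and then compute sum/count in the final cleanup to get avg. One thing to watch: the Mapper and Reducer output {key, value} types are not the same, so the job sets the map output types (setMapOutputKeyClass, setMapOutputValueClass) and the final output types (setOutputKeyClass, setOutputValueClass) separately. Without the two setMapOutput... calls, setOutputKeyClass and setOutputValueClass would by default apply to the Mapper, Combiner and Reducer output alike.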
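Why the job carries {sum, count} pairs all the way to the end, instead of letting each task compute its own average, can be checked with a little arithmetic in plain Java. This is a sketch under assumptions: the class AverageSketch is hypothetical, and the split of the input into two tasks (sum 40 over 10 records, sum 3 over 1 record) is invented; only the totals 43 and 11 come from the outputs above.

```java
// Plain-Java check of combining per-task {sum, count} pairs into one average.
// The class name and the two partial results are hypothetical, for illustration only.
class AverageSketch {

    // Each row is one task's partial result: {sum, count}.
    static double average(long[][] partials) {
        long sum = 0;
        long count = 0;
        for (long[] partial : partials) {
            sum += partial[0];
            count += partial[1];
        }
        return (double) sum / count; // divide only once, at the very end
    }

    // Averaging the per-task averages instead: wrong when tasks see different record counts.
    static double averageOfAverages(long[][] partials) {
        double total = 0;
        for (long[] partial : partials) {
            total += (double) partial[0] / partial[1];
        }
        return total / partials.length;
    }

    public static void main(String[] args) {
        long[][] partials = {{40, 10}, {3, 1}};
        System.out.println(average(partials));           // prints 3.909090909090909
        System.out.println(averageOfAverages(partials)); // prints 3.5
    }
}
```

Sums and counts can be merged in any order and any number of times, which is what makes them safe to pass through a combiner; an average cannot.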

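7) Improved WordCount (sorted by word frequency, descending)

The WordCount example on the official site only counts how often each word occurs; it does not sort the words by frequency. The code below adds that.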

 package yjmyzz.mr;

 import org.apache.hadoop.conf.Configuration;

 import org.apache.hadoop.fs.Path;

 import org.apache.hadoop.io.IntWritable;

 import org.apache.hadoop.io.LongWritable;

 import org.apache.hadoop.io.NullWritable;

 import org.apache.hadoop.io.Text;

 import org.apache.hadoop.mapreduce.Job;

 import org.apache.hadoop.mapreduce.Mapper;

 import org.apache.hadoop.mapreduce.Reducer;

 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 import org.apache.hadoop.util.GenericOptionsParser;

 import yjmyzz.util.HDFSUtil;

 import java.io.IOException;

 import java.util.Comparator;

 import java.util.StringTokenizer;

 import java.util.TreeMap;

 public class WordCount2 {

     public static class TokenizerMapper

             extends Mapper<Object, Text, Text, IntWritable> {

         private final static IntWritable one = new IntWritable(1);

         private Text word = new Text();

         public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

             StringTokenizer itr = new StringTokenizer(value.toString());

             while (itr.hasMoreTokens()) {

                 word.set(itr.nextToken());

                 context.write(word, one);

             }

         }

     }

     public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

         // A TreeMap holds the counts. TreeMap sorts keys in ascending order, so a Comparator is supplied to reverse the order

         private TreeMap<Integer, String> treeMap = new TreeMap<Integer, String>(new Comparator<Integer>() {

             @Override

             public int compare(Integer x, Integer y) {

                 return y.compareTo(x);

             }

         });

         public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

             // Store the reduced result in treeMap instead of writing it to the context

             int sum = 0;

             for (IntWritable val : values) {

                 sum += val.get();

             }

             if (treeMap.containsKey(sum)){

                 String value = treeMap.get(sum) + "," + key.toString();

                 treeMap.put(sum,value);

             }

             else {

                 treeMap.put(sum, key.toString());

             }

         }

         protected void cleanup(Context context) throws IOException, InterruptedException {

             // Write the treeMap entries to the context, swapped into (words, count) order

             for (Integer key : treeMap.keySet()) {

                 context.write(new Text(treeMap.get(key)), new IntWritable(key));

             }

         }

     }

     public static void main(String[] args) throws Exception {

         Configuration conf = new Configuration();

         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

         if (otherArgs.length < 2) {

             System.err.println("Usage: wordcount2 <in> [<in>...] <out>");

             System.exit(2);

         }

         // Delete the output directory

         HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

         Job job = Job.getInstance(conf, "word count2");

         job.setJarByClass(WordCount2.class);

         job.setMapperClass(TokenizerMapper.class);

         // No combiner here: IntSumReducer writes comma-joined word groups, which must not be fed back in as words.

         job.setReducerClass(IntSumReducer.class);

         job.setOutputKeyClass(Text.class);

         job.setOutputValueClass(IntWritable.class);

         for (int i = 0; i < otherArgs.length - 1; ++i) {

             FileInputFormat.addInputPath(job, new Path(otherArgs[i]));

         }

         FileOutputFormat.setOutputPath(job,

                 new Path(otherArgs[otherArgs.length - 1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);

     }

 }

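How it works: cleanup is used again, and to get the ordering the reducer keeps its results in a TreeMap, a data structure with built-in key ordering. The ordering is global only when the job runs with a single reduce task, which is the default.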
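The TreeMap part can be tried on its own, outside Hadoop. The following is a minimal sketch, not the job itself: the class FrequencySortSketch and its method are hypothetical, the word counts are typed in by hand, and Map.merge replaces the containsKey/put pair used in the reducer.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the reducer's TreeMap: counts as keys, in descending order.
// All names here are hypothetical and exist only for this illustration.
class FrequencySortSketch {

    static TreeMap<Integer, String> byCountDescending(Map<String, Integer> counts) {
        // reverseOrder() flips TreeMap's natural ascending key order.
        TreeMap<Integer, String> byCount = new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Words with the same count share one entry, joined by commas.
            byCount.merge(e.getValue(), e.getKey(), (joined, word) -> joined + "," + word);
        }
        return byCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("I", 4);
        counts.put("be", 3);
        counts.put("the", 4);
        counts.put("we", 3);
        counts.put("will", 2);
        System.out.println(byCountDescending(counts)); // prints {4=I,the, 3=be,we, 2=will}
    }
}
```

Because the whole map lives in the reducer's memory until cleanup, this approach suits small vocabularies; that limit is a property of the technique, not of this sketch.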

To make the result easier to see, the first verse of the theme song from the film Big Hero 6 (超能陆战队) is used as input:

They say we are what we are

But we do not have to be

I am  bad behavior but I do it in the best way

I will be the watcher

Of the eternal flame

I will be the guard dog

of all your fever dreams

The original WordCount produces the following result:

But	1

I	4

Of	1

They	1

all	1

am	1

are	2

bad	1

be	3

behavior	1

best	1

but	1

do	2

dog	1

dreams	1

eternal	1

fever	1

flame	1

guard	1

have	1

in	1

it	1

not	1

of	1

say	1

the	4

to	1

watcher	1

way	1

we	3

what	1

will	2

your	1

The improved WordCount2 produces the following result:

I,the	4

be,we	3

are,do,will	2

But,Of,They,all,am,bad,behavior,best,but,dog,dreams,eternal,fever,flame,guard,have,in,it,not,of,say,to,watcher,way,what,your	1
