1) WordCount

WordCount needs little introduction, since examples of it are everywhere. Two detailed analyses of it are available online:

http://www.sxt.cn/u/235/blog/5809

http://www.cnblogs.com/zhanghuijunjava/archive/2013/04/27/3036549.html

Both are well written, and their diagrams are particularly clear.

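2) Deduplication (Distinct)

This is similar to SELECT DISTINCT(x) FROM table in a database. Deduplication is even simpler than WordCount. Suppose we want to deduplicate the contents of the following file (note: the same file is also the input for the later examples):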

2
8
8
3
2
3
5
3
0
2
7

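Almost nothing needs to be done: in the map phase, emit each line's value as the key; in the reduce phase, write each key back out once.

Note: the code uses a helper class I wrote, HDFSUtil, which can be found in the article "hadoop: hdfs API示例" (hadoop: HDFS API examples).

How it works: after the map phase and before reduce starts, the shuffle phase sorts the map output and groups records by key, so reduce is called once per distinct key and the duplicates disappear naturally. The combiner set in the code below is only an optional optimization that performs the same merge locally on each map task's output.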
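The grouping step described above can be imitated in plain Java without a cluster. The following is only a sketch under stated assumptions: the class name DistinctSketch and its methods are made up for illustration, and a TreeMap stands in for the framework's sort-and-group step. It is not how Hadoop implements the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation (no Hadoop involved) of map -> group by key -> reduce.
final class DistinctSketch {

    // "Map" phase: every line becomes a key; the value carries no information.
    // "Shuffle" phase: a sorted map stands in for the sort-and-group step.
    static Map<String, List<Boolean>> mapAndGroup(List<String> lines) {
        Map<String, List<Boolean>> grouped = new TreeMap<>();
        for (String line : lines) {
            grouped.computeIfAbsent(line, k -> new ArrayList<>()).add(Boolean.TRUE);
        }
        return grouped;
    }

    // "Reduce" phase: called once per distinct key; emit the key, ignore the values.
    static List<String> distinct(List<String> lines) {
        return new ArrayList<>(mapAndGroup(lines).keySet());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("2", "8", "8", "3", "2", "3", "5", "3", "0", "2", "7");
        System.out.println(distinct(input)); // [0, 2, 3, 5, 7, 8]
    }
}
```

The keys here are compared as strings, which matches the Text keys used by the job; numeric ordering would need a numeric key type.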

package yjmyzz.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import yjmyzz.util.HDFSUtil;

import java.io.IOException;

public class RemoveDup {

    public static class RemoveDupMapper
            extends Mapper<Object, Text, Text, NullWritable> {

        // Emit each input line as the key; the value carries no information.
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, NullWritable.get());
            //System.out.println("map: key=" + key + ",value=" + value);
        }
    }

    public static class RemoveDupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {

        // Called once per distinct key, so writing the key once removes the duplicates.
        @Override
        public void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
            //System.out.println("reduce: key=" + key);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: RemoveDup <in> [<in>...] <out>");
            System.exit(2);
        }

        // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)
        HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

        Job job = Job.getInstance(conf, "RemoveDup");
        job.setJarByClass(RemoveDup.class);
        job.setMapperClass(RemoveDupMapper.class);
        job.setCombinerClass(RemoveDupReducer.class);
        job.setReducerClass(RemoveDupReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Output:

0
2
3
5
7
8

3) Record count (Count)

This differs slightly from WordCount: the effect is similar to SELECT COUNT(*) FROM table. The code is also very simple; a lightly modified WordCount is enough.

package yjmyzz.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import yjmyzz.util.HDFSUtil;

import java.io.IOException;

public class RowCount {

    public static class RowCountMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final static Text countKey = new Text("count");

        // Every record maps to the same key, so all the 1s reach a single reduce call.
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(countKey, one);
        }
    }

    public static class RowCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        // Add up the partial counts for the single "count" key.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: RowCount <in> [<in>...] <out>");
            System.exit(2);
        }
        // Delete the output directory (optional)
        HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

        Job job = Job.getInstance(conf, "RowCount");
        job.setJarByClass(RowCount.class);
        job.setMapperClass(RowCountMapper.class);
        job.setCombinerClass(RowCountReducer.class);
        job.setReducerClass(RowCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Output: count 11

Note: if you only want to output a single number, without the "count" key, the code can be improved as follows:

package yjmyzz.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import yjmyzz.util.HDFSUtil;

import java.io.IOException;

public class RowCount2 {

    public static class RowCount2Mapper
            extends Mapper<LongWritable, Text, LongWritable, NullWritable> {

        public long count = 0;

        // Only accumulate here; nothing is written per record.
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            count += 1;
        }

        // Emit this task's partial count once, after all map() calls.
        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(count), NullWritable.get());
        }
    }

    public static class RowCount2Reducer extends Reducer<LongWritable, NullWritable, LongWritable, NullWritable> {

        public long count = 0;

        // Equal partial counts from different map tasks arrive under one key,
        // so add the key once per value rather than once per reduce() call.
        @Override
        public void reduce(LongWritable key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            for (NullWritable ignored : values) {
                count += key.get();
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(count), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: RowCount2 <in> [<in>...] <out>");
            System.exit(2);
        }

        // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)
        HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

        Job job = Job.getInstance(conf, "RowCount2");
        job.setJarByClass(RowCount2.class);
        job.setMapperClass(RowCount2Mapper.class);
        job.setCombinerClass(RowCount2Reducer.class);
        job.setReducerClass(RowCount2Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

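With this version the output is just the number 11.

Note: here context.write(xxx) must be placed in the cleanup method. Both the Mapper and Reducer classes have this method, and the framework calls it once per task after all map (or reduce) calls have finished. Try moving context.write(xxx) into the map and reduce methods: the output will contain multiple records instead of the expected single number.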
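The difference between writing inside map and writing inside cleanup can be imitated in plain Java. This is a minimal sketch, assuming a hypothetical class CleanupSketch in which a list stands in for the context and a loop stands in for the framework calling map once per record; it involves no Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java imitation of the task lifecycle: map() once per record, then cleanup() once.
final class CleanupSketch {

    // Writes the running count from inside map(): one output record per input record.
    static List<Long> emitInMap(List<String> records) {
        List<Long> out = new ArrayList<>();
        long count = 0;
        for (String ignored : records) {   // the framework calls map() for every record
            count += 1;
            out.add(count);                // stands in for context.write(...) inside map()
        }
        return out;
    }

    // Only accumulates in map() and writes from cleanup(): one output record per task.
    static List<Long> emitInCleanup(List<String> records) {
        List<Long> out = new ArrayList<>();
        long count = 0;
        for (String ignored : records) {
            count += 1;
        }
        out.add(count);                    // stands in for context.write(...) inside cleanup()
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("2", "8", "8", "3");
        System.out.println(emitInMap(records));      // [1, 2, 3, 4]
        System.out.println(emitInCleanup(records));  // [4]
    }
}
```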

4) Maximum (Max)

package yjmyzz.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import yjmyzz.util.HDFSUtil;

import java.io.IOException;

public class Max {

    public static class MaxMapper
            extends Mapper<LongWritable, Text, LongWritable, NullWritable> {

        public long max = Long.MIN_VALUE;

        // Track the largest value seen by this map task.
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            max = Math.max(Long.parseLong(value.toString()), max);
        }

        // Emit this task's maximum once, after all map() calls.
        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(max), NullWritable.get());
        }
    }

    public static class MaxReducer extends Reducer<LongWritable, NullWritable, LongWritable, NullWritable> {

        public long max = Long.MIN_VALUE;

        // Take the maximum over the per-task maxima.
        @Override
        public void reduce(LongWritable key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            max = Math.max(max, key.get());
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(max), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: Max <in> [<in>...] <out>");
            System.exit(2);
        }

        // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)
        HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

        Job job = Job.getInstance(conf, "Max");
        job.setJarByClass(Max.class);
        job.setMapperClass(MaxMapper.class);
        job.setCombinerClass(MaxReducer.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Output: 8

If you understood the RowCount2 code above, this needs no further explanation.
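For readers who want the idea spelled out anyway, here is a small plain-Java sketch of why the two-stage approach gives the right answer: the maximum of the per-split maxima equals the maximum of the whole input. The class name MaxSketch and the split boundaries are assumptions made for illustration; Hadoop decides the real splits.

```java
// Plain-Java sketch (no Hadoop): per-split maxima followed by a global maximum.
final class MaxSketch {

    // What one map task does: track the largest value seen in its split.
    static long localMax(long[] split) {
        long max = Long.MIN_VALUE;
        for (long v : split) {
            max = Math.max(max, v);
        }
        return max;
    }

    // What the reducer does: take the maximum of the per-split maxima.
    static long globalMax(long[][] splits) {
        long max = Long.MIN_VALUE;
        for (long[] split : splits) {
            max = Math.max(max, localMax(split));
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical split boundaries over the sample input file.
        long[][] splits = {{2, 8, 8, 3, 2, 3}, {5, 3, 0, 2, 7}};
        System.out.println(globalMax(splits)); // 8
    }
}
```

Because max gives the same result however the values are grouped, reusing the reducer as a combiner is safe for this job.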

5) Summation (Sum)

package yjmyzz.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import yjmyzz.util.HDFSUtil;

import java.io.IOException;

public class Sum {

    public static class SumMapper
            extends Mapper<LongWritable, Text, LongWritable, NullWritable> {

        public long sum = 0;

        // Accumulate this task's partial sum; nothing is written per record.
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            sum += Long.parseLong(value.toString());
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(sum), NullWritable.get());
        }
    }

    public static class SumReducer extends Reducer<LongWritable, NullWritable, LongWritable, NullWritable> {

        public long sum = 0;

        // Equal partial sums from different map tasks arrive under one key,
        // so add the key once per value rather than once per reduce() call.
        @Override
        public void reduce(LongWritable key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            for (NullWritable ignored : values) {
                sum += key.get();
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(sum), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: Sum <in> [<in>...] <out>");
            System.exit(2);
        }

        // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)
        HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

        Job job = Job.getInstance(conf, "Sum");
        job.setJarByClass(Sum.class);
        job.setMapperClass(SumMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Output: 43

Sum works on the same principle as Max and again relies on the cleanup method, so no further explanation is needed.
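One detail is worth checking when partial sums travel as keys: if two map tasks happen to produce the same partial sum, the shuffle groups them under one key and reduce is called only once for both. The plain-Java sketch below illustrates this under stated assumptions: the class PartialSumSketch, its method names, and the four-way split of the sample file are all hypothetical, and a TreeMap stands in for the shuffle.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop) of partial sums used as shuffle keys.
final class PartialSumSketch {

    // Stand-in for the shuffle: group equal keys and remember how many values each has.
    static Map<Long, Integer> shuffle(long[] partialSums) {
        Map<Long, Integer> grouped = new TreeMap<>();
        for (long s : partialSums) {
            grouped.merge(s, 1, Integer::sum);
        }
        return grouped;
    }

    // Adds each key once per reduce() call: loses duplicates of equal partial sums.
    static long addKeyOncePerGroup(Map<Long, Integer> grouped) {
        long total = 0;
        for (long key : grouped.keySet()) {
            total += key;
        }
        return total;
    }

    // Adds each key once per grouped value: counts every partial sum.
    static long addKeyOncePerValue(Map<Long, Integer> grouped) {
        long total = 0;
        for (Map.Entry<Long, Integer> e : grouped.entrySet()) {
            total += e.getKey() * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical splits of the sample file: [2,8] [8,3,2,3] [5,3,0,2] [7]
        long[] partialSums = {10, 16, 10, 7};
        Map<Long, Integer> grouped = shuffle(partialSums);
        System.out.println(addKeyOncePerGroup(grouped)); // 33
        System.out.println(addKeyOncePerValue(grouped)); // 43
    }
}
```

With a single input split the two variants agree, which is why the issue does not show up on the small sample file.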

6) Average (Avg)

package yjmyzz.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import yjmyzz.util.HDFSUtil;

import java.io.IOException;

public class Average {

    public static class AvgMapper
            extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

        public long sum = 0;
        public long count = 0;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            sum += Long.parseLong(value.toString());
            count += 1;
        }

        // Emit one {sum, count} pair per map task: sum as the key, count as the value.
        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(sum), new LongWritable(count));
        }
    }

    public static class AvgCombiner extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {

        public long sum = 0;
        public long count = 0;

        // Pairs with equal sums share one key, so add the key once per value.
        @Override
        public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            for (LongWritable v : values) {
                sum += key.get();
                count += v.get();
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new LongWritable(sum), new LongWritable(count));
        }
    }

    public static class AvgReducer extends Reducer<LongWritable, LongWritable, DoubleWritable, NullWritable> {

        public long sum = 0;
        public long count = 0;

        // Pairs with equal sums share one key, so add the key once per value.
        @Override
        public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            for (LongWritable v : values) {
                sum += key.get();
                count += v.get();
            }
        }

        // Divide once at the very end: avg = sum / count.
        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new DoubleWritable((double) sum / count), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: Avg <in> [<in>...] <out>");
            System.exit(2);
        }

        // Delete the output directory (optional; avoids the "output directory already exists" error on repeated runs)
        HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);

        Job job = Job.getInstance(conf, "Avg");
        job.setJarByClass(Average.class);
        job.setMapperClass(AvgMapper.class);
        job.setCombinerClass(AvgCombiner.class);
        job.setReducerClass(AvgReducer.class);

        // Note: the Mapper and Reducer output key/value types differ, so the Mapper's types are set separately
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setOutputKeyClass(DoubleWritable.class);
        job.setOutputValueClass(NullWritable.class);

        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

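Output: 3.909090909090909

This one is slightly more complex. As everyone knows, average = Sum/Count, so it is really just Count and Sum used together. The idea is to use sum as the key and count as the value in the emitted key-value pair, producing {sum, count}, and then compute sum/count in the final cleanup to get the average. One point needs attention: the output {key, value} types of the Mapper and the Reducer are not the same, so main sets them separately, with setMapOutputKeyClass and setMapOutputValueClass for the map output and setOutputKeyClass and setOutputValueClass for the final output. Without the two setMapOutput* calls, setOutputKeyClass and setOutputValueClass would by default give the Mapper, Combiner, and Reducer the same output types.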
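The reason for carrying {sum, count} pairs all the way to the end, rather than averaging early, can be seen in a small plain-Java sketch. The class AvgSketch, its method names, and the two-way split of the sample file are assumptions made for illustration only.

```java
// Plain-Java sketch (no Hadoop): merging {sum, count} pairs versus averaging averages.
final class AvgSketch {

    // Merge the per-split {sum, count} pairs, then divide once at the end.
    static double average(long[][] sumCountPairs) {
        long sum = 0;
        long count = 0;
        for (long[] pair : sumCountPairs) {
            sum += pair[0];
            count += pair[1];
        }
        return (double) sum / count;
    }

    // For contrast: averaging the per-split averages weights every split equally,
    // which is wrong whenever the splits hold different numbers of records.
    static double averageOfAverages(long[][] sumCountPairs) {
        double total = 0;
        for (long[] pair : sumCountPairs) {
            total += (double) pair[0] / pair[1];
        }
        return total / sumCountPairs.length;
    }

    public static void main(String[] args) {
        // Hypothetical splits of the sample file: [2,8] and [8,3,2,3,5,3,0,2,7]
        long[][] pairs = {{10, 2}, {33, 9}};
        System.out.println(average(pairs));            // 43 / 11
        System.out.println(averageOfAverages(pairs));  // (5 + 33/9) / 2, a different value
    }
}
```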

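7) Improved WordCount (sorted by descending word frequency)

The official WordCount example only counts how many times each word occurs and does not sort by frequency. The code below adds descending order by frequency.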

package yjmyzz.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import yjmyzz.util.HDFSUtil;

import java.io.IOException;
import java.util.Comparator;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCount2 {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        // A TreeMap holds the results. TreeMap sorts keys in ascending order by default,
        // so a Comparator is supplied explicitly to get descending order.
        private TreeMap<Integer, String> treeMap = new TreeMap<Integer, String>(new Comparator<Integer>() {
            @Override
            public int compare(Integer x, Integer y) {
                return y.compareTo(x);
            }
        });

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Store the reduced result in the TreeMap instead of writing it to the context
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            if (treeMap.containsKey(sum)) {
                String value = treeMap.get(sum) + "," + key.toString();
                treeMap.put(sum, value);
            } else {
                treeMap.put(sum, key.toString());
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Write the TreeMap entries to the context, swapped into (words, count) order
            for (Integer key : treeMap.keySet()) {
                context.write(new Text(treeMap.get(key)), new IntWritable(key));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount2 <in> [<in>...] <out>");
            System.exit(2);
        }
        // Delete the output directory
        HDFSUtil.deleteFile(conf, otherArgs[otherArgs.length - 1]);
        Job job = Job.getInstance(conf, "word count2");
        job.setJarByClass(WordCount2.class);
        job.setMapperClass(TokenizerMapper.class);
        // No combiner here: IntSumReducer emits joined word lists such as "I,the",
        // which the reducer would then treat as ordinary words.
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

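How it works: this again relies on cleanup. To get the sorting, it uses TreeMap, a data structure that keeps its keys sorted. The ordering is global only when the job runs with a single reducer, because each reducer sorts just the words it receives.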
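The TreeMap technique can be tried on its own in plain Java. The sketch below is illustrative only: the class FrequencySortSketch and its method are hypothetical, and it uses Collections.reverseOrder() in place of a hand-written Comparator, which yields the same descending order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop): group words by count and list them in descending count order.
final class FrequencySortSketch {

    // wordCounts is expected to iterate in sorted word order, as reduce keys do.
    static List<String> sortByCountDescending(Map<String, Integer> wordCounts) {
        // Reverse-ordered keys: the largest count comes first.
        TreeMap<Integer, String> byCount = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            // Words that share a count are joined with commas under that count.
            byCount.merge(e.getValue(), e.getKey(), (existing, word) -> existing + "," + word);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, String> e : byCount.entrySet()) {
            lines.add(e.getValue() + "\t" + e.getKey());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("I", 4);
        counts.put("the", 4);
        counts.put("be", 3);
        counts.put("we", 3);
        counts.put("are", 2);
        for (String line : sortByCountDescending(counts)) {
            System.out.println(line); // "I,the	4", then "be,we	3", then "are	2"
        }
    }
}
```

Everything is held in memory until the end, so this approach suits vocabularies that fit comfortably in a single reducer's heap.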

To make the demonstration more intuitive, the first verse of the theme song from the movie Big Hero 6 is used as input:

They say we are what we are
But we do not have to be
I am bad behavior but I do it in the best way
I will be the watcher
Of the eternal flame
I will be the guard dog
of all your fever dreams

The original WordCount produces the following result:

But 1
I 4
Of 1
They 1
all 1
am 1
are 2
bad 1
be 3
behavior 1
best 1
but 1
do 2
dog 1
dreams 1
eternal 1
fever 1
flame 1
guard 1
have 1
in 1
it 1
not 1
of 1
say 1
the 4
to 1
watcher 1
way 1
we 3
what 1
will 2
your 1

The improved WordCount2 produces the following result:

I,the 4
be,we 3
are,do,will 2
But,Of,They,all,am,bad,behavior,best,but,dog,dreams,eternal,fever,flame,guard,have,in,it,not,of,say,to,watcher,way,what,your 1
