Hadoop WordCount
Mapper
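// The number of map tasks depends on the number of input splits.
import java.io.IOException;
import org.apache.commons.lang.StringUtils; // assumed: Commons Lang, matching the split(String, String) call
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of the line, value is the line itself
        String line = value.toString();
        String[] words = StringUtils.split(line, " ");
        for (String word : words) {
            // emit (word, 1) for every word on the line
            context.write(new Text(word), new LongWritable(1));
        }
    }
}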
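To make the note about map tasks and input splits concrete, here is a rough sketch in plain Java (no Hadoop dependency) of how a split count can be estimated for one non-empty, splittable file. It follows the split-size arithmetic of FileInputFormat as I understand it: max(minSize, min(maxSize, blockSize)), with 10% slack on the last split. The class name SplitEstimate and the sizes in main are made up for illustration.

```java
// Simplified sketch of FileInputFormat-style split counting.
// SplitEstimate is a hypothetical helper, not a Hadoop class.
class SplitEstimate {
    // The last split may be up to 10% larger than the split size.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped to [minSize, maxSize].
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one non-empty file.
    static int splitCount(long fileLength, long splitSize) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("sketch only covers non-empty files");
        }
        int splits = 0;
        long remaining = fileLength;
        // Cut full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values: 128 MB blocks, min size 1, no max size limit.
        long size = splitSize(128 * mb, 1L, Long.MAX_VALUE);
        System.out.println("300 MB file: " + splitCount(300 * mb, size) + " splits");
        System.out.println("140 MB file: " + splitCount(140 * mb, size) + " splits");
    }
}
```

Under these assumed sizes, a 300 MB file yields 3 splits, while a 140 MB file still yields 1 because it fits within the 10% slack.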
Reducer
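import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        for (LongWritable l : values) {
            // sum the values instead of counting them, so the result stays
            // correct if a combiner has already pre-aggregated the counts
            count += l.get();
        }
        context.write(key, new LongWritable(count));
    }
}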
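The data flow between mapper and reducer can be tried out locally without a cluster. The sketch below is plain Java, not Hadoop code: it emits (word, 1) pairs, groups them by key the way the shuffle does, and then adds up the values per key. The class name LocalWordCount and the sample lines are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local simulation of map -> shuffle (group by key) -> reduce.
// LocalWordCount is a hypothetical name; nothing here touches Hadoop.
class LocalWordCount {
    static Map<String, Long> wordCount(List<String> lines) {
        // "Map" and "shuffle": emit (word, 1) and collect the values per word.
        // A TreeMap keeps keys sorted, similar to the sorted reducer input.
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // skip empty tokens from repeated spaces
                }
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
            }
        }
        // "Reduce": sum the values of each key.
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            long sum = 0;
            for (long value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        System.out.println(wordCount(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```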
Runner
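import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(WCRunner.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Set the number of reduce tasks; the job writes that many output files.
        // Which keys end up in which file is decided by the Partitioner, set via
        // job.setPartitionerClass(HashPartitioner.class); HashPartitioner is the default.
        job.setNumReduceTasks(10);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}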
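How keys are spread over the 10 reduce tasks set with setNumReduceTasks(10) can be illustrated with the arithmetic HashPartitioner uses: the key's hash code with the sign bit masked off, modulo the number of reduce tasks. The sketch below is plain Java with a hypothetical class name PartitionSketch. It hashes String keys, and Text computes its hash differently, so the partition numbers on a real cluster will generally not match what this prints.

```java
// Sketch of hash partitioning: (hash & Integer.MAX_VALUE) % numReduceTasks.
// PartitionSketch is a hypothetical name, not a Hadoop class.
class PartitionSketch {
    static int partition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so the
        // result is never negative even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 10; // same value as in WCRunner
        for (String word : new String[] {"hello", "world", "hadoop"}) {
            // Reducer i typically writes a file named like part-r-0000i.
            System.out.println(word + " -> reducer " + partition(word, numReduceTasks));
        }
    }
}
```

Every occurrence of the same key gets the same partition number, which is why all counts for one word arrive at a single reducer and land in a single output file.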
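// Variant of the runner built on Tool/ToolRunner, so generic options
// such as -D key=value are parsed from the command line.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WCRunner2 extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // use the configuration populated by ToolRunner instead of creating a new one
        Configuration conf = getConf();
        Job job = Job.getInstance(conf);
        job.setJarByClass(WCRunner2.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // propagate the job's exit status to the shell
        System.exit(ToolRunner.run(new WCRunner2(), args));
    }
}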
Run: hadoop jar wc.jar com.easytrack.hadoop.mr.WCRunner2 /wordcount.txt /wc/output4