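The Road to Storm: a WordCount Example
I am new to Storm, so corrections are welcome.
I looked at many WordCount examples online, but none of them did quite what I wanted.
Scenario: count the word frequencies of shengjing.txt into a map and print the result in one go.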
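● Spout (message source)
Extend the BaseRichSpout class or implement the IRichSpout interface.
open: initialization;
nextTuple: takes in messages and emits the data;
ack: called after a tuple has been processed successfully;
fail: called after processing of a tuple has failed;
declareOutputFields: declares the output fields;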
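● Bolt (processing unit)
Extend BaseBasicBolt or BaseWindowedBolt, or implement the IRichBolt interface.
prepare: initialization when the worker starts;
execute: receives a tuple (or a TupleWindow), runs the processing logic, and emits the result;
cleanup: called before shutdown;
declareOutputFields: declares the output fields;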
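● Project structure (screenshot not preserved; layout taken from the package declarations in the files below)
pom.xml
com.scps.storm.helloword: WordTopology.java
com.scps.storm.helloword.spout: WordReaderSpout.java
com.scps.storm.helloword.bolt: WordSplitBolt.java, WordCountBolt.java, SlidingWindowBolt.java, WordFinalBolt.java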
● pom.xml: declares the project's jar dependencies
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.scps.storm</groupId>
    <artifactId>storm-example</artifactId>
    <version>0.0.1</version>
    <name>storm.example</name>
    <dependencies>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.1.0</version>
        </dependency>
    </dependencies>
</project>
● WordTopology.java: entry class; builds the topology, wires up the spout and bolts, and sets the configuration
package com.scps.storm.helloword;

import java.util.concurrent.TimeUnit;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseWindowedBolt.Duration;
import org.apache.storm.tuple.Fields;

import com.scps.storm.helloword.bolt.SlidingWindowBolt;
import com.scps.storm.helloword.bolt.WordCountBolt;
import com.scps.storm.helloword.bolt.WordFinalBolt;
import com.scps.storm.helloword.bolt.WordSplitBolt;
import com.scps.storm.helloword.spout.WordReaderSpout;

public class WordTopology {

    public static void main(String[] args) {

        TopologyBuilder builder = new TopologyBuilder();

        // 1 task reads the file
        builder.setSpout("word-reader", new WordReaderSpout(), 1);

        // 2 tasks split the lines
        builder.setBolt("word-split", new WordSplitBolt(), 2).shuffleGrouping("word-reader");

        // 2 tasks count in parallel; fieldsGrouping sends the same word to the same task
        builder.setBolt("word-count", new WordCountBolt(), 2).fieldsGrouping("word-split", new Fields("word"));

        // 1 task aggregates; sliding window: every 3 seconds, process the tuples of the last 5 seconds
        // builder.setBolt("sliding-window-bolt", new SlidingWindowBolt().withWindow(new Duration(5, TimeUnit.SECONDS), new Duration(3, TimeUnit.SECONDS)), 1).shuffleGrouping("word-count");

        // 1 task aggregates; tumbling window: process the tuples of each 5-second interval.
        // A tumbling window longer than 15 seconds fails with a timeout error, apparently because
        // window length + sliding interval must not exceed topology.message.timeout.secs (30 s by default).
        builder.setBolt("sliding-window-bolt", new SlidingWindowBolt().withTumblingWindow(new Duration(5, TimeUnit.SECONDS)), 1).shuffleGrouping("word-count");

        // 1 task prints the result
        builder.setBolt("word-final", new WordFinalBolt(), 1).shuffleGrouping("sliding-window-bolt");

        Config conf = new Config();
        conf.setDebug(false);

        // Cluster mode expects two arguments: the file location and the topology name
        if (args != null && args.length >= 2) {

            // Run on a cluster; build the jar with mvn package first, then:
            // bin/storm jar "/root/storm-example-0.0.1.jar" com.scps.storm.helloword.WordTopology "http://nimbus:8080/uploads/shengjing.txt" wordcount
            try {
                String file = args[0];
                String name = args[1];
                conf.put("file", file);
                // conf.setNumWorkers(2);
                StormSubmitter.submitTopology(name, conf, builder.createTopology());
            } catch (AlreadyAliveException e) {
                e.printStackTrace();
            } catch (InvalidTopologyException e) {
                e.printStackTrace();
            } catch (AuthorizationException e) {
                e.printStackTrace();
            }

        } else {

            // Run directly inside Eclipse (local mode)
            conf.put("file", "C:\\Users\\Administrator\\Downloads\\shengjing1.txt");
            // conf.put("file", "http://192.168.100.170:8080/uploads/shengjing.txt");
            // conf.setMaxTaskParallelism(2); // cap the number of tasks
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("wordcount", conf, builder.createTopology());
        }
    }
}
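The timeout error noted next to the tumbling window in WordTopology can be reasoned about with a small check. As far as I can tell, Storm 1.x rejects a windowed bolt when window length plus sliding interval exceeds topology.message.timeout.secs (30 seconds by default), and a tumbling window slides by its own length. The sketch below is plain Java that models that rule; the class and method names are hypothetical and are not Storm API.

```java
// Hypothetical helper, not part of Storm: models the window/timeout rule described above.
class WindowTimeoutCheck {

    // True when (window length + sliding interval) fits inside the message timeout.
    static boolean fitsTimeout(int windowSecs, int slideSecs, int messageTimeoutSecs) {
        return windowSecs + slideSecs <= messageTimeoutSecs;
    }

    // A tumbling window uses its own length as the sliding interval.
    static boolean tumblingFitsTimeout(int windowSecs, int messageTimeoutSecs) {
        return fitsTimeout(windowSecs, windowSecs, messageTimeoutSecs);
    }

    public static void main(String[] args) {
        int defaultTimeoutSecs = 30; // assumed default of topology.message.timeout.secs
        System.out.println("tumbling 5s:  " + tumblingFitsTimeout(5, defaultTimeoutSecs));
        System.out.println("tumbling 15s: " + tumblingFitsTimeout(15, defaultTimeoutSecs));
        System.out.println("tumbling 20s: " + tumblingFitsTimeout(20, defaultTimeoutSecs));
        System.out.println("sliding 5s/3s: " + fitsTimeout(5, 3, defaultTimeoutSecs));
    }
}
```

If a longer window is needed, raising the timeout with conf.setMessageTimeoutSecs(...) before submitting the topology should lift the limit accordingly.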
● WordReaderSpout.java: reads the txt file and emits its lines
package com.scps.storm.helloword.spout;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class WordReaderSpout implements IRichSpout {

    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector outputCollector;
    private String filePath;
    // Set once the file has been read, so it is emitted only one time
    private boolean completed = false;

    public void ack(Object arg0) {
    }

    public void activate() {
    }

    public void close() {
    }

    public void deactivate() {
    }

    public void fail(Object arg0) {
    }

    @SuppressWarnings("rawtypes")
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // The file location is passed in through the topology configuration
        filePath = conf.get("file").toString();
        outputCollector = collector;
    }

    public void nextTuple() {

        if (!completed) {

            String time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
            System.out.println("WordReaderSpout nextTuple, " + time);

            String line = "";
            InputStream inputStream = null;
            InputStreamReader inputStreamReader = null;
            BufferedReader reader = null;

            try {
                // filePath = "http://192.168.100.170:8080/uploads/shengjing.txt";
                // filePath = "C:\\Users\\Administrator\\Downloads\\shengjing.txt";
                if (filePath.startsWith("http://")) { // remote file
                    URL url = new URL(filePath);
                    URLConnection urlConn = url.openConnection();
                    inputStream = urlConn.getInputStream();
                } else { // local file
                    inputStream = new FileInputStream(filePath);
                }
                inputStreamReader = new InputStreamReader(inputStream, "utf-8");
                reader = new BufferedReader(inputStreamReader);
                // Emit every line as one tuple (no message id, so tuples are not tracked)
                while ((line = reader.readLine()) != null) {
                    outputCollector.emit(new Values(line));
                }
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                completed = true;
                try {
                    if (reader != null) {
                        reader.close();
                    }
                    if (inputStreamReader != null) {
                        inputStreamReader.close();
                    }
                    if (inputStream != null) {
                        inputStream.close();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        // Nothing more to emit; sleep so that repeated nextTuple calls do not spin
        Utils.sleep(20000);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
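When testing on a cluster, first upload the txt file to the Nimbus UI host, so that whichever supervisor is randomly assigned the spout can read the file remotely over HTTP.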
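The local-or-remote read in WordReaderSpout can also be written with try-with-resources, which closes the reader and the underlying stream without the explicit finally block. This is a minimal plain-Java sketch outside of Storm; LineSource and its methods are hypothetical names, and it keeps the same "http://" prefix test as the spout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project.
class LineSource {

    // Opens a remote file over HTTP or a local file, depending on the path prefix.
    static InputStream open(String path) throws IOException {
        if (path.startsWith("http://")) {
            return new URL(path).openStream();
        }
        return Files.newInputStream(Paths.get(path));
    }

    // Reads every line as UTF-8; try-with-resources closes the reader and the stream.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Pass a local path or an http:// URL as the first argument.
        if (args.length > 0) {
            for (String line : readLines(args[0])) {
                System.out.println(line);
            }
        }
    }
}
```

Inside a spout, the loop body would call the collector's emit instead of collecting lines into a list.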
● WordSplitBolt.java: receives lines, splits them, and emits words
package com.scps.storm.helloword.bolt;

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordSplitBolt implements IRichBolt {

    private static final long serialVersionUID = 1L;
    private OutputCollector outputCollector;

    @SuppressWarnings("rawtypes")
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        outputCollector = collector;
    }

    public void execute(Tuple input) {

        String line = input.getStringByField("line");

        // Replace punctuation with spaces so that it does not stick to the words
        line = line.trim();
        line = line.replace(",", " ");
        line = line.replace(".", " ");
        line = line.replace(":", " ");
        line = line.replace(";", " ");
        line = line.replace("?", " ");
        line = line.replace("!", " ");
        line = line.replace("(", " ");
        line = line.replace(")", " ");
        line = line.replace("[", " ");
        line = line.replace("]", " ");
        line = line.trim();

        // Split on single spaces and skip the empty tokens left by consecutive spaces
        String[] words = line.split(" ");
        for (String word : words) {
            word = word.trim();
            if (!"".equals(word)) {
                outputCollector.emit(new Values(word));
            }
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    public void cleanup() {
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
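The chain of replace() calls in WordSplitBolt can be collapsed into a single regular expression. The sketch below is plain Java outside of Storm; LineSplitter is a hypothetical name, and the delimiter set is assumed to be exactly the characters handled by the bolt (space and , . : ; ? ! ( ) [ ]).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project.
class LineSplitter {

    // One or more of: space , . : ; ? ! ( ) [ ]
    private static final String DELIMITERS = "[ ,.:;?!()\\[\\]]+";

    // Splits a line into words and drops empty tokens (e.g. from leading punctuation).
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split(DELIMITERS)) {
            String word = token.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("In the beginning, God created (the) heaven."));
    }
}
```

Each returned word would correspond to one emitted tuple in the bolt.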
● WordCountBolt.java: receives words, counts them, and emits each word with its running count
package com.scps.storm.helloword.bolt;

import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt implements IRichBolt {

    private static final long serialVersionUID = 1L;
    // Per-task running totals; fieldsGrouping guarantees a word always reaches the same task
    Map<String, Integer> counter;
    private OutputCollector outputCollector;

    @SuppressWarnings("rawtypes")
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        counter = new HashMap<String, Integer>();
        outputCollector = collector;
    }

    public void execute(Tuple input) {

        String word = input.getStringByField("word");
        int count;

        if (!counter.containsKey(word)) {
            count = 1;
        } else {
            count = counter.get(word) + 1;
        }

        counter.put(word, count);
        // Emit the updated total for this word
        outputCollector.emit(new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }

    public void cleanup() {
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
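Keeping a separate counter in each of the two WordCountBolt tasks only works because fieldsGrouping on "word" routes every occurrence of a word to the same task. The plain-Java model below illustrates the idea under a simplifying assumption: the task is picked by hashing the word modulo the task count. It does not reproduce Storm's actual routing code, and FieldsGroupingModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Storm code: hash-based routing of words to counting tasks.
class FieldsGroupingModel {

    // Picks a task index in [0, numTasks) from the word's hash code.
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Routes each word to its task and keeps one counter map per task.
    static List<Map<String, Integer>> countPerTask(List<String> words, int numTasks) {
        List<Map<String, Integer>> counters = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            counters.add(new HashMap<String, Integer>());
        }
        for (String word : words) {
            counters.get(taskFor(word, numTasks)).merge(word, 1, Integer::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "a", "b");
        List<Map<String, Integer>> counters = countPerTask(words, 2);
        for (int i = 0; i < counters.size(); i++) {
            System.out.println("task " + i + ": " + counters.get(i));
        }
    }
}
```

With shuffleGrouping instead, occurrences of one word would be spread over both tasks and each task would hold only a partial count.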
● SlidingWindowBolt.java: receives the (word, count) tuples of a window, merges them into one map, and emits the map
package com.scps.storm.helloword.bolt;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.windowing.TupleWindow;

public class SlidingWindowBolt extends BaseWindowedBolt {

    private static final long serialVersionUID = 1L;
    // Latest known total per word, kept across windows
    Map<String, Integer> counter;
    private OutputCollector outputCollector;

    @SuppressWarnings("rawtypes")
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        counter = new HashMap<String, Integer>();
        outputCollector = collector;
    }

    public void execute(TupleWindow inputWindow) {

        String time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        System.out.println("SlidingWindowBolt execute, " + time);

        for (Tuple input : inputWindow.get()) {
            String word = input.getStringByField("word");
            int count = input.getIntegerByField("count");
            // Counts are running totals, so the newest value simply replaces the old one
            counter.put(word, count);
        }

        // Emit a copy: within one worker the downstream bolt may receive the same object,
        // and this map keeps changing with every later window
        outputCollector.emit(new Values(new HashMap<String, Integer>(counter)));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("counter"));
    }
}
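The difference between the sliding window (withWindow, 5 seconds long, every 3 seconds) and the tumbling window (withTumblingWindow, 5 seconds) used in WordTopology can be illustrated with a little arithmetic. The model below is a simplification that assumes windows are aligned to time 0; Storm's own trigger alignment may differ, and WindowModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model, not Storm code: which windows contain a given timestamp.
class WindowModel {

    // Start times (ms) of all windows [start, start + lengthMs) that contain timestampMs,
    // for windows that begin every slideMs starting at 0. Result is in ascending order.
    static List<Long> windowStarts(long timestampMs, long lengthMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long latest = Math.floorDiv(timestampMs, slideMs) * slideMs; // latest start <= timestamp
        for (long start = latest; start >= 0 && start > timestampMs - lengthMs; start -= slideMs) {
            starts.add(start);
        }
        Collections.reverse(starts);
        return starts;
    }

    public static void main(String[] args) {
        // Sliding: 5 s long, every 3 s, so consecutive windows overlap by 2 s
        System.out.println("sliding:  " + windowStarts(4000, 5000, 3000));
        // Tumbling: slide equals length, so every tuple belongs to exactly one window
        System.out.println("tumbling: " + windowStarts(4000, 5000, 5000));
    }
}
```

Under this model a tuple can be processed in more than one sliding window but in only one tumbling window, which is why overwriting with the newest running count in SlidingWindowBolt is safe in both cases.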
● WordFinalBolt.java: receives the map and prints it
package com.scps.storm.helloword.bolt;

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Tuple;

public class WordFinalBolt implements IRichBolt {

    private static final long serialVersionUID = 1L;

    @SuppressWarnings("rawtypes")
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    }

    @SuppressWarnings("unchecked")
    public void execute(Tuple input) {

        Map<String, Integer> counter = (Map<String, Integer>) input.getValueByField("counter");

        // Sort the words alphabetically before printing
        List<String> keys = new ArrayList<String>();
        keys.addAll(counter.keySet());
        Collections.sort(keys);

        String time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        System.out.println("-----------------begin------------------, " + time);
        for (String key : keys) {
            System.out.println(key + " : " + counter.get(key));
        }
        System.out.println("-----------------end--------------------, " + time);
    }

    public void cleanup() {
    }

    // Terminal bolt: it emits nothing, so no output fields are declared
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
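WordFinalBolt prints the words in alphabetical order. For a word-frequency report it is often handier to list the most frequent words first. The sketch below is a plain-Java variation (Java 8 or later), not the behavior of the bolt above; CountReport is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original project.
class CountReport {

    // Orders entries by count (highest first), breaking ties alphabetically by word.
    static List<Map.Entry<String, Integer>> byCountDescending(Map<String, Integer> counter) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counter.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counter = new HashMap<>();
        counter.put("the", 3);
        counter.put("god", 2);
        counter.put("and", 3);
        for (Map.Entry<String, Integer> entry : byCountDescending(counter)) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```

The alphabetical tie-break keeps the output stable between runs, since HashMap iteration order is not guaranteed.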
● Project source download: https://pan.baidu.com/s/1mhZtvq4 (password: ypbc)