Storm Trident word count code

 // Imports assume Storm 1.x package names (org.apache.storm.*) and commons-collections 3.x.

 // On Storm 0.9.x the equivalents live under backtype.storm.* and storm.trident.*.

 import java.io.File;

 import java.io.IOException;

 import java.util.ArrayList;

 import java.util.Collection;

 import java.util.HashMap;

 import java.util.List;

 import java.util.Map;

 import java.util.Map.Entry;

 import org.apache.commons.collections.MapUtils;

 import org.apache.commons.io.FileUtils;

 import org.apache.storm.Config;

 import org.apache.storm.LocalCluster;

 import org.apache.storm.task.TopologyContext;

 import org.apache.storm.trident.TridentTopology;

 import org.apache.storm.trident.operation.BaseAggregator;

 import org.apache.storm.trident.operation.BaseFunction;

 import org.apache.storm.trident.operation.TridentCollector;

 import org.apache.storm.trident.spout.IBatchSpout;

 import org.apache.storm.trident.tuple.TridentTuple;

 import org.apache.storm.tuple.Fields;

 import org.apache.storm.tuple.Values;

 import org.apache.storm.utils.Utils;

 /**

  * Word count

  */

 public class LocalTridentCount {

     public static class MyBatchSpout implements IBatchSpout {

         Fields fields;

         HashMap<Long, List<List<Object>>> batches = new HashMap<Long, List<List<Object>>>();

         public MyBatchSpout(Fields fields) {

             this.fields = fields;

         }

         @Override

         public void open(Map conf, TopologyContext context) {

         }

         @Override

         public void emitBatch(long batchId, TridentCollector collector) {

             List<List<Object>> batch = this.batches.get(batchId);

             if(batch == null){

                 batch = new ArrayList<List<Object>>();

                 Collection<File> listFiles = FileUtils.listFiles(new File("d:\\stormtest"), new String[]{"txt"}, true);

                 for (File file : listFiles) {

                     List<String> readLines;

                     try {

                         readLines = FileUtils.readLines(file);

                         for (String line : readLines) {

                             batch.add(new Values(line));

                         }

                         FileUtils.moveFile(file, new File(file.getAbsolutePath()+System.currentTimeMillis()));

                     } catch (IOException e) {

                         e.printStackTrace();

                     }

                 }

                 if(batch.size()>0){

                     this.batches.put(batchId, batch);

                 }

             }

             for(List<Object> list : batch){

                 collector.emit(list);

             }

         }

         @Override

         public void ack(long batchId) {

             this.batches.remove(batchId);

         }

         @Override

         public void close() {

         }

         @Override

         public Map getComponentConfiguration() {

             Config conf = new Config();

             conf.setMaxTaskParallelism(1);

             return conf;

         }

         @Override

         public Fields getOutputFields() {

             return fields;

         }

     }

     /**

      * Splits each line of input into individual words.

      */

     public static class MySplit extends BaseFunction{

         @Override

         public void execute(TridentTuple tuple, TridentCollector collector) {

             String line = tuple.getStringByField("lines");

             String[] words = line.split("\t");

             for (String word : words) {

                 collector.emit(new Values(word));

             }

         }

     }

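     /**

      * Counts the occurrences of each word it is given and emits the resulting map when the aggregation completes.

      */

     public static class MyWordAgge extends BaseAggregator<Map<String, Integer>>{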

         @Override

         public Map<String, Integer> init(Object batchId,

                 TridentCollector collector) {

             return new HashMap<String, Integer>();

         }

         @Override

         public void aggregate(Map<String, Integer> val, TridentTuple tuple,

                 TridentCollector collector) {

             String key = tuple.getString(0);

             /*Integer integer = val.get(key);

             if(integer==null){

                 integer=0;

             }

             integer++;

             val.put(key, integer);*/

             val.put(key, MapUtils.getInteger(val, key, 0)+1);

         }

         @Override

         public void complete(Map<String, Integer> val,

                 TridentCollector collector) {

             collector.emit(new Values(val));

         }

     }

     /**

      * Merges the partial maps into a running total and prints the result.

      *

      */

     public static class MyCountPrint extends BaseFunction{

         HashMap<String, Integer> hashMap = new HashMap<String, Integer>();

         @Override

         public void execute(TridentTuple tuple, TridentCollector collector) {

             Map<String, Integer> map = (Map<String, Integer>)tuple.get(0);

             for (Entry<String, Integer> entry : map.entrySet()) {

                 String key = entry.getKey();

                 Integer value = entry.getValue();

                 Integer integer = hashMap.get(key);

                 if(integer==null){

                     integer=0;

                 }

                 hashMap.put(key, integer+value);

             }

             Utils.sleep(1000);

             System.out.println("==================================");

             for (Entry<String, Integer> entry : hashMap.entrySet()) {

                 System.out.println(entry);

             }

         }

     }

     public static void main(String[] args) {

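         // Overall flow: the data source MyBatchSpout watches the given directory. When it finds new files, it reads their lines

         // and emits them as one batch. MySplit then takes the line out of each tuple and splits it into individual words.

         // The emitted words are grouped within the batch: if a batch holds, say, 10 tuples, the words split from those 10 tuples are grouped so that identical words end up together.

         // Each group is then aggregated: identical words go through the same aggregator, which yields how many times each word occurs.

         // The totals are built in two steps: a partial count for each batch first, then a global total across batches.

         // The code is not trivial and fairly long, but in essence it is just batch processing.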

         TridentTopology tridentTopology = new TridentTopology();

         tridentTopology.newStream("spoutid", new MyBatchSpout(new Fields("lines")))

             .each(new Fields("lines"), new MySplit(), new Fields("word"))

             .groupBy(new Fields("word")) // group the words within one batch of tuples

             .aggregate(new Fields("word"), new MyWordAgge(), new Fields("wwwww")) // aggregate each group into word counts

             .each(new Fields("wwwww"), new MyCountPrint(), new Fields(""));

         LocalCluster localCluster = new LocalCluster();

         String simpleName = LocalTridentCount.class.getSimpleName();

         localCluster.submitTopology(simpleName, new Config(), tridentTopology.build());

     }

 }
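
The comments in main describe a two-step count: a partial count for each batch, then a global total across batches. Below is a minimal plain-Java sketch of that idea, without any Storm dependency. The class name TwoStageWordCount, its method names, and the sample lines are hypothetical and used only for illustration. It also simplifies the topology: the groupBy step is collapsed into one map per batch, which should give the same totals.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoStageWordCount {

    // Step 1 (the role of MySplit): cut one tab-separated line into words.
    static String[] split(String line) {
        return line.split("\t");
    }

    // Step 2 (the role of MyWordAgge): count the words of ONE batch into a fresh map.
    static Map<String, Integer> countBatch(List<String> lines) {
        Map<String, Integer> partial = new HashMap<>();
        for (String line : lines) {
            for (String word : split(line)) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    // Step 3 (the role of MyCountPrint): fold a per-batch map into the running total.
    static void mergeInto(Map<String, Integer> total, Map<String, Integer> partial) {
        for (Map.Entry<String, Integer> e : partial.entrySet()) {
            total.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    public static void main(String[] args) {
        // A TreeMap keeps the printed result in a stable, sorted order.
        Map<String, Integer> total = new TreeMap<>();
        // Two hypothetical batches of tab-separated lines.
        mergeInto(total, countBatch(Arrays.asList("hello\tstorm", "hello\ttrident")));
        mergeInto(total, countBatch(Arrays.asList("storm\tstorm")));
        System.out.println(total); // prints {hello=2, storm=3, trident=1}
    }
}
```

Keeping the per-batch map separate from the running total mirrors the split between MyWordAgge, which starts from an empty map in init, and MyCountPrint, which holds one long-lived hashMap.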

Contents of the files under the specified path:

Program output:
