Understanding fieldsGrouping and Fields in Storm

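Fields are usually used together with the fieldsGrouping mechanism, and they are surprisingly hard to understand. I read many articles online, but they still did not explain the idea clearly: they sounded plausible yet missed the key point. This question troubled me for a good 3-4 days, during which I simply could not grasp the concept of Fields.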

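My current understanding is that new Fields("word") is like the header row of a table: it declares a named field, and the values stored under that field are the ones passed to emit.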

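If declareOutputFields declares two or more fields, for example new Fields("word1", "word2"), then every emit must pass a new Values(...) whose values match those fields one-to-one and in the same order (much like keys and values). Later, when the topology is assembled, the new Fields(...) passed to fieldsGrouping can be new Fields("word1"), new Fields("word2"), or new Fields("word1", "word2"). It names which of the upstream spout's or bolt's fields serve as the grouping key; the downstream bolt still receives the complete tuple, not just those fields.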
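The name-to-value correspondence described above can be sketched in plain Java without any Storm dependency. This is only an illustration: FieldsValuesSketch and getByField are hypothetical names, and the two lists merely stand in for what new Fields(...) and new Values(...) carry.

```java
import java.util.Arrays;
import java.util.List;

public class FieldsValuesSketch {
    // Hypothetical helper: look up a value by field name, relying on the
    // fact that names and values are kept in the same order.
    public static Object getByField(List<String> fields, List<Object> values, String name) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("each declared field needs exactly one value");
        }
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        // Stands in for declarer.declare(new Fields("word1", "word2"))
        List<String> fields = Arrays.asList("word1", "word2");
        // Stands in for collector.emit(new Values("hello", "storm"))
        List<Object> values = Arrays.<Object>asList("hello", "storm");

        System.out.println(getByField(fields, values, "word2")); // prints "storm"
    }
}
```

The position of a value is what ties it to a field name, which is why the order in emit has to match the order in declareOutputFields.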

The official documentation says: "if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task"

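A task is one instance of the processing logic. Fields grouping therefore routes each tuple according to the value of the grouping field, that is, the value emitted under the field xxx declared below:
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("xxx"));
}
Each concrete value of xxx is handled by one particular task, and tuples carrying the same value of xxx are always handled by the same task instance.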

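For example:

Suppose the bolt first emits three tuples, whose xxx values are luonq, pangyang, and qinnl, and that three task instances handle them:

luonq -> instance1
pangyang -> instance2
qinnl -> instance3

Then it emits four more tuples, whose xxx values are luonq, qinnanluo, py, and pangyang, handled by the same three task instances:
luonq -> instance1
qinnanluo -> instance2
py -> instance3
pangyang -> instance2

Finally it emits two more tuples, whose xxx values are py and qinnl, again handled by the same three task instances:
py -> instance3
qinnl -> instance3

Now let us see which values each of the three task instances handled, and how many times:

instance1: luonq (2 times)
instance2: pangyang (2 times), qinnanluo (1 time)
instance3: qinnl (2 times), py (2 times)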

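Conclusions:
1. Which task instance handles a given value may look random, but it is not: Storm hashes the values of the grouping fields and takes the result modulo the number of target tasks. That is why every later occurrence of the same value goes to the same task instance for as long as the topology runs.

2. One task instance can handle many different emitted values.

3. This is the difference from shuffleGrouping: with shuffleGrouping, tuples are distributed to tasks at random, so tuples carrying the same value may land on different tasks.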
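The conclusions above can be reproduced with a small simulation. The sketch below is plain Java, not Storm's own code: FieldsGroupingSketch and chooseTask are hypothetical names, and hashing a list of values modulo the task count is an assumption that mirrors the idea rather than Storm's exact implementation. The task numbers it prints will therefore not match the instance1/2/3 labels in the example.

```java
import java.util.Arrays;
import java.util.List;

public class FieldsGroupingSketch {
    // Hypothetical helper: derive a task index from the grouping-field values.
    // floorMod keeps the index non-negative even when hashCode() is negative.
    public static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3;
        // The same values as in the three rounds of the example above.
        String[] emitted = {
            "luonq", "pangyang", "qinnl",
            "luonq", "qinnanluo", "py", "pangyang",
            "py", "qinnl"
        };
        for (String value : emitted) {
            int task = chooseTask(Arrays.<Object>asList(value), numTasks);
            System.out.println(value + " -> task " + task);
        }
    }
}
```

Running it shows each repeated value mapped to the same task index every time, while several distinct values may share one task.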

Example 1:
Step 1: declare the header (the output field)
public void declareOutputFields(OutputFieldsDeclarer declarer)
{
    declarer.declare(new Fields("word"));
}
Step 2: emit values into this field (this can be done from a bolt or a spout)
public void execute(Tuple input, BasicOutputCollector collector)
{
    // The incoming tuple carries one sentence in its first position
    String sentence = input.getString(0);
    String[] words = sentence.split(" ");
    for (String word : words)
    {
        word = word.trim();
        if (!word.isEmpty())
        {
            // Normalize, then emit one tuple per word under the "word" field
            word = word.toLowerCase();
            collector.emit(new Values(word));
        }
    }
}
Step 3: wire the components together
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-reader", new WordReader());
builder.setBolt("word-normalizer", new WordNormalizer()).shuffleGrouping("word-reader");
// Parallelism hint of 4; tuples with the same "word" always reach the same word-counter task
builder.setBolt("word-counter", new WordCounter(), 4).fieldsGrouping("word-normalizer", new Fields("word"));

Step 4:
The resulting contents of the field:
Field: word
the
storm
is
...

Example 2:

Step 1: declare two output fields
public void declareOutputFields(OutputFieldsDeclarer declarer)
{
    declarer.declare(new Fields("word", "count"));
}

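Step 2: emit one value per declared field, in the same order
public void execute(Tuple tuple, BasicOutputCollector collector)
{
    String word = tuple.getString(0);
    // counts is assumed to be a Map<String, Integer> field of this bolt
    Integer count = counts.get(word);
    if (count == null)
        count = 0;
    count++;
    counts.put(word, count);
    // "word" comes first and "count" second, matching the declaration
    collector.emit(new Values(word, count));
}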
Step 3: the resulting tuples:
Fields("word", "count")
"is", 1
"storm", 3
"the", 2
...
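Why a counting bolt needs fields grouping can be illustrated with a simulation in which every task keeps its own private map, just as each bolt instance keeps its own counts. This is a sketch under assumptions: PerTaskCountSketch and countWithFieldsGrouping are hypothetical names, the input words are made up, and hashing modulo the task count stands in for Storm's routing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerTaskCountSketch {
    // Returns one counter map per simulated task. Each word is routed by its
    // hash, so all occurrences of a word are counted in a single map.
    public static List<Map<String, Integer>> countWithFieldsGrouping(List<String> words, int numTasks) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<>());
        }
        for (String word : words) {
            int task = Math.floorMod(word.hashCode(), numTasks);
            tasks.get(task).merge(word, 1, Integer::sum);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "storm", "is", "the", "storm", "storm");
        List<Map<String, Integer>> tasks = countWithFieldsGrouping(words, 4);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + ": " + tasks.get(i));
        }
    }
}
```

Because each word lives in exactly one map, its count is complete. With random routing the occurrences of one word would be spread over several maps, and no single task would hold the true total.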
Example 3:
D:\.....\Workspaces\MyEclipse 8.5\bigData\examples-ch06-real-life-app-master\src\main\java\storm\analytics\....
Step 1:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("read-feed", new UsersNavigationSpout(), 3);
builder.setBolt("get-categ", new GetCategoryBolt(), 3).shuffleGrouping("read-feed");
builder.setBolt("user-history", new UserHistoryBolt(), 5).fieldsGrouping("get-categ", new Fields("user"));

Step 2: the upstream bolt emits tuples with three fields: Fields("user", "product", "categ")
GetCategoryBolt.java
public void execute(Tuple input, BasicOutputCollector collector)
{
    NavigationEntry entry = (NavigationEntry) input.getValue(1);
    if ("PRODUCT".equals(entry.getPageType())) {
        try {
            String product = (String) entry.getOtherData().get("product");

            // Call the items API to get item information
            Product itm = reader.readItem(product);
            if (itm == null)
                return;

            String categ = itm.getCategory();

            // Values are emitted in the declared order: user, product, categ
            collector.emit(new Values(entry.getUserId(), product, categ));

        } catch (Exception ex) {
            System.err.println("Error processing PRODUCT tuple" + ex);
            ex.printStackTrace();
        }
    }
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("user", "product", "categ"));
}

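Step 3: new Fields("user") picks only the user field out of Fields("user", "product", "categ") as the grouping key. Tuples with the same user always go to the same UserHistoryBolt task, and each tuple still arrives with all three fields.
builder.setBolt("user-history", new UserHistoryBolt(), 5).fieldsGrouping("get-categ", new Fields("user"));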
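How a single grouping field is picked out of a three-field tuple can be sketched in plain Java. Everything here is illustrative: GroupingKeySketch, selectKey, and chooseTask are hypothetical names, the user and product values are invented, and hashing the key modulo the task count is an assumption standing in for Storm's routing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupingKeySketch {
    // Hypothetical helper: project the grouping fields out of a full tuple.
    public static List<Object> selectKey(List<String> declared, List<Object> tuple, List<String> groupBy) {
        List<Object> key = new ArrayList<>();
        for (String name : groupBy) {
            int index = declared.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            key.add(tuple.get(index));
        }
        return key;
    }

    // Hypothetical helper: only the key decides the task, not the whole tuple.
    public static int chooseTask(List<Object> key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("user", "product", "categ");
        List<String> groupBy = Arrays.asList("user");

        // Two tuples from the same (made-up) user with different products.
        List<Object> first = Arrays.<Object>asList("alice", "p-1", "books");
        List<Object> second = Arrays.<Object>asList("alice", "p-2", "music");

        int task1 = chooseTask(selectKey(declared, first, groupBy), 5);
        int task2 = chooseTask(selectKey(declared, second, groupBy), 5);

        // The full tuple is delivered; only the routing ignores product and categ.
        System.out.println(first + " -> task " + task1);
        System.out.println(second + " -> task " + task2);
        System.out.println(task1 == task2); // prints true
    }
}
```

Both tuples reach the same task because product and categ play no part in the key, which is what lets one task accumulate the full history of a user.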
---------------------
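Author: VessalasdXZ
Source: CSDN
Original: https://blog.csdn.net/vessalasd1/article/details/50472123
Copyright notice: this is an original article by the blogger; please include a link to the post when reposting.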
