Using DRPC to complete the required processing

1. Create a new branch of your source and switch to it using the following commands:

git branch chap4
git checkout chap4

2. Create a new class named SplitAndProjectToFields that extends BaseFunction:

public class SplitAndProjectToFields extends BaseFunction {

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        // Split the DRPC argument string on spaces and emit the
        // non-empty tokens together as a single tuple.
        Values vals = new Values();
        for (String word : tuple.getString(0).split(" ")) {
            if (word.length() > 0) {
                vals.add(word);
            }
        }
        collector.emit(vals);
    }
}
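
To see what this function emits without running a topology, the splitting loop can be sketched in plain Java. The class SplitSketch and its split method are hypothetical names used only for illustration; the returned list stands in for the Values object that execute emits as one tuple.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {

    // Mirrors the loop in SplitAndProjectToFields.execute:
    // split on single spaces and drop the empty tokens that
    // consecutive spaces produce.
    static List<String> split(String args) {
        List<String> vals = new ArrayList<>();
        for (String word : args.split(" ")) {
            if (word.length() > 0) {
                vals.add(word);
            }
        }
        return vals;
    }

    public static void main(String[] args) {
        // Two spaces between the tokens still yield two values.
        System.out.println(split("doc01  area")); // prints [doc01, area]
    }
}
```

Because the query stream in the next step declares two output fields for this function, documentId and term, the argument string should contain exactly two tokens.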

3. Once this is complete, edit the TermTopology class and add the following method:

public class TermTopology {

    private static void addTFIDFQueryStream(TridentState tfState, TridentState dfState, TridentState dState, TridentTopology topology, LocalDRPC drpc) {
        topology.newDRPCStream("tfidfQuery", drpc)
            .each(new Fields("args"), new SplitAndProjectToFields(), new Fields("documentId", "term"))
            .each(new Fields(), new StaticSourceFunction(), new Fields("source"))
            .stateQuery(tfState, new Fields("documentId", "term"), new MapGet(), new Fields("tf"))
            .stateQuery(dfState, new Fields("term"), new MapGet(), new Fields("df"))
            .stateQuery(dState, new Fields("source"), new MapGet(), new Fields("d"))
            .each(new Fields("term", "documentId", "tf", "d", "df"), new TfidfExpression(), new Fields("tfidf"))
            .each(new Fields("tfidf"), new FilterNull())
            .project(new Fields("documentId", "term", "tfidf"));

    }
}

4. Then update your buildTopology method by removing the final stream definition and adding the DRPC creation:

public static TridentTopology buildTopology(ITridentSpout spout, LocalDRPC drpc) {

    TridentTopology topology = new TridentTopology();

    Stream documentStream = getUrlStream(topology, spout)
        .each(new Fields("url"), new DocumentFetchFunction(mimeTypes), new Fields("document", "documentId", "source"));

    Stream termStream = documentStream.parallelismHint(20)
        .each(new Fields("document"), new DocumentTokenizer(), new Fields("dirtyTerm"))
        .each(new Fields("dirtyTerm"), new TermFilter(), new Fields("term"))
        .project(new Fields("term","documentId","source"));

    TridentState dfState = termStream.groupBy(new Fields("term"))
        .persistentAggregate(getStateFactory("df"), new Count(), new Fields("df"));

    TridentState dState = documentStream.groupBy(new Fields("source"))
        .persistentAggregate(getStateFactory("d"), new Count(), new Fields("d"));

    TridentState tfState = termStream.groupBy(new Fields("documentId", "term"))
        .persistentAggregate(getStateFactory("tf"), new Count(), new Fields("tf"));

    addTFIDFQueryStream(tfState, dfState, dState, topology, drpc);

    return topology;
}
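
The query stream ends by passing tf, d, and df to TfidfExpression, which is not listed in this section. As a rough sketch only, assuming it applies a common smoothed form of TF-IDF, tf * ln(d / (1 + df)), the calculation would look like the following. The class TfidfSketch, the method tfidf, and the formula itself are assumptions made for illustration and may differ from the actual TfidfExpression.

```java
class TfidfSketch {

    // Assumed formula: tf * ln(d / (1 + df)).
    // Adding 1 to df avoids a division by zero when a term
    // has not been counted in any document yet.
    static double tfidf(long tf, long d, long df) {
        return tf * Math.log((double) d / (1.0 + df));
    }

    public static void main(String[] args) {
        // Hypothetical counts: the term appears 3 times in the document,
        // 100 documents were seen from the source, and the df count is 9.
        System.out.println(tfidf(3, 100, 9));
    }
}
```

Under this assumed formula, a term with a df count close to d scores near zero, while a rare term scores higher for the same tf.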

Implementing a rolling window topology

1. To implement the rolling time window, we need a fork of the trident-cassandra state implementation. Start by cloning and building it, then install it into your local Maven repository:

git clone https://github.com/quintona/trident-cassandra.git

cd trident-cassandra

lein install

2. Then update your project dependencies to include this new version by changing the following code line:

[trident-cassandra/trident-cassandra "0.0.1-wip1"]

to the following line:

[trident-cassandra/trident-cassandra "0.0.1-bucketwip1"]

Simulating time in integration testing

3. Ensure that you have updated your project dependencies in Eclipse using the process described earlier, and then create a new class called TimeBasedRowStrategy:

public class TimeBasedRowStrategy implements RowKeyStrategy, Serializable {

    private static final long serialVersionUID = 6981400531506165681L;

    @Override
    public <T> String getRowKey(List<List<Object>> keys, Options<T> options) {
        return options.rowKey + StateUtils.formatHour(new Date());
    }
}

4. Then implement the static StateUtils.formatHour method:

public static String formatHour(Date date){
    return new SimpleDateFormat("yyyyMMddHH").format(date);
}

5. Finally, replace the getStateFactory method in TermTopology with the following:

private static StateFactory getStateFactory(String rowKey) {
    CassandraBucketState.BucketOptions options = new CassandraBucketState.BucketOptions();
    options.keyspace = "trident_test";
    options.columnFamily = "tfid";
    options.rowKey = rowKey;
    options.keyStrategy = new TimeBasedRowStrategy();
    return CassandraBucketState.nonTransactional("localhost", options);
}
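
The hourly bucketing set up in steps 3 and 4 can be sketched in isolation as follows. HourBucketSketch and its rowKey helper are hypothetical names for illustration, and the sketch pins the formatter to UTC so the output is reproducible, whereas the formatHour method above uses the JVM default time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class HourBucketSketch {

    // Same pattern as StateUtils.formatHour, with an explicit
    // time zone (an assumption made for reproducible output).
    static String formatHour(Date date, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHH");
        fmt.setTimeZone(zone);
        return fmt.format(date);
    }

    // Mirrors TimeBasedRowStrategy.getRowKey: base key plus hour bucket.
    static String rowKey(String base, Date date, TimeZone zone) {
        return base + formatHour(date, zone);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Date start = new Date(0L);                     // 1970-01-01 00:00 UTC
        Date sameHour = new Date(59L * 60 * 1000);     // 00:59 UTC
        Date nextHour = new Date(60L * 60 * 1000);     // 01:00 UTC

        System.out.println(rowKey("tf", start, utc));    // prints tf1970010100
        System.out.println(rowKey("tf", sameHour, utc)); // prints tf1970010100
        System.out.println(rowKey("tf", nextHour, utc)); // prints tf1970010101
    }
}
```

All writes within the same hour share a row key, and the key changes when the hour rolls over, so each hour's counts start in a fresh row.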
