Storm(4) - Distributed Remote Procedure Calls
Using DRPC to complete the required processing
1. Create a new branch of your source and switch to it using the following commands:
git branch chap4
git checkout chap4
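2. Create a new class named SplitAndProjectToFields that extends BaseFunction:
// Package names assume a pre-1.0 Storm release; Apache Storm 1.0+ uses org.apache.storm.*
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

public class SplitAndProjectToFields extends BaseFunction {
    // Splits the first field on spaces and emits all non-empty tokens as a single tuple.
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        Values vals = new Values();
        for (String word : tuple.getString(0).split(" ")) {
            if (word.length() > 0) {
                vals.add(word);
            }
        }
        collector.emit(vals);
    }
}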
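To see what this function emits for a given DRPC argument string, the splitting loop can be exercised outside Storm. The following is a minimal sketch in plain Java; the class name SplitSketch and the sample argument "doc01  area" are illustrative assumptions and are not part of the topology.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone helper that mirrors the loop in SplitAndProjectToFields.
class SplitSketch {
    // Splits on single spaces and drops the empty tokens produced by repeated spaces.
    static List<String> splitArgs(String args) {
        List<String> vals = new ArrayList<>();
        for (String word : args.split(" ")) {
            if (word.length() > 0) {
                vals.add(word);
            }
        }
        return vals;
    }

    public static void main(String[] args) {
        // Two spaces between the tokens: the empty token in the middle is discarded.
        System.out.println(splitArgs("doc01  area")); // prints [doc01, area]
    }
}
```

Because all tokens are emitted as one tuple, and the query stream declares exactly two output fields for this function (documentId and term), the argument string is expected to contain exactly two tokens.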
3. Once this is complete, edit the TermTopology class and add the following method:
public class TermTopology {
    // Builds the DRPC stream that answers TF-IDF queries from the three persisted states.
    private static void addTFIDFQueryStream(TridentState tfState, TridentState dfState,
            TridentState dState, TridentTopology topology, LocalDRPC drpc) {
        topology.newDRPCStream("tfidfQuery", drpc)
            // Split the DRPC argument string into documentId and term.
            .each(new Fields("args"), new SplitAndProjectToFields(), new Fields("documentId", "term"))
            .each(new Fields(), new StaticSourceFunction(), new Fields("source"))
            // Look up the term frequency, document frequency, and document count.
            .stateQuery(tfState, new Fields("documentId", "term"), new MapGet(), new Fields("tf"))
            .stateQuery(dfState, new Fields("term"), new MapGet(), new Fields("df"))
            .stateQuery(dState, new Fields("source"), new MapGet(), new Fields("d"))
            .each(new Fields("term", "documentId", "tf", "d", "df"), new TfidfExpression(), new Fields("tfidf"))
            .each(new Fields("tfidf"), new FilterNull())
            .project(new Fields("documentId", "term", "tfidf"));
    }
}
4. Then update your buildTopology method by removing the final stream definition and adding the DRPC creation:
public static TridentTopology buildTopology(ITridentSpout spout, LocalDRPC drpc) {
    TridentTopology topology = new TridentTopology();
    Stream documentStream = getUrlStream(topology, spout)
        .each(new Fields("url"), new DocumentFetchFunction(mimeTypes), new Fields("document", "documentId", "source"));
    Stream termStream = documentStream.parallelismHint(20)
        .each(new Fields("document"), new DocumentTokenizer(), new Fields("dirtyTerm"))
        .each(new Fields("dirtyTerm"), new TermFilter(), new Fields("term"))
        .project(new Fields("term", "documentId", "source"));
    TridentState dfState = termStream.groupBy(new Fields("term"))
        .persistentAggregate(getStateFactory("df"), new Count(), new Fields("df"));
    TridentState dState = documentStream.groupBy(new Fields("source"))
        .persistentAggregate(getStateFactory("d"), new Count(), new Fields("d"));
    TridentState tfState = termStream.groupBy(new Fields("documentId", "term"))
        .persistentAggregate(getStateFactory("tf"), new Count(), new Fields("tf"));
    addTFIDFQueryStream(tfState, dfState, dState, topology, drpc);
    return topology;
}
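The query stream boils down to three keyed lookups followed by one expression and a null filter. The sketch below imitates that flow with in-memory maps so it can be run without Storm or Cassandra. Everything in it is an assumption made for illustration: the class name TfidfQuerySketch, the "|" key separator, the sample values, and in particular the formula tf * ln(d / (1 + df)), which is one common TF-IDF variant; the body of TfidfExpression is not shown in this walkthrough and may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the three Trident states queried by the DRPC stream.
class TfidfQuerySketch {
    final Map<String, Long> tf = new HashMap<>(); // keyed by documentId + "|" + term
    final Map<String, Long> df = new HashMap<>(); // keyed by term
    final Map<String, Long> d = new HashMap<>();  // keyed by source

    // Mirrors the stream: three lookups, then the expression.
    // Returns null where this sketch assumes the real stream would drop the tuple.
    Double query(String documentId, String term, String source) {
        Long tfVal = tf.get(documentId + "|" + term);
        Long dfVal = df.get(term);
        Long dVal = d.get(source);
        if (tfVal == null || dfVal == null || dVal == null) {
            return null;
        }
        // Assumed formula, not taken from the source: tf * ln(d / (1 + df)).
        return tfVal * Math.log((double) dVal / (1.0 + dfVal));
    }

    public static void main(String[] args) {
        TfidfQuerySketch sketch = new TfidfQuerySketch();
        sketch.tf.put("doc01|area", 3L);  // "area" occurs 3 times in doc01
        sketch.df.put("area", 1L);        // "area" occurs in 1 document
        sketch.d.put("twitter", 4L);      // 4 documents seen for this source
        System.out.println(sketch.query("doc01", "area", "twitter"));    // 3 * ln(4 / 2)
        System.out.println(sketch.query("doc01", "missing", "twitter")); // null
    }
}
```

The sketch also shows why each lookup needs its own state: the document count is keyed by source alone, so it cannot be served from the map that is keyed by document and term.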
Implementing a rolling window topology
1. To implement the rolling time window, we need a fork of the trident-cassandra state implementation. Start by cloning, building, and installing it into your local Maven repository:
git clone https://github.com/quintona/trident-cassandra.git
cd trident-cassandra
lein install
2. Then update your project dependencies to include this new version by changing the following line:
[trident-cassandra/trident-cassandra "0.0.1-wip1"]
to the following line:
[trident-cassandra/trident-cassandra "0.0.1-bucketwip1"]
Simulating time in integration testing
3. Ensure that you have updated your project dependencies in Eclipse using the process described earlier, and then create a new class called TimeBasedRowStrategy:
public class TimeBasedRowStrategy implements RowKeyStrategy, Serializable {
    private static final long serialVersionUID = 6981400531506165681L;

    // Appends the current hour to the base row key, so each hour is written to its own row.
    @Override
    public <T> String getRowKey(List<List<Object>> keys, Options<T> options) {
        return options.rowKey + StateUtils.formatHour(new Date());
    }
}
4. Then implement the static StateUtils.formatHour method:
// Formats a date down to the hour, for example 2013010112 (uses the JVM default time zone).
public static String formatHour(Date date) {
    return new SimpleDateFormat("yyyyMMddHH").format(date);
}
5. Finally, replace the getStateFactory method in TermTopology with the following:
private static StateFactory getStateFactory(String rowKey) {
    CassandraBucketState.BucketOptions options = new CassandraBucketState.BucketOptions();
    options.keyspace = "trident_test";
    options.columnFamily = "tfid";
    options.rowKey = rowKey;
    options.keyStrategy = new TimeBasedRowStrategy();
    return CassandraBucketState.nonTransactional("localhost", options);
}
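The effect of the time-based row strategy is easiest to see by computing a few row keys by hand. The following stand-alone sketch reuses the same date pattern but pins the formatter to UTC so that the output is reproducible; the class name RowKeySketch, the rowKey helper, and the fixed time zone are assumptions made for this illustration (the method in step 4 uses the JVM default time zone).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical stand-alone sketch of hour-bucketed row keys.
class RowKeySketch {
    // Same pattern as StateUtils.formatHour, but pinned to UTC for reproducible output.
    static String formatHour(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    // Base row key ("tf", "df", or "d") followed by the hour bucket.
    static String rowKey(String base, Date date) {
        return base + formatHour(date);
    }

    public static void main(String[] args) {
        Date start = new Date(0L);                  // 1970-01-01 00:00 UTC
        Date sameHour = new Date(59L * 60 * 1000);  // 1970-01-01 00:59 UTC
        Date nextHour = new Date(60L * 60 * 1000);  // 1970-01-01 01:00 UTC
        System.out.println(rowKey("tf", start));    // prints tf1970010100
        System.out.println(rowKey("tf", sameHour)); // prints tf1970010100
        System.out.println(rowKey("tf", nextHour)); // prints tf1970010101
    }
}
```

Counts written within the same hour share a row, and the first write in the next hour starts a fresh one, which is what gives the topology its rolling window.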