Big Data Processing Framework Storm: Integrating Kafka with Storm
Storm commonly uses Kafka as its data source; it can also consume data from files, Redis, JDBC, Hive, HDFS, HBase, or Netty.
Create a new Maven project:
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>storm06</groupId>
<artifactId>storm06</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>storm07</name>
<url>http://maven.apache.org</url>
<repositories>
<!-- Repository where we can find the Storm dependencies -->
<repository>
<id>clojars.org</id>
<url>http://clojars.org/repo</url>
</repository>
</repositories>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.9.2-incubating</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.9.0.1</version>
<exclusions>
<exclusion>
<groupId>com.sun.jdmk</groupId>
<artifactId>jmxtools</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jmx</groupId>
<artifactId>jmxri</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.0-beta9</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
<version>2.0-beta9</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<version>1.7.10</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.10</version>
</dependency>
<!-- storm & kafka spout -->
<dependency>
<groupId>net.wurstmeister.storm</groupId>
<artifactId>storm-kafka-0.8-plus</artifactId>
<version>0.4.0</version>
</dependency>
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>15.0</version>
</dependency>
</dependencies>
<build>
<finalName>storm06</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-war-plugin</artifactId>
<version>2.4</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
<!-- unit tests -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<skip>true</skip>
<includes>
<include>**/*Test*.java</include>
</includes>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.1.2</version>
<executions>
<!-- Bind maven-source-plugin to the package phase and run its jar-no-fork goal -->
<execution>
<phase>package</phase>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
KafkaTopology
package bhz.storm.kafka.example;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;

public class KafkaTopology {
    public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
        // ZooKeeper hosts for the Kafka cluster
        ZkHosts zkHosts = new ZkHosts("134.32.123.101:2181,134.32.123.102:2181,134.32.123.103:2181");

        // Create the KafkaSpout configuration.
        // Second argument is the topic name,
        // third argument is the ZooKeeper root path for Kafka,
        // fourth argument is the consumer group id.
        SpoutConfig kafkaConfig = new SpoutConfig(zkHosts, "words_topic", "", "id7");

        // Specify that the Kafka messages are plain strings
        kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        // Consume the topic from the beginning every time we run the topology,
        // which helps in debugging. In production this property should be false.
        kafkaConfig.forceFromStart = true;

        // Now we create the topology
        TopologyBuilder builder = new TopologyBuilder();
        // Set the Kafka spout class
        builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 1);
        // Configure the bolts
        builder.setBolt("SentenceBolt", new SentenceBolt(), 1).globalGrouping("KafkaSpout");
        builder.setBolt("PrinterBolt", new PrinterBolt(), 1).globalGrouping("SentenceBolt");

        // Create an instance of LocalCluster for executing the topology in local mode
        LocalCluster cluster = new LocalCluster();
        Config conf = new Config();

        // Submit the topology for execution
        cluster.submitTopology("KafkaTopology", conf, builder.createTopology());

        try {
            // Wait for some time before exiting
            System.out.println("Waiting to consume from kafka");
            Thread.sleep(10000);
        } catch (Exception exception) {
            System.out.println("Thread interrupted exception : " + exception);
        }

        // Kill the KafkaTopology
        cluster.killTopology("KafkaTopology");
        // Shut down the local test cluster
        cluster.shutdown();
    }
}
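KafkaTopology runs the job on an in-process LocalCluster, which is only suitable for development. The sketch below shows how the same wiring could be submitted to a real Storm cluster instead; it is not part of the original post, and the class name KafkaClusterTopology and the worker count of 2 are assumptions for illustration. The spout configuration is identical except that forceFromStart is turned off so the consumer resumes from its stored offset, as the comments above recommend for production.

package bhz.storm.kafka.example;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;

// Hypothetical class name; mirrors KafkaTopology but targets a real cluster.
public class KafkaClusterTopology {
    public static void main(String[] args) throws Exception {
        ZkHosts zkHosts = new ZkHosts("134.32.123.101:2181,134.32.123.102:2181,134.32.123.103:2181");
        SpoutConfig kafkaConfig = new SpoutConfig(zkHosts, "words_topic", "", "id7");
        kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
        // In production, do not rewind to the beginning of the topic
        kafkaConfig.forceFromStart = false;

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 1);
        builder.setBolt("SentenceBolt", new SentenceBolt(), 1).globalGrouping("KafkaSpout");
        builder.setBolt("PrinterBolt", new PrinterBolt(), 1).globalGrouping("SentenceBolt");

        Config conf = new Config();
        conf.setNumWorkers(2); // assumed worker count; tune for your cluster
        // Submits to the cluster configured in storm.yaml instead of a LocalCluster
        StormSubmitter.submitTopology("KafkaTopology", conf, builder.createTopology());
    }
}

After packaging with mvn package, this class would be launched through Storm's standard runner, e.g. storm jar storm06.jar bhz.storm.kafka.example.KafkaClusterTopology.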
SentenceBolt
package bhz.storm.kafka.example;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.StringUtils;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;

import com.google.common.collect.ImmutableList;

public class SentenceBolt extends BaseBasicBolt {

    // List used for aggregating the words
    private List<String> words = new ArrayList<String>();

    public void execute(Tuple input, BasicOutputCollector collector) {
        // Get the word from the tuple
        String word = input.getString(0);
        if (StringUtils.isBlank(word)) {
            // Ignore blank lines
            return;
        }
        System.out.println("Received Word:" + word);
        // Add the word to the current list of words
        words.add(word);
        if (word.endsWith(".")) {
            // A word ending with '.' marks the end of the sentence,
            // so publish a sentence tuple
            collector.emit(ImmutableList.of((Object) StringUtils.join(words, ' ')));
            // and reset the words list
            words.clear();
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare that we emit tuples with a single field called "sentence"
        declarer.declare(new Fields("sentence"));
    }
}
PrinterBolt
package bhz.storm.kafka.example;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class PrinterBolt extends BaseBasicBolt {

    public void execute(Tuple input, BasicOutputCollector collector) {
        // Get the sentence from the tuple and print it
        String sentence = input.getString(0);
        System.out.println("Received Sentence:" + sentence);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt does not emit anything
    }
}
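To see the whole pipeline work, something must publish words to words_topic while the topology is running. Below is a minimal test producer sketch using the producer API bundled with the kafka_2.10 0.9.0.1 dependency already declared in the pom; the WordProducer class name, the broker address 134.32.123.101:9092, and the sample words are assumptions, not part of the original post.

package bhz.storm.kafka.example;

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical helper for testing the topology end to end.
public class WordProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address; replace with your actual Kafka broker list
        props.put("bootstrap.servers", "134.32.123.101:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        // One word per message; the trailing '.' tells SentenceBolt the sentence is complete
        String[] words = {"kafka", "and", "storm", "integration", "works."};
        for (String word : words) {
            producer.send(new ProducerRecord<String, String>("words_topic", word));
        }
        producer.close();
    }
}

Run the producer while KafkaTopology is inside its 10-second wait, and PrinterBolt should print the assembled sentence.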