Big Data Processing Frameworks: Storm (Kafka and Storm Integration)

In this example Storm uses Kafka as its data source. Storm can also read from files, Redis, JDBC, Hive, HDFS, HBase, or Netty.

Create a new Maven project:

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>storm06</groupId>
  <artifactId>storm06</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>storm07</name>
  <url>http://maven.apache.org</url>
  <repositories>
    <!-- Repository where the Storm dependencies can be found -->
    <repository>
      <id>clojars.org</id>
      <url>http://clojars.org/repo</url>
    </repository>
  </repositories>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-core</artifactId>
      <version>0.9.2-incubating</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.10</artifactId>
      <version>0.9.0.1</version>
      <exclusions>
        <exclusion>
          <groupId>com.sun.jdmk</groupId>
          <artifactId>jmxtools</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.sun.jmx</groupId>
          <artifactId>jmxri</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>2.0-beta9</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-1.2-api</artifactId>
      <version>2.0-beta9</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>log4j-over-slf4j</artifactId>
      <version>1.7.10</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.10</version>
    </dependency>
    <!-- Storm Kafka spout -->
    <dependency>
      <groupId>net.wurstmeister.storm</groupId>
      <artifactId>storm-kafka-0.8-plus</artifactId>
      <version>0.4.0</version>
    </dependency>
    <dependency>
      <groupId>commons-collections</groupId>
      <artifactId>commons-collections</artifactId>
      <version>3.2.1</version>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>15.0</version>
    </dependency>
  </dependencies>
  <build>
    <finalName>storm06</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-war-plugin</artifactId>
        <version>2.4</version>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.1</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <!-- Unit tests -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <skip>true</skip>
          <includes>
            <include>**/*Test*.java</include>
          </includes>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-source-plugin</artifactId>
        <version>2.1.2</version>
        <executions>
          <!-- Bind to the package phase and run the maven-source-plugin goal jar-no-fork -->
          <execution>
            <phase>package</phase>
            <goals>
              <goal>jar-no-fork</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

KafkaTopology

package bhz.storm.kafka.example;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;

public class KafkaTopology {

    public static void main(String[] args) throws
        AlreadyAliveException, InvalidTopologyException {
        // ZooKeeper hosts for the Kafka cluster
        ZkHosts zkHosts = new ZkHosts("134.32.123.101:2181,134.32.123.102:2181,134.32.123.103:2181");

        // Create the KafkaSpout configuration
        // Second argument is the topic name
        // Third argument is the ZooKeeper root path under which the spout stores its offsets
        // Fourth argument is an id that uniquely identifies this spout (it plays the role of a consumer group id)
        SpoutConfig kafkaConfig = new SpoutConfig(zkHosts, "words_topic", "", "id7");

        // Specify that the Kafka messages are Strings
        kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        // Consume the topic from the first message every time the topology
        // runs, which helps with debugging. In production, this property
        // should be false
        kafkaConfig.forceFromStart = true;

        // Now we create the topology
        TopologyBuilder builder = new TopologyBuilder();

        // set the Kafka spout class
        builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 1);

        // configure the bolts
        builder.setBolt("SentenceBolt", new SentenceBolt(), 1).globalGrouping("KafkaSpout");
        builder.setBolt("PrinterBolt", new PrinterBolt(), 1).globalGrouping("SentenceBolt");

        // create an instance of the LocalCluster class for executing the topology in local mode
        LocalCluster cluster = new LocalCluster();
        Config conf = new Config();

        // Submit the topology for execution
        cluster.submitTopology("KafkaTopology", conf, builder.createTopology());

        try {
            // Wait for some time before exiting
            System.out.println("Waiting to consume from kafka");
            Thread.sleep(10000);
        } catch (Exception exception) {
            System.out.println("Thread interrupted exception : " + exception);
        }

        // kill the KafkaTopology
        cluster.killTopology("KafkaTopology");

        // shut down the Storm test cluster
        cluster.shutdown();
    }
}

SentenceBolt

package bhz.storm.kafka.example;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.StringUtils;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;

import com.google.common.collect.ImmutableList;

public class SentenceBolt extends BaseBasicBolt {

    // list used for aggregating the words
    private List<String> words = new ArrayList<String>();

    public void execute(Tuple input, BasicOutputCollector collector) {
        // Get the word from the tuple
        String word = input.getString(0);
        if (StringUtils.isBlank(word)) {
            // ignore blank lines
            return;
        }
        System.out.println("Received Word:" + word);

        // add the word to the current list of words
        words.add(word);

        if (word.endsWith(".")) {
            // the word ends with '.', which marks the end of the
            // sentence, so publish a sentence tuple
            collector.emit(ImmutableList.of(
                    (Object) StringUtils.join(words, ' ')));

            // and reset the words list
            words.clear();
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // here we declare that we will be emitting tuples with
        // a single field called "sentence"
        declarer.declare(new Fields("sentence"));
    }
}

PrinterBolt

package bhz.storm.kafka.example;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class PrinterBolt extends BaseBasicBolt {

    public void execute(Tuple input, BasicOutputCollector collector) {
        // get the sentence from the tuple and print it
        String sentence = input.getString(0);
        System.out.println("Received Sentence:" + sentence);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // we don't emit anything
    }
}
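
To check the word-to-sentence logic without a running Kafka or Storm cluster, the buffering done in SentenceBolt can be reproduced in plain Java. The sketch below is only an illustration: SentenceAggregator and SentenceAggregatorDemo are hypothetical names that are not part of the original project, blank detection uses trim() as a rough stand-in for StringUtils.isBlank, and the sample messages are made up. The loop in main plays the role of the Kafka spout feeding words, and the println plays the role of PrinterBolt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the original project): mirrors the
// buffering logic of SentenceBolt without depending on any Storm classes.
class SentenceAggregator {

    // words buffered since the last completed sentence
    private final List<String> words = new ArrayList<String>();

    // Returns the completed sentence when the word ends with '.', otherwise null.
    String accept(String word) {
        if (word == null || word.trim().isEmpty()) {
            // ignore blank messages, as SentenceBolt does
            return null;
        }
        words.add(word);
        if (!word.endsWith(".")) {
            // sentence not finished yet, keep buffering
            return null;
        }
        // join the buffered words with single spaces (Java 1.7 compatible)
        StringBuilder sentence = new StringBuilder();
        for (String buffered : words) {
            if (sentence.length() > 0) {
                sentence.append(' ');
            }
            sentence.append(buffered);
        }
        // reset the buffer for the next sentence
        words.clear();
        return sentence.toString();
    }
}

class SentenceAggregatorDemo {
    public static void main(String[] args) {
        SentenceAggregator aggregator = new SentenceAggregator();
        // made-up sample messages, one word per Kafka message
        String[] messages = {"Storm", "reads", "", "from", "Kafka."};
        for (String message : messages) {
            String sentence = aggregator.accept(message);
            if (sentence != null) {
                // the role PrinterBolt plays in the topology
                System.out.println("Received Sentence:" + sentence);
            }
        }
    }
}
```

Running SentenceAggregatorDemo prints "Received Sentence:Storm reads from Kafka.". In the real topology the same result relies on both bolts using globalGrouping with a parallelism of 1, so that every word reaches the single SentenceBolt instance that holds the buffer.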
