Spark Streaming 实现读取Kafka 生产数据
在kafka 目录下执行生产消息命令:
./kafka-console-producer --broker-list nodexx:9092 --topic 201609

在spark bin 目录下执行
./run-example streaming.JavaDirectKafkaWordCount nodexx:9092, nodexx:9092 201609


import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.regex.Pattern; import scala.Tuple2; import com.google.common.collect.Lists;
import kafka.serializer.StringDecoder; import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.Durations; /**
* Consumes messages from one or more topics in Kafka and does wordcount.
* Usage: JavaDirectKafkaWordCount <brokers> <topics>
* <brokers> is a list of one or more Kafka brokers
* <topics> is a list of one or more kafka topics to consume from
*
* Example:
* $ bin/run-example streaming.JavaDirectKafkaWordCount broker1-host:port,broker2-host:port topic1,topic2
*/ public final class JavaDirectKafkaWordCount {
private static final Pattern SPACE = Pattern.compile(" "); public static void main(String[] args) {
if (args.length < 2) {
System.err.println("Usage: JavaDirectKafkaWordCount <brokers> <topics>\n" +
" <brokers> is a list of one or more Kafka brokers\n" +
" <topics> is a list of one or more kafka topics to consume from\n\n");
System.exit(1);
} StreamingExamples.setStreamingLogLevels(); String brokers = args[0];
String topics = args[1]; // Create context with a 2 seconds batch interval
SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount");
JavaStreamingContext jssc;
jssc = new (sparkConf, Durations.seconds(2)); HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", brokers); // Create direct kafka stream with brokers and topics
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
jssc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
kafkaParams,
topicsSet
); // Get the lines, split them into words, count the words and print
JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
});
JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
@Override
public Iterable<String> call(String x) {
return Lists.newArrayList(SPACE.split(x));
}
});
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
new PairFunction<String, String, Integer>() {
@Override
public Tuple2<String, Integer> call(String s) {
return new Tuple2<String, Integer>(s, 1);
}
}).reduceByKey(
new Function2<Integer, Integer, Integer>() {
@Override
public Integer call(Integer i1, Integer i2) {
return i1 + i2;
}
});
wordCounts.print(); // Start the computation
jssc.start();
jssc.awaitTermination();
}
}
Spark Streaming 实现读取Kafka 生产数据的更多相关文章
- spark streaming中维护kafka偏移量到外部介质
spark streaming中维护kafka偏移量到外部介质 以kafka偏移量维护到redis为例. redis存储格式 使用的数据结构为string,其中key为topic:partition, ...
- Spark Streaming的接收KAFKA的数据
https://github.com/lw-lin/CoolplaySpark/blob/master/Spark%20Streaming%20%E6%BA%90%E7%A0%81%E8%A7%A3% ...
- Spark Streaming整合logstash + Kafka wordCount
1.安装logstash,直接解压即可 测试logstash是否可以正常运行 bin/logstash -e 'input { stdin { } } output { stdout {codec = ...
- Flink与Spark Streaming在与kafka结合的区别!
本文主要是想聊聊flink与kafka结合.当然,单纯的介绍flink与kafka的结合呢,比较单调,也没有可对比性,所以的准备顺便帮大家简单回顾一下Spark Streaming与kafka的结合. ...
- Spark Streaming整合Flume + Kafka wordCount
flume配置文件 flume_to_kafka.conf a1.sources = r1 a1.sinks = k1 a1.channels = c1 a1.sources.r1.type = sp ...
- Exactly-once Spark Streaming from Apache Kafka
这篇文章我已经看过两遍了.收获颇多,抽个时间翻译下,先贴个原文链接吧.也给自己留个任务 http://blog.cloudera.com/blog/2015/03/exactly-once-spark ...
- Spark Streaming消费Kafka Direct方式数据零丢失实现
使用场景 Spark Streaming实时消费kafka数据的时候,程序停止或者Kafka节点挂掉会导致数据丢失,Spark Streaming也没有设置CheckPoint(据说比较鸡肋,虽然可以 ...
- Spark Streaming + Kafka整合(Kafka broker版本0.8.2.1+)
这篇博客是基于Spark Streaming整合Kafka-0.8.2.1官方文档. 本文主要讲解了Spark Streaming如何从Kafka接收数据.Spark Streaming从Kafka接 ...
- 【Spark】Spark Streaming + Kafka direct 的 offset 存入Zookeeper并重用
Spark Streaming + Kafka direct 的 offset 存入Zookeeper并重用 streaming offset设置_百度搜索 将 Spark Streaming + K ...
随机推荐
- django学习笔记二:一个项目多个App项目搭建
django充许在一个项目中存在多个app,如一个大门户网站中可以包含论坛,新闻等内容,其中每一个模块称之为一个App,也可以理解为一个个独立的小型项目最终集成在一个门户网站中最终呈现给用户 本次测试 ...
- 出发 Let's Go
今天是中秋佳节,而恰好我这天过生日,晚上睡觉前又恰好听到温岚唱的祝我生日快乐,心里挺高兴的. 最近,由于公司需要,可能要学习Python和Tribon了,全是未知的,一点不了解的东西,也忽然想起了在这 ...
- ORA-01034/ORA-27101解决
sql> shutdown immediate 后就无法进行任何操作了,重新通过sqlplus不能登录,提示ORA-01034和ORA-27101错误 解决,以下全部在cmd中: 1. 启动or ...
- jdbc之二:DAO模式
详细代码请参见 https://code.csdn.net/jediael_lu/daopattern 1.创建Dao接口. package com.ljh.jasonnews.server.dao; ...
- AngularJS中serivce,factory,provider的区别
一.service引导 刚开始学习Angular的时候,经常被误解和被初学者问到的组件是 service(), factory(), 和 provide()这几个方法之间的差别.This is whe ...
- cocos2dx工程中接入支付宝sdk
1. 首先去支付宝官网下载开发者文档 2. 然后按着开发者文档将支付宝的sdk导入到你的工程中,并关联到工程中,步骤入下图: (1)将从支付宝官方网站获得的支付宝的sdk的jar包拷贝到工程中的lib ...
- http status 400,http 400,400 错误
转载:http://blog.csdn.net/xu_zh_h/article/details/2294233 4 请求失败4xx 4xx应答定义了特定服务器响应的请求失败的情况.客户端不应当在不更改 ...
- 1041. Be Unique (20)
Being unique is so important to people on Mars that even their lottery is designed in a unique way. ...
- 李维作答 《insideVCL》——李维实在很勤奋,而且勇于突破,从不以旧的内容充数
(编者按)<Inside VCL(VCL核心架构剖析)>一书出版以来,众多热心读者给李维先生.博文视点公司.CSDN写来信件,有更多朋友在各个论坛上发表关于该书的言论.读者们不但盛赞该书, ...
- 设置edittext的hint位置
<EditText android:id="@+id/edt_content" android:layout_width="fill_parent" an ...