Spark Streaming 实现读取Kafka 生产数据
在kafka 目录下执行生产消息命令:
./kafka-console-producer --broker-list nodexx:9092 --topic 201609

在spark bin 目录下执行
./run-example streaming.JavaDirectKafkaWordCount nodexx:9092, nodexx:9092 201609


import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.regex.Pattern; import scala.Tuple2; import com.google.common.collect.Lists;
import kafka.serializer.StringDecoder; import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.Durations; /**
* Consumes messages from one or more topics in Kafka and does wordcount.
* Usage: JavaDirectKafkaWordCount <brokers> <topics>
* <brokers> is a list of one or more Kafka brokers
* <topics> is a list of one or more kafka topics to consume from
*
* Example:
* $ bin/run-example streaming.JavaDirectKafkaWordCount broker1-host:port,broker2-host:port topic1,topic2
*/ public final class JavaDirectKafkaWordCount {
private static final Pattern SPACE = Pattern.compile(" "); public static void main(String[] args) {
if (args.length < 2) {
System.err.println("Usage: JavaDirectKafkaWordCount <brokers> <topics>\n" +
" <brokers> is a list of one or more Kafka brokers\n" +
" <topics> is a list of one or more kafka topics to consume from\n\n");
System.exit(1);
} StreamingExamples.setStreamingLogLevels(); String brokers = args[0];
String topics = args[1]; // Create context with a 2 seconds batch interval
SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount");
JavaStreamingContext jssc;
jssc = new (sparkConf, Durations.seconds(2)); HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", brokers); // Create direct kafka stream with brokers and topics
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
jssc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
kafkaParams,
topicsSet
); // Get the lines, split them into words, count the words and print
JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
});
JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
@Override
public Iterable<String> call(String x) {
return Lists.newArrayList(SPACE.split(x));
}
});
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
new PairFunction<String, String, Integer>() {
@Override
public Tuple2<String, Integer> call(String s) {
return new Tuple2<String, Integer>(s, 1);
}
}).reduceByKey(
new Function2<Integer, Integer, Integer>() {
@Override
public Integer call(Integer i1, Integer i2) {
return i1 + i2;
}
});
wordCounts.print(); // Start the computation
jssc.start();
jssc.awaitTermination();
}
}
Spark Streaming 实现读取Kafka 生产数据的更多相关文章
- spark streaming中维护kafka偏移量到外部介质
spark streaming中维护kafka偏移量到外部介质 以kafka偏移量维护到redis为例. redis存储格式 使用的数据结构为string,其中key为topic:partition, ...
- Spark Streaming的接收KAFKA的数据
https://github.com/lw-lin/CoolplaySpark/blob/master/Spark%20Streaming%20%E6%BA%90%E7%A0%81%E8%A7%A3% ...
- Spark Streaming整合logstash + Kafka wordCount
1.安装logstash,直接解压即可 测试logstash是否可以正常运行 bin/logstash -e 'input { stdin { } } output { stdout {codec = ...
- Flink与Spark Streaming在与kafka结合的区别!
本文主要是想聊聊flink与kafka结合.当然,单纯的介绍flink与kafka的结合呢,比较单调,也没有可对比性,所以的准备顺便帮大家简单回顾一下Spark Streaming与kafka的结合. ...
- Spark Streaming整合Flume + Kafka wordCount
flume配置文件 flume_to_kafka.conf a1.sources = r1 a1.sinks = k1 a1.channels = c1 a1.sources.r1.type = sp ...
- Exactly-once Spark Streaming from Apache Kafka
这篇文章我已经看过两遍了.收获颇多,抽个时间翻译下,先贴个原文链接吧.也给自己留个任务 http://blog.cloudera.com/blog/2015/03/exactly-once-spark ...
- Spark Streaming消费Kafka Direct方式数据零丢失实现
使用场景 Spark Streaming实时消费kafka数据的时候,程序停止或者Kafka节点挂掉会导致数据丢失,Spark Streaming也没有设置CheckPoint(据说比较鸡肋,虽然可以 ...
- Spark Streaming + Kafka整合(Kafka broker版本0.8.2.1+)
这篇博客是基于Spark Streaming整合Kafka-0.8.2.1官方文档. 本文主要讲解了Spark Streaming如何从Kafka接收数据.Spark Streaming从Kafka接 ...
- 【Spark】Spark Streaming + Kafka direct 的 offset 存入Zookeeper并重用
Spark Streaming + Kafka direct 的 offset 存入Zookeeper并重用 streaming offset设置_百度搜索 将 Spark Streaming + K ...
随机推荐
- 数据库学习之ADO.NET五大对象
1 [ADO.NET] ado.net 是一种数据访问技术,使得应用程序能够连接到数据存储,并以各种方式操作存储在里面的数据. 2 [ADO.NET五大常用对象] Connec ...
- 不同SQL Server数据库之间的跨数据库查询
--不同SQL Server数据库之间的跨数据库查询 EXEC sp_addlinkedserver @server=N'OldDatabase', --自己定义别名 @srvproduct=N'', ...
- 生成html文件
第一步:建立一个MbPage.html页面 第二步:后台生成 public void ProcessRequest(HttpContext context) { c ...
- AppSettings
1.winform中读写配置文件appSettings 一节中的配置. #region 读写配置文件 /// <summary> /// 修改配置文件中某项的值 /// </summ ...
- ORA-01034/ORA-27101解决
sql> shutdown immediate 后就无法进行任何操作了,重新通过sqlplus不能登录,提示ORA-01034和ORA-27101错误 解决,以下全部在cmd中: 1. 启动or ...
- $ cd `dirname $0` 和PWD%/* shell变量的一些特殊用法
在命令行状态下单纯执行 $ cd `dirname $0` 是毫无意义的.因为他返回当前路径的".". $0:当前Shell程序的文件名dirname $0,获取当前Shell程序 ...
- SSM三大框架整合详细教程
使用SSM(Spring.SpringMVC和Mybatis)已经有三个多月了,项目在技术上已经没有什么难点了,基于现有的技术就可以实现想要的功能,当然肯定有很多可以改进的地方.之前没有记录SSM整合 ...
- ckeditor的使用与验证
1.使id=id的textArea变为富文本编辑框 function inittextarea(id) { CKEDITOR.replace(id,{ width:'600px', ...
- 获取mssqlserver与access数据库插入的当前行的id
//mssqlserver public static int GetInsertId(string sql) { try { SqlCommand cmd = new SqlCommand(); u ...
- Android Bitmap圆角
代码如下: public Bitmap transform(Bitmap source) { int size = Math.min(source.getWidth(), source.getHeig ...