"""
Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
Usage: kafka_wordcount.py <zk> <topic>
To run this on your local machine, you need to setup Kafka and create a producer first, see
http://kafka.apache.org/documentation.html#quickstart
and then run the example
`$ bin/spark-submit --jars \
external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test`
"""
from __future__ import print_function import sys from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
exit(-1) sc = SparkContext(appName="PythonStreamingKafkaWordCount")
ssc = StreamingContext(sc, 1) zkQuorum, topic = sys.argv[1:]
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
lines = kvs.map(lambda x: x[1])
counts = lines.flatMap(lambda line: line.split(" ")) \
.map(lambda word: (word, 1)) \
.reduceByKey(lambda a, b: a+b)
counts.pprint() ssc.start()
ssc.awaitTermination()

/spark-kafka/spark-2.1.1-bin-hadoop2.6# ./bin/spark-submit --jars ~/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar examples/src/main/python/streaming/kafka_wordcount.py localhost:2181 test

其中:spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar在 http://search.maven.org/#search%7Cga%7C1%7Cspark-streaming-kafka-0-8-assembly 下载

kafka 使用0.11版本:

1.3 Quick Start

This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use bin\windows\ instead of bin/, and change the script extension to .bat.

Step 1: Download the code

Download the 0.11.0.0 release and un-tar it.

1
2
> tar -xzf kafka_2.11-0.11.0.0.tgz
> cd kafka_2.11-0.11.0.0

Step 2: Start the server

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.

1
2
3
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...

Now start the Kafka server:

1
2
3
4
> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...

Step 3: Create a topic

Let's create a topic named "test" with a single partition and only one replica:

1
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

We can now see that topic if we run the list topic command:

1
2
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test

Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.

Step 4: Send some messages

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.

1
2
3
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message

Step 5: Start a consumer

Kafka also has a command line consumer that will dump out messages to standard output.

1
2
3
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message

kafka 0.11 spark 2.11 streaming例子的更多相关文章

  1. kafka 0.8+spark offset 提交至mysql

    kafka版本:<kafka.version> 0.8.2.1</kafka.version> spark版本 <artifactId>spark-streamin ...

  2. 【原创】Kafka 0.11消息设计

    Kafka 0.11版本增加了很多新功能,包括支持事务.精确一次处理语义和幂等producer等,而实现这些新功能的前提就是要提供支持这些功能的新版本消息格式,同时也要维护与老版本的兼容性.本文将详细 ...

  3. 【译】Flink + Kafka 0.11端到端精确一次处理语义的实现

    本文是翻译作品,作者是Piotr Nowojski和Michael Winters.前者是该方案的实现者. 原文地址是https://data-artisans.com/blog/end-to-end ...

  4. Kafka 0.11.0.0 实现 producer的Exactly-once 语义(中文)

    很高兴地告诉大家,具备新的里程碑意义的功能的Kafka 0.11.x版本(对应 Confluent Platform 3.3)已经release,该版本引入了exactly-once语义,本文阐述的内 ...

  5. Kafka 0.11.0.0 实现 producer的Exactly-once 语义(英文)

    Exactly-once Semantics are Possible: Here’s How Kafka Does it I’m thrilled that we have hit an excit ...

  6. Kafka设计解析(二十二)Flink + Kafka 0.11端到端精确一次处理语义的实现

    转载自 huxihx,原文链接 [译]Flink + Kafka 0.11端到端精确一次处理语义的实现 本文是翻译作品,作者是Piotr Nowojski和Michael Winters.前者是该方案 ...

  7. Kafka设计解析(十六)Kafka 0.11消息设计

    转载自 huxihx,原文链接 [原创]Kafka 0.11消息设计 目录 一.Kafka消息层次设计 1. v1格式 2. v2格式 二.v1消息格式 三.v2消息格式 四.测试对比 Kafka 0 ...

  8. Kafka 0.11.0.0 实现 producer的Exactly-once 语义(官方DEMO)

    <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka-clients&l ...

  9. Kafka 0.11新功能介绍:空消费组延迟rebalance

    Kafka 0.11新功能介绍:空消费组延迟rebalance 在0.11之前的版本中,多个consumer实例加入到一个空消费组将导致多次的rebalance,这是由于每个consumer inst ...

随机推荐

  1. 开源ext2read代码走读之-在windows下怎样推断有几个硬盘设备?

    int get_ndisks() {     HANDLE hDevice;               // handle to the drive to be examined     int n ...

  2. 错误 'Cannot run program "/home/uv/IDE/adt/sdk/platform-tools/adb": error=2, No such file or directory

    转 Linux下Android SDK中adb找不到的解决方案 2013年04月22日 20:41:48 阅读数:7621 在Linux平台下配置Android SDK开发环境过程中,Eclipse会 ...

  3. 利用机器学习进行DNS隐蔽通道检测——数据收集,利用iodine进行DNS隐蔽通道样本收集

    我们在使用机器学习做DNS隐蔽通道检测的过程中,不得不面临样本收集的问题,没办法,机器学习没有样本真是“巧妇难为无米之炊”啊! 本文简单介绍了DNS隐蔽通道传输工具iodine,并介绍如何从iodin ...

  4. [IOI 1999] 花店橱窗布置

    [题目链接] https://www.luogu.org/problemnew/show/P1854v [算法] f[i][j]表示放了前i束花,第i束花放在第j个花瓶中,所能获得的最大美学值 由于要 ...

  5. mysql授予IP远程访问访问语句

    mysql -u root -p; GRANT ALL PRIVILEGES ON *.* TO '用户名'@'你的IP地址' IDENTIFIED BY '密码' WITH GRANT OPTION ...

  6. 洛谷P4015 运输问题(费用流)

    题目描述 WW 公司有 mm 个仓库和 nn 个零售商店.第 ii 个仓库有 a_iai​ 个单位的货物:第 jj 个零售商店需要 b_jbj​ 个单位的货物. 货物供需平衡,即\sum\limits ...

  7. django前端到后端一次完整请求实例

    一.创建项目:# django-admin startproject mysite# cd mysite# python manage.py startapp blog 目录结构: 一.html文件: ...

  8. 基于vue项目的js工具方法汇总

    以下是个人过去一年在vue项目的开发过程中经常会用到的一些公共方法,在此进行汇总,方便以后及有需要的朋友查看~ let util = {}; /** * @description 日期格式化 * @p ...

  9. 3ds Max制作欧式风格的墙壁路灯效果

    在这篇文章中,我将解释我创建我的形象元宵节的步骤.我只是在寻找一个很好的参考图像在互联网上的东西,我觉得我想要的模型,这个形象.我发现了一个巨大的灯笼形象,但在白天的图片拍摄.我想改变我的形象和显示的 ...

  10. NTP同步底层实现

    RFC http://www.ietf.org/rfc/rfc5905.txt https://www.eecis.udel.edu/~mills/ntp/html/select.html https ...