Kafka 0.11 + Spark 2.11 streaming example
"""
Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
Usage: kafka_wordcount.py <zk> <topic>
To run this on your local machine, you need to set up Kafka and create a producer first; see
http://kafka.apache.org/documentation.html#quickstart
and then run the example
`$ bin/spark-submit --jars \
external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test`
"""
from __future__ import print_function

import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
        sys.exit(-1)

    sc = SparkContext(appName="PythonStreamingKafkaWordCount")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    zkQuorum, topic = sys.argv[1:]
    # Receiver-based stream that connects through ZooKeeper.
    # "spark-streaming-consumer" is the consumer group id; {topic: 1} maps the
    # topic to the number of partitions to consume, each in its own thread.
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    # Each record is a (key, message) pair; keep only the message text.
    lines = kvs.map(lambda x: x[1])
    counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a+b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
/spark-kafka/spark-2.1.1-bin-hadoop2.6# ./bin/spark-submit --jars ~/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar examples/src/main/python/streaming/kafka_wordcount.py localhost:2181 test
Here, spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar can be downloaded from http://search.maven.org/#search%7Cga%7C1%7Cspark-streaming-kafka-0-8-assembly
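To make the transformation chain in the script above concrete, here is a minimal plain-Python sketch of what flatMap, map and reduceByKey compute for a single batch of lines. It does not use Spark at all; the function name `count_words_in_batch` and the sample batch are assumptions made for illustration only.

```python
from collections import defaultdict


def count_words_in_batch(lines):
    """Mimic flatMap -> map -> reduceByKey for one batch (plain Python, no Spark)."""
    # flatMap: split every line on single spaces and flatten into one word list
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with an initial count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: sum the counts that share the same word
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical batch: the two messages typed into the console producer
    batch = ["This is a message", "This is another message"]
    print(count_words_in_batch(batch))
```

Like the Spark script, this splits on a single space, so consecutive spaces would produce empty-string "words" that are counted as well.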
Kafka version 0.11 is used:
1.3 Quick Start
This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use bin\windows\ instead of bin/, and change the script extension to .bat.
Step 1: Download the code
Download the 0.11.0.0 release and un-tar it.
> tar -xzf kafka_2.11-0.11.0.0.tgz
> cd kafka_2.11-0.11.0.0
Step 2: Start the server
Kafka uses ZooKeeper, so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...
Now start the Kafka server:
> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...
Step 3: Create a topic
Let's create a topic named "test" with a single partition and only one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.
Step 4: Send some messages
Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.
Run the producer and then type a few messages into the console to send to the server.
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
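The "one line, one message" behaviour described above can be sketched in plain Python. This is only an illustration of how input is split into messages, not the console producer's implementation, and it does not talk to Kafka; the helper name `lines_to_messages` is hypothetical.

```python
import io


def lines_to_messages(stream):
    """Yield one message per input line, without the trailing newline."""
    for line in stream:
        yield line.rstrip("\n")


if __name__ == "__main__":
    # Simulated console input: two lines typed by the user
    typed = io.StringIO("This is a message\nThis is another message\n")
    for message in lines_to_messages(typed):
        print(message)
```

Each yielded string corresponds to what would arrive as a separate record in the "test" topic, which is why the word-count job sees one line of text per record.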
Step 5: Start a consumer
Kafka also has a command line consumer that will dump out messages to standard output.
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message