Integrating Spark Streaming with Kafka in Scala (Spark 2.3, Kafka 0.10)
The Maven dependency is as follows:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
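If the project uses sbt rather than Maven, the equivalent dependency (assuming scalaVersion is set to 2.11.x, to match the _2.11 artifact suffix above) would be:

libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.0"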
The example code from the official website is as follows:
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._

/**
 * Consumes messages from one or more topics in Kafka and does wordcount.
 * Usage: DirectKafkaWordCount <brokers> <topics>
 *   <brokers> is a list of one or more Kafka brokers
 *   <topics> is a list of one or more kafka topics to consume from
 *
 * Example:
 *   $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
 *     topic1,topic2
 */
object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println(s"""
        |Usage: DirectKafkaWordCount <brokers> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <topics> is a list of one or more kafka topics to consume from
        |
        """.stripMargin)
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(brokers, topics) = args

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println
Running the above code produces the following error:
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
As the error shows, the required Kafka parameters were never set: the official example still passes the old 0.8-style metadata.broker.list parameter, but the Kafka 0.10 consumer requires bootstrap.servers, along with key/value deserializers and a group.id.
Modify the official code as follows:
package cn.xdf.userprofile.stream

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

import scala.collection.mutable

object DirectKafka {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        s"""
           |Usage: DirectKafkaWordCount <brokers> <topics>
           |  <brokers> is a list of one or more Kafka brokers
           |  <topics> is a list of one or more kafka topics to consume from
           |
        """.stripMargin)
      System.exit(1)
    }

    val Array(brokers, topics) = args

    val conf = new SparkConf()
      .setAppName("DirectKafka")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    val topicsSet = topics.split(",").toSet
    val kafkaParams = mutable.HashMap[String, String]()
    // The following parameters are required; without them the job fails
    // with the "Missing required configuration" error shown above.
    kafkaParams.put("bootstrap.servers", brokers)
    kafkaParams.put("group.id", "group1")
    kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)
    )

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
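For reference, the Spark Streaming + Kafka 0-10 integration guide builds the same parameters as an immutable Map[String, Object], passing the deserializer classes directly instead of class-name strings. A sketch in that style (the auto.offset.reset and enable.auto.commit entries are extra settings taken from the guide, not part of the code above):

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> brokers,
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "group1",
  // Where to start when the group has no committed offset.
  "auto.offset.reset" -> "latest",
  // Let the application control when offsets are committed.
  "enable.auto.commit" -> (false: java.lang.Boolean)
)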
The run procedure is as follows. Start the Kafka broker:
bin/kafka-server-start ./etc/kafka/server.properties &
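This assumes ZooKeeper is already running and that the test topic exists (or that topic auto-creation is enabled). With the same Confluent-style CLI layout as the command above, those prerequisites would look something like:

bin/zookeeper-server-start ./etc/kafka/zookeeper.properties &   # before starting the broker
bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test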
Submit the Spark job (note that the code above hard-codes setMaster("local[2]"), which takes precedence over --master yarn on the command line; remove the setMaster call to actually run on YARN):
/usr/local/spark-2.3.0/bin/spark-submit --class cn.xdf.userprofile.stream.DirectKafka --master yarn --driver-memory 2g --num-executors 1 --executor-memory 2g --executor-cores 1 userprofile2.0.jar localhost:9092 test
Start a producer and type a message:
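(The original post does not show the producer command; with the same CLI layout it would typically be:)

bin/kafka-console-producer --broker-list localhost:9092 --topic test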
> hello me
Check the results:
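(No output is shown in the original; with the input above, the batch printed by wordCounts.print() would look roughly like:)

-------------------------------------------
Time: ... ms
-------------------------------------------
(hello,1)
(me,1)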