The Maven dependency is as follows:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
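
If the project is built with sbt instead of Maven, a roughly equivalent coordinate would be the following (a sketch, assuming scalaVersion is set to 2.11.x so that %% resolves to the _2.11 artifact):

// build.sbt: same artifact as the Maven dependency above
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.0"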

The example code from the official website is as follows:

/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._

/**
 * Consumes messages from one or more topics in Kafka and does wordcount.
 * Usage: DirectKafkaWordCount <brokers> <topics>
 *   <brokers> is a list of one or more Kafka brokers
 *   <topics> is a list of one or more kafka topics to consume from
 *
 * Example:
 *    $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
 *    topic1,topic2
 */
object DirectKafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println(s"""
        |Usage: DirectKafkaWordCount <brokers> <topics>
        |  <brokers> is a list of one or more Kafka brokers
        |  <topics> is a list of one or more kafka topics to consume from
        |
        """.stripMargin)
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(brokers, topics) = args

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println

Running the above code produces an error like the following:

Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.

As the error shows, the required Kafka consumer parameters were never set.
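
With the 0-10 integration the stream is backed by the new Kafka consumer, so at minimum bootstrap.servers, group.id, and the key/value deserializers have to be supplied. A minimal sketch of such a parameter map is shown below (the group id "group1" is just a placeholder, and brokers would be something like "localhost:9092"):

import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer

// Minimal consumer configuration for the 0-10 direct stream
val kafkaParams = Map[String, Object](
  ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,                         // e.g. "localhost:9092"
  ConsumerConfig.GROUP_ID_CONFIG -> "group1",                                 // placeholder consumer group
  ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
  ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
)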

The official example code is modified as follows:

package cn.xdf.userprofile.stream

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

import scala.collection.mutable

object DirectKafka {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        s"""
           |Usage: DirectKafkaWordCount <brokers> <topics>
           |  <brokers> is a list of one or more Kafka brokers
           |  <topics> is a list of one or more kafka topics to consume from
           |
        """.stripMargin)
      System.exit(1)
    }

    val Array(brokers, topics) = args

    val conf = new SparkConf()
      .setAppName("DirectKafka")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    val topicsSet = topics.split(",").toSet
    val kafkaParams = mutable.HashMap[String, String]()
    // The following parameters are required; otherwise the job fails with the error above
    kafkaParams.put("bootstrap.servers", brokers)
    kafkaParams.put("group.id", "group1")
    kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
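
Two further settings that the official 0-10 integration guide also sets, and that may be worth adding depending on the offset-handling requirements, are auto.offset.reset and enable.auto.commit. A sketch (the values shown are assumptions for a typical setup where Spark manages offsets):

// Optional but commonly used settings from the 0-10 integration guide
kafkaParams.put("auto.offset.reset", "latest")    // where to start when there is no committed offset
kafkaParams.put("enable.auto.commit", "false")    // do not let the consumer auto-commit offsets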

The run procedure is as follows:

Start Kafka

bin/kafka-server-start ./etc/kafka/server.properties &

[2018-10-22 11:24:14,748] INFO [GroupCoordinator 0]: Stabilized group group1 generation 1 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:24:14,761] INFO [GroupCoordinator 0]: Assignment received from leader for group group1 for generation 1 (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:24:14,779] INFO Updated PartitionLeaderEpoch. New: {epoch:0, offset:0}, Current: {epoch:-1, offset-1} for Partition: __consumer_offsets-40. Cache now contains 0 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-10-22 11:28:19,010] INFO [GroupCoordinator 0]: Preparing to rebalance group group1 with old generation 1 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:28:19,013] INFO [GroupCoordinator 0]: Group group1 with generation 2 is now empty (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2018-10-22 11:29:29,424] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 11 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-10-22 11:39:29,414] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-10-22 11:49:29,414] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
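
If the test topic does not exist yet, it can be created before submitting the job. A sketch for Kafka 1.0.0 (newer Kafka versions use --bootstrap-server instead of --zookeeper):

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test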

Run Spark

/usr/local/spark-2.3.0/bin/spark-submit --class cn.xdf.userprofile.stream.DirectKafka --master yarn --driver-memory 2g     --num-executors 1      --executor-memory 2g     --executor-cores 1  userprofile2.0.jar localhost:9092 test

2018-10-22 11:28:16 INFO  DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 483 (ShuffledRDD[604] at reduceByKey at DirectKafka.scala:46) (first 15 tasks are for partitions Vector(1))
2018-10-22 11:28:16 INFO  TaskSchedulerImpl:54 - Adding task set 483.0 with 1 tasks
2018-10-22 11:28:16 INFO  TaskSetManager:54 - Starting task 0.0 in stage 483.0 (TID 362, localhost, executor driver, partition 1, PROCESS_LOCAL, 7649 bytes)
2018-10-22 11:28:16 INFO  Executor:54 - Running task 0.0 in stage 483.0 (TID 362)
2018-10-22 11:28:16 INFO  ShuffleBlockFetcherIterator:54 - Getting 0 non-empty blocks out of 1 blocks
2018-10-22 11:28:16 INFO  ShuffleBlockFetcherIterator:54 - Started 0 remote fetches in 0 ms
2018-10-22 11:28:16 INFO  Executor:54 - Finished task 0.0 in stage 483.0 (TID 362). 1091 bytes result sent to driver
2018-10-22 11:28:16 INFO  TaskSetManager:54 - Finished task 0.0 in stage 483.0 (TID 362) in 4 ms on localhost (executor driver) (1/1)
2018-10-22 11:28:16 INFO  TaskSchedulerImpl:54 - Removed TaskSet 483.0, whose tasks have all completed, from pool 
2018-10-22 11:28:16 INFO  DAGScheduler:54 - ResultStage 483 (print at DirectKafka.scala:47) finished in 0.008 s
2018-10-22 11:28:16 INFO  DAGScheduler:54 - Job 241 finished: print at DirectKafka.scala:47, took 0.009993 s
-------------------------------------------
Time: 1540178896000 ms
-------------------------------------------

Start the producer

[root@master kafka_2.11-1.0.0]# bin/kafka-console-producer.sh --topic test --broker-list localhost:9092
>  hello you

>  hello me
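
If no counts appear on the Spark side, the topic contents can first be checked independently with the console consumer (Kafka 1.0.0 syntax):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning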

Check the results:

(hello,2)
(me,1)
(you,1)
2018-10-22 11:57:08 INFO  JobScheduler:54 - Finished job streaming job 1540180628000 ms.0 from job set of time 1540180628000 ms
2018-10-22 11:57:08 INFO  JobScheduler:54 - Total delay: 0.119 s for time 1540180628000 ms (execution: 0.072 s)
2018-10-22 11:57:08 INFO  ShuffledRDD:54 - Removing RDD 154 from persistence list
2018-10-22 11:57:08 INFO  MapPartitionsRDD:54 - Removing RDD 153 from persistence list
2018-10-22 11:57:08 INFO  BlockManager:54 - Removing RDD 153
2018-10-22 11:57:08 INFO  BlockManager:54 - Removing RDD 154
2018-10-22 11:57:08 INFO  MapPartitionsRDD:54 - Removing RDD 152 from persistence list
2018-10-22 11:57:08 INFO  BlockManager:54 - Removing RDD 152
2018-10-22 11:57:08 INFO  MapPartitionsRDD:54 - Removing RDD 151 from persistence list
2018-10-22 11:57:08 INFO  BlockManager:54 - Removing RDD 151
2018-10-22 11:57:08 INFO  KafkaRDD:54 - Removing RDD 150 from persistence list
2018-10-22 11:57:08 INFO  BlockManager:54 - Removing RDD 150
