Using a Flume data source in Spark Streaming
There are two ways to do this: in the first, Spark Streaming starts a receiver that listens on a port and Flume pushes events to it; in the second, Spark Streaming pulls events from Flume by polling it on the batch schedule.
At first I thought only the first approach existed, but the problem is that you can never be sure which node the receiver will come up on, so every time I restarted the streaming job I had to go and edit Flume's sink configuration, which was a real pain. Only later did I discover the second approach. Below is the code for both; the differences are actually small. (The code is taken from the official GitHub examples.)
First approach: listen on a port (Flume pushes):
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam

/**
 * Produces a count of events received from Flume.
 *
 * This should be used in conjunction with an AvroSink in Flume. It will start
 * an Avro server at the requested host:port address and listen for requests.
 * Your Flume AvroSink should be pointed to this address.
 *
 * Usage: FlumeEventCount <host> <port>
 *   <host> is the host the Flume receiver will be started on - a receiver
 *          creates a server and listens for flume events.
 *   <port> is the port the Flume receiver will listen on.
 *
 * To run this example:
 *   `$ bin/run-example org.apache.spark.examples.streaming.FlumeEventCount <host> <port>`
 */
object FlumeEventCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: FlumeEventCount <host> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args
    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream
    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
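For reference, here is a minimal sketch of the Flume side of the push approach: an Avro sink that forwards events to the Spark receiver. The agent, channel and sink names and the host/port below are made up for illustration; the hostname and port must match the <host> <port> passed to FlumeEventCount, i.e. whichever node the receiver happens to start on, and that is exactly the value that has to be re-edited every time the job comes up somewhere else.

# Hypothetical Flume agent config for the push approach (names and addresses are placeholders)
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
# must point at the node where the Spark Streaming receiver is listening
a1.sinks.k1.hostname = receiver-host
a1.sinks.k1.port = 41414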
Second approach: actively poll Flume and pull the data:
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam
import java.net.InetSocketAddress

/**
 * Produces a count of events received from Flume.
 *
 * This should be used in conjunction with the Spark Sink running in a Flume agent. See
 * the Spark Streaming programming guide for more details.
 *
 * Usage: FlumePollingEventCount <host> <port>
 *   `host` is the host on which the Spark Sink is running.
 *   `port` is the port at which the Spark Sink is listening.
 *
 * To run this example:
 *   `$ bin/run-example org.apache.spark.examples.streaming.FlumePollingEventCount [host] [port]`
 */
object FlumePollingEventCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: FlumePollingEventCount <host> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args
    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumePollingEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream that polls the Spark Sink running in a Flume agent
    val stream = FlumeUtils.createPollingStream(ssc, host, port)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
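For comparison, a minimal sketch of the Flume side of the pull approach: the agent runs Spark's custom sink and just buffers events locally, so nothing in this config depends on where the Spark job is running and it never needs to change across restarts. The agent, channel and sink names and the port are assumptions; the spark-streaming-flume-sink jar (plus its Scala dependency) needs to be on the Flume agent's classpath for this sink type to load.

# Hypothetical Flume agent config for the pull approach (names and addresses are placeholders)
a1.channels = c1
a1.sinks = spark
a1.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
a1.sinks.spark.channel = c1
# host and port of the Flume agent itself; this is what FlumePollingEventCount connects to
a1.sinks.spark.hostname = flume-agent-host
a1.sinks.spark.port = 9999

The otherwise unused java.net.InetSocketAddress import in the example hints at the other createPollingStream overload, which takes a Seq[InetSocketAddress] so a single stream can poll several Spark Sinks at once.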