spark streaming中使用flume数据源

有两种方式，一种是sparkstreaming中的driver起监听，flume来推数据；另一种是sparkstreaming按照时间策略轮训的向flume拉数据。

最开始我以为只有第一种方法，但是尼玛问题在于driver起来的结点是没谱的，所以每次我重启streaming后发现尼玛每次都要修改flume的sinks，蛋疼死了，后来才发现有后面的方法，好吧，把不同的方法代码写出来，其实变化不大。（代码转自官方的githup）

第一种，监听端口：

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf

import org.apache.spark.storage.StorageLevel

import org.apache.spark.streaming._

import org.apache.spark.streaming.flume._

import org.apache.spark.util.IntParam

/**

 *  Produces a count of events received from Flume.

 *

 *  This should be used in conjunction with an AvroSink in Flume. It will start

 *  an Avro server on at the request host:port address and listen for requests.

 *  Your Flume AvroSink should be pointed to this address.

 *

 *  Usage: FlumeEventCount <host> <port>

 *    <host> is the host the Flume receiver will be started on - a receiver

 *           creates a server and listens for flume events.

 *    <port> is the port the Flume receiver will listen on.

 *

 *  To run this example:

 *    `$ bin/run-example org.apache.spark.examples.streaming.FlumeEventCount <host> <port> `

 */

object FlumeEventCount {

  def main(args: Array[String]) {

    if (args.length < 2) {

      System.err.println(

        "Usage: FlumeEventCount <host> <port>")

      System.exit(1)

    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size

    val sparkConf = new SparkConf().setAppName("FlumeEventCount")

    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream

    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch

    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()

    ssc.awaitTermination()

  }

}

第二种是轮训主动向flume拿数据

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf

import org.apache.spark.storage.StorageLevel

import org.apache.spark.streaming._

import org.apache.spark.streaming.flume._

import org.apache.spark.util.IntParam

import java.net.InetSocketAddress

/**

 *  Produces a count of events received from Flume.

 *

 *  This should be used in conjunction with the Spark Sink running in a Flume agent. See

 *  the Spark Streaming programming guide for more details.

 *

 *  Usage: FlumePollingEventCount <host> <port>

 *    `host` is the host on which the Spark Sink is running.

 *    `port` is the port at which the Spark Sink is listening.

 *

 *  To run this example:

 *    `$ bin/run-example org.apache.spark.examples.streaming.FlumePollingEventCount [host] [port] `

 */

object FlumePollingEventCount {

  def main(args: Array[String]) {

    if (args.length < 2) {

      System.err.println(

        "Usage: FlumePollingEventCount <host> <port>")

      System.exit(1)

    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size

    val sparkConf = new SparkConf().setAppName("FlumePollingEventCount")

    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream that polls the Spark Sink running in a Flume agent

    val stream = FlumeUtils.createPollingStream(ssc, host, port)

    // Print out the count of events received from this server in each batch

    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()

    ssc.awaitTermination()

  }

}

spark streaming中使用flume数据源的更多相关文章

Spark Streaming中向flume拉取数据
在这里看到的解决方法 https://issues.apache.org/jira/browse/SPARK-1729 请是个人理解,有问题请大家留言. 其实本身flume是不支持像KAFKA一样的发 ...
Spark Streaming中的操作函数分析
根据Spark官方文档中的描述,在Spark Streaming应用中,一个DStream对象可以调用多种操作,主要分为以下几类 Transformations Window Operations J ...
Spark Streaming中的操作函数讲解
Spark Streaming中的操作函数讲解根据根据Spark官方文档中的描述,在Spark Streaming应用中,一个DStream对象可以调用多种操作,主要分为以下几类 Transform ...
spark streaming中维护kafka偏移量到外部介质
spark streaming中维护kafka偏移量到外部介质以kafka偏移量维护到redis为例. redis存储格式使用的数据结构为string,其中key为topic:partition, ...
Spark Streaming中动态Batch Size实现初探
本期内容 : BatchDuration与 Process Time 动态Batch Size Spark Streaming中有很多算子,是否每一个算子都是预期中的类似线性规律的时间消耗呢? 例如: ...
flink和spark Streaming中的Back Pressure
Spark Streaming的back pressure 在讲flink的back pressure之前,我们先讲讲Spark Streaming的back pressure.Spark Strea ...
spark streaming中使用checkpoint
从官方的Programming Guides中看到的我理解streaming中的checkpoint有两种,一种指的是metadata的checkpoint,用于恢复你的streaming:一种是r ...
Spark Streaming数据限流简述
Spark Streaming对实时数据流进行分析处理,源源不断的从数据源接收数据切割成一个个时间间隔进行处理: 流处理与批处理有明显区别,批处理中的数据有明显的边界.数据规模已知:而流处理数 ...
Apache Spark 2.2.0 中文文档 - Spark Streaming 编程指南 | ApacheCN
Spark Streaming 编程指南概述一个入门示例基础概念依赖初始化 StreamingContext Discretized Streams (DStreams)(离散化流) Inp ...

随机推荐

emmet-vim
最近啊,我投奔了网页的开发,看了一本<head first HTML and CSS>的书,感觉非常不错,然后又配置了一些vim里面用到的插件,现在我把学习到的东西记录下来! 首先,我不会 ...
C语言异常处理和连接数据库
#include <stdio.h> #include <setjmp.h> jmp_buf j; void Exception(void); double diva(doub ...
Linux 查看CPU，内存，硬盘！转
Linux 查看CPU,内存,硬盘本文转自:http://hi.baidu.com/mumachuntian/item/a401368dbe8a66cab07154e8 1 查看CPU 1.1 查看 ...
php的urlencode()URL编码函数浅析
URLEncode:是指针对网页url中的中文字符的一种编码转化方式,最常见的就是Baidu.Google等搜索引擎中输入中文查询时候,生成经过Encode过的网页URL. URLEncode的方 ...
（JS高手不用看了！我只是在碎碎念，因为我也不知道面什么）JavaScript的算术运算
Math.pow(2,53) //2的51次幂 Math.round(0.6) //四舍五入 Math.cell(0.6) //向上求整 Math.floor(0.6) / ...
RecContentType有哪些
HTML 页面text/javascript `type="text/javascript"` 是比较老的写法IETF 推荐的是 `type="application ...
bmob
移动后台: bmob http://baike.baidu.com/link?url=GHdwJY3cGygcfQDdzosckQnhVy1pvIGZA2Ws0K26lSSFGu7QRX4R1wlo6 ...
OpenCV入门（一）
参考:http://blog.csdn.net/poem_qianmo/article/details/20537737 这位同学挺牛的,才研一就出书了,实在是让人汗颜啊,不说了,多学习. 这一篇主要 ...
找不到提交和更新按钮，subversion不见了，无法更新和上传代码
1.查看settings/plugins/下有没有subversion 插件,如果有,确保勾上. 2.VCS->Enable Version Control Integration...
win7 64位安装redis 及Redis Desktop Manager使用
写基于dapper的一套自动化程序,看到 mgravell的另一个项目,StackExchange.Redis,之前在.NET上用过一段时间redis,不过一直是其它的驱动开发包,这个根据作者介绍,是 ...

spark streaming中使用flume数据源

spark streaming中使用flume数据源的更多相关文章

随机推荐

热门专题