Using a Flume data source in Spark Streaming

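There are two approaches. In the first, Spark Streaming starts a listener (an Avro server inside its receiver) and Flume pushes data to it. In the second, Spark Streaming polls Flume at regular intervals and pulls the data itself.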

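At first I thought the first approach was the only one. The trouble is that there is no telling which node the listener will come up on, so after every restart of the streaming job I had to edit Flume's sinks again, which was a real pain. Only later did I discover the second approach. Here is the code for both; the difference between them is small. (The code is taken from the official GitHub repository.)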
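For reference, the sink definition that has to be edited in the push setup looks roughly like the fragment below. This is only a sketch: the agent name "agent", the sink name "avroSink" and the channel name "memoryChannel" are placeholders I made up, and the angle-bracket values stand for whatever host and port the Spark Streaming listener is started on.

```properties
# Flume agent config fragment for the push approach (placeholder names).
agent.sinks = avroSink
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.channel = memoryChannel
# These two values must match the host and port the Spark Streaming
# listener binds to, which is why they change when the listener moves.
agent.sinks.avroSink.hostname = <listener host>
agent.sinks.avroSink.port = <listener port>
```

Because hostname and port point at the Spark side, this fragment has to be updated whenever the listener lands on a different node.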

First approach: listening on a port (Flume pushes):

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam

/**
 *  Produces a count of events received from Flume.
 *
 *  This should be used in conjunction with an AvroSink in Flume. It will start
 *  an Avro server at the requested host:port address and listen for requests.
 *  Your Flume AvroSink should be pointed to this address.
 *
 *  Usage: FlumeEventCount <host> <port>
 *    <host> is the host the Flume receiver will be started on - a receiver
 *           creates a server and listens for flume events.
 *    <port> is the port the Flume receiver will listen on.
 *
 *  To run this example:
 *    `$ bin/run-example org.apache.spark.examples.streaming.FlumeEventCount <host> <port> `
 */
object FlumeEventCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        "Usage: FlumeEventCount <host> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream
    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
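The example above only counts events. Each element of the stream is a SparkFlumeEvent that wraps an Avro event whose body is a ByteBuffer, so reading the payload takes one more map step. A minimal sketch, assuming the bodies are UTF-8 text (the post does not say what the events contain); the names FlumeBodyDecoding, decodeBody and bodies are made up for illustration:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.flume.SparkFlumeEvent

// Hypothetical helper object, not part of the official example.
object FlumeBodyDecoding {

  // Decode an event body as UTF-8 text (assumption: the payload is text).
  def decodeBody(body: ByteBuffer): String = {
    // Work on a duplicate so the original buffer's position is left untouched.
    val copy = body.duplicate()
    val bytes = new Array[Byte](copy.remaining())
    copy.get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }

  // Turn a stream of Flume events into a stream of their decoded bodies.
  def bodies(stream: DStream[SparkFlumeEvent]): DStream[String] =
    stream.map(e => decodeBody(e.event.getBody))
}
```

With this in place, FlumeBodyDecoding.bodies(stream).print() would show the first few decoded bodies of each batch next to the count. The same helper works on the stream from the polling approach, since both produce SparkFlumeEvent elements.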

Second approach: polling, where Spark Streaming actively pulls data from Flume:

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam
import java.net.InetSocketAddress

/**
 *  Produces a count of events received from Flume.
 *
 *  This should be used in conjunction with the Spark Sink running in a Flume agent. See
 *  the Spark Streaming programming guide for more details.
 *
 *  Usage: FlumePollingEventCount <host> <port>
 *    `host` is the host on which the Spark Sink is running.
 *    `port` is the port at which the Spark Sink is listening.
 *
 *  To run this example:
 *    `$ bin/run-example org.apache.spark.examples.streaming.FlumePollingEventCount [host] [port] `
 */
object FlumePollingEventCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println(
        "Usage: FlumePollingEventCount <host> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumePollingEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream that polls the Spark Sink running in a Flume agent
    val stream = FlumeUtils.createPollingStream(ssc, host, port)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
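On the Flume side, the polling approach replaces the Avro sink with the Spark Sink that the comment above refers to. A sketch of the matching agent fragment, again with placeholder names ("agent", "spark", "memoryChannel") that are my own assumptions; the sink class name is the one given in the Spark Streaming + Flume integration guide, and the sink's jar has to be on the Flume agent's classpath:

```properties
# Flume agent config fragment for the polling approach (placeholder names).
agent.sinks = spark
agent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
agent.sinks.spark.channel = memoryChannel
# The sink listens on the Flume agent's own machine; Spark Streaming
# connects to this address, so it stays the same across Spark restarts.
agent.sinks.spark.hostname = <flume agent host>
agent.sinks.spark.port = <port for Spark to poll>
```

Here hostname and port describe the Flume machine, not the Spark one, so the Flume config no longer needs editing when the streaming job is restarted on different nodes. The host and port passed to FlumePollingEventCount must match these two values.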
