Structured Streaming provides a class for listening to the start, termination, and status updates of streaming queries:

StreamingQueryListener

To use StreamingQueryListener, extend it and implement its three methods:

abstract class StreamingQueryListener {

  import StreamingQueryListener._

  /**
   * Called when a query is started.
   * @note This is called synchronously with
   *       [[org.apache.spark.sql.streaming.DataStreamWriter `DataStreamWriter.start()`]],
   *       that is, `onQueryStart` will be called on all listeners before
   *       `DataStreamWriter.start()` returns the corresponding [[StreamingQuery]]. Please
   *       don't block this method as it will block your query.
   * @since 2.0.0
   */
  def onQueryStarted(event: QueryStartedEvent): Unit

  /**
   * Called when there is some status update (ingestion rate updated, etc.)
   *
   * @note This method is asynchronous. The status in [[StreamingQuery]] will always be
   *       latest no matter when this method is called. Therefore, the status of [[StreamingQuery]]
   *       may be changed before/when you process the event. E.g., you may find [[StreamingQuery]]
   *       is terminated when you are processing `QueryProgressEvent`.
   * @since 2.0.0
   */
  def onQueryProgress(event: QueryProgressEvent): Unit

  /**
   * Called when a query is stopped, with or without error.
   * @since 2.0.0
   */
  def onQueryTerminated(event: QueryTerminatedEvent): Unit
}
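onQueryStarted: called when a Structured Streaming query starts. Unlike the other two callbacks, this call is synchronous: it runs before DataStreamWriter.start() returns, so it must not block.
onQueryProgress: asynchronous callback invoked when the query's status is updated while it runs (for example, when a micro-batch finishes).
onQueryTerminated: asynchronous callback invoked when the query stops, with or without an error.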
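onQueryProgress is a natural place to check a condition on every progress report. The sketch below shows one hypothetical rule: raise an alert when a query processes rows noticeably slower than they arrive. It is written as a plain function over two Double fields that StreamingQueryProgress exposes, inputRowsPerSecond and processedRowsPerSecond, so that it can be tested without Spark. The name ProgressRules and the default 0.9 ratio are illustrative assumptions.

```scala
// Hypothetical alert rule (not a Spark API). The two arguments correspond to
// StreamingQueryProgress.inputRowsPerSecond and processedRowsPerSecond.
object ProgressRules {

  private def isFinite(d: Double): Boolean = !d.isNaN && !d.isInfinite

  /**
   * True when rows are processed noticeably slower than they arrive, i.e.
   * processed < input * minRatio. Non-finite or zero input rates never alert.
   */
  def fallingBehind(
      inputRowsPerSecond: Double,
      processedRowsPerSecond: Double,
      minRatio: Double = 0.9): Boolean =
    isFinite(inputRowsPerSecond) && isFinite(processedRowsPerSecond) &&
      inputRowsPerSecond > 0 &&
      processedRowsPerSecond < inputRowsPerSecond * minRatio
}
```

Inside a listener, onQueryProgress would call `ProgressRules.fallingBehind(event.progress.inputRowsPerSecond, event.progress.processedRowsPerSecond)` and raise an alert when it returns true. Because onQueryProgress is asynchronous, the query's live status may already differ from the snapshot carried by the event.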

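What are these callbacks good for?
They are typically used to add alerting to streaming jobs. For example, in onQueryTerminated you can check whether an alert condition is met (such as the query having failed) and, if so, send an email or DingTalk alert.
The alert can then include the error details from the exception field of QueryTerminatedEvent (defined below), sent along in the same email.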
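Sending an email or DingTalk message involves slow network I/O, and onQueryStarted runs synchronously inside start(). It is therefore safer to do only cheap work inside the callbacks and hand the actual sending to a background thread. The following is a minimal sketch built on a hypothetical AsyncAlerter helper. Its `send` parameter stands in for your own email or DingTalk client, which is an assumption.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory, TimeUnit}

// Hypothetical helper (not part of Spark): listener callbacks call `submit`,
// which returns immediately; `send` (your email/DingTalk client, assumed)
// runs later on a background thread.
final class AsyncAlerter(send: String => Unit) {

  // One daemon worker thread: alerts are delivered in submission order,
  // and this pool alone does not keep the JVM alive.
  private val pool: ExecutorService = Executors.newSingleThreadExecutor(
    new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "query-listener-alerts")
        t.setDaemon(true)
        t
      }
    }
  )

  /** Non-blocking: queue the message and return at once. */
  def submit(message: String): Unit =
    pool.execute(new Runnable {
      // A failing sender is logged here instead of propagating out of the task.
      def run(): Unit =
        try send(message)
        catch { case e: Exception => System.err.println(s"Alert failed: ${e.getMessage}") }
    })

  /** Stop accepting alerts and wait up to `timeoutSeconds` for queued ones. */
  def close(timeoutSeconds: Long): Boolean = {
    pool.shutdown()
    pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)
  }
}
```

With this in place, onQueryStarted would only call something like `alerter.submit(s"Query ${event.id} started")` and then return. Because the worker is a daemon thread, call close when the application shuts down if pending alerts must still be delivered.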

@InterfaceStability.Evolving
class QueryTerminatedEvent private[sql](
val id: UUID,
val runId: UUID,
val exception: Option[String]) extends Event
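The exception field is an Option[String]. It is None when the query stopped normally and Some(message) when it failed, so you can build an alert by mapping over it. The sketch below works under that assumption. TerminationAlerts and alertFor are hypothetical names, and the function takes the three event fields as plain arguments so that it compiles without Spark.

```scala
import java.util.UUID

// Hypothetical helper (not a Spark API). Its parameters mirror the fields of
// QueryTerminatedEvent: id, runId and exception.
object TerminationAlerts {

  /**
   * exception is None when the query stopped normally and Some(errorMessage)
   * when it failed; only the failure case produces alert text.
   */
  def alertFor(id: UUID, runId: UUID, exception: Option[String]): Option[String] =
    exception.map { error =>
      s"Streaming query $id (run $runId) terminated with an error:\n$error"
    }
}
```

In onQueryTerminated you would call `TerminationAlerts.alertFor(event.id, event.runId, event.exception)` and pass any resulting text to your email or DingTalk sender.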

Finally, here is a small usage example:

import org.apache.spark.sql.SparkSession

/**
 * Created by angel
 */
object Test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("IQL")
      .master("local[4]")
      .enableHiveSupport()
      .getOrCreate()
    spark.sparkContext.setLogLevel("WARN")
    // Save the code as demo-StreamingQueryManager.scala
    // Start it using spark-shell
    // $ ./bin/spark-shell -i demo-StreamingQueryManager.scala
    // Register a StreamingQueryListener to receive notifications about state changes of streaming queries
    import org.apache.spark.sql.streaming.StreamingQueryListener
    val myQueryListener = new StreamingQueryListener {
      import org.apache.spark.sql.streaming.StreamingQueryListener._
      def onQueryTerminated(event: QueryTerminatedEvent): Unit = {
        println(s"Query ${event.id} terminated")
      }
      def onQueryStarted(event: QueryStartedEvent): Unit = {
        println(s"Query ${event.id} started")
      }
      def onQueryProgress(event: QueryProgressEvent): Unit = {
        // progress.name is null unless the query was named via queryName(...), so print the id
        println(s"Query ${event.progress.id} made progress (batch ${event.progress.batchId})")
      }
    }
    spark.streams.addListener(myQueryListener)
    import org.apache.spark.sql.streaming._
    import scala.concurrent.duration._
    // Start streaming queries
    // Start the first query: one micro-batch every 4 seconds
    val q4s = spark.readStream.
      format("rate").
      load().
      writeStream.
      format("console").
      trigger(Trigger.ProcessingTime(4.seconds)).
      option("truncate", false).
      start()
    // Start another query that is slightly slower: one micro-batch every 10 seconds
    val q10s = spark.readStream.
      format("rate").
      load().
      writeStream.
      format("console").
      trigger(Trigger.ProcessingTime(10.seconds)).
      option("truncate", false).
      start()
    // Both queries run concurrently
    // You should see different outputs in the console
    // q4s prints about 4 rows every batch and fires more often than q10s
    // q10s prints about 10 rows every batch
    /*
    -------------------------------------------
    Batch: 7
    -------------------------------------------
    +-----------------------+-----+
    |timestamp              |value|
    +-----------------------+-----+
    |2017-10-27 13:44:07.462|21   |
    |2017-10-27 13:44:08.462|22   |
    |2017-10-27 13:44:09.462|23   |
    |2017-10-27 13:44:10.462|24   |
    +-----------------------+-----+

    -------------------------------------------
    Batch: 8
    -------------------------------------------
    +-----------------------+-----+
    |timestamp              |value|
    +-----------------------+-----+
    |2017-10-27 13:44:11.462|25   |
    |2017-10-27 13:44:12.462|26   |
    |2017-10-27 13:44:13.462|27   |
    |2017-10-27 13:44:14.462|28   |
    +-----------------------+-----+

    -------------------------------------------
    Batch: 2
    -------------------------------------------
    +-----------------------+-----+
    |timestamp              |value|
    +-----------------------+-----+
    |2017-10-27 13:44:09.847|6    |
    |2017-10-27 13:44:10.847|7    |
    |2017-10-27 13:44:11.847|8    |
    |2017-10-27 13:44:12.847|9    |
    |2017-10-27 13:44:13.847|10   |
    |2017-10-27 13:44:14.847|11   |
    |2017-10-27 13:44:15.847|12   |
    |2017-10-27 13:44:16.847|13   |
    |2017-10-27 13:44:17.847|14   |
    |2017-10-27 13:44:18.847|15   |
    +-----------------------+-----+
    */
    // Stop the queries on separate threads
    // as we're about to block the current thread awaiting query termination
    import java.util.concurrent.Executors
    import java.util.concurrent.TimeUnit.SECONDS
    def queryTerminator(query: StreamingQuery) = new Runnable {
      def run(): Unit = {
        println(s"Stopping streaming query: ${query.id}")
        query.stop()
      }
    }
    // Stop the first query after 10 seconds (one-shot task)
    Executors.newSingleThreadScheduledExecutor.
      schedule(queryTerminator(q4s), 10, SECONDS)
    // Stop the other query after 20 seconds (one-shot task)
    Executors.newSingleThreadScheduledExecutor.
      schedule(queryTerminator(q10s), 20, SECONDS)
    // Use StreamingQueryManager to wait for any query termination (either q4s or q10s);
    // the current thread blocks until either streaming query has finished
    spark.streams.awaitAnyTermination()
    // You are here only after either streaming query has finished.
    // Calling spark.streams.awaitAnyTermination again would return immediately.
    // The listener should have received the QueryTerminatedEvent for that query by now.
    // Reset the last terminated streaming query
    spark.streams.resetTerminated()
    // At least one query has terminated; wait for the other one to terminate
    spark.streams.awaitAnyTermination()
    assert(spark.streams.active.isEmpty)
    println("The demo went all fine. Exiting...")
    // Exit explicitly: the scheduler threads created above are non-daemon threads
    System.exit(0)
  }
}
