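Kafka + Spark Streaming + Tranquility Server: sending data to Druid
I spent a long time trying the approach described on the Druid website: embedding Tranquility in my own code to send data to Druid in real time. It failed, and I still have not pinned down why. It runs fine in IDEA but simply refuses to work once deployed. If anyone has got it working, please post a link so I can learn from it. I then tried ingesting from Kafka into Druid directly in real time and still hit some errors (these are resolved now; I will write them up later). In the end there was nothing left to do but use Tranquility Server.
For configuring and starting Tranquility Server, see: https://github.com/druid-io/tranquility/blob/master/docs/server.md
(1) After starting your own customized server, you can use generate-example-metrics in Druid's bin directory to generate test data (the customized server.json is shown below)
server.json configuration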
{
  "dataSources" : {
    "reynold_metrics" : {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "reynold_metrics",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "timestampSpec" : {
                "column" : "timestamp",
                "format" : "auto"
              },
              "dimensionsSpec" : {
                "dimensions" : [],
                "dimensionExclusions" : [
                  "timestamp",
                  "value"
                ]
              },
              "format" : "json"
            }
          },
          "granularitySpec" : {
            "type" : "uniform",
            "segmentGranularity" : "hour",
            "queryGranularity" : "none"
          },
          "metricsSpec" : [
            {
              "type" : "count",
              "name" : "count"
            },
            {
              "name" : "value_sum",
              "type" : "doubleSum",
              "fieldName" : "value"
            },
            {
              "fieldName" : "value",
              "name" : "value_min",
              "type" : "doubleMin"
            },
            {
              "type" : "doubleMax",
              "name" : "value_max",
              "fieldName" : "value"
            }
          ]
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "",
          "intermediatePersistPeriod" : "PT10M",
          "windowPeriod" : "PT10M"
        }
      },
      "properties" : {
        "task.partitions" : "",
        "task.replicants" : ""
      }
    }
  },
  "properties" : {
    "zookeeper.connect" : "reynold-master:2181,reynold-slave02:2181,reynold-slave03:2181",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "http.port" : "",
    "http.threads" : ""
  }
}
(2) Create the Kafka topic and send data to it
Delete the topic: kafka-topics --delete --topic reynold --zookeeper localhost:
Create the topic: kafka-topics --create --topic reynold --zookeeper localhost: --partitions --replication-factor
Consume data: kafka-console-consumer --topic reynold --zookeeper localhost: --from-beginning
Produce data: kafka-console-producer --broker-list reynold-master: --topic reynold
{"count": 1, "value_min": 74.0, "timestamp": "2017-03-09T02:34:24.000Z", "value_max": 74.0, "metricType": "request/latency", "server": "www5.example.com", "http_method": "GET", "value_sum": 74.0, "http_code": "200", "unit": "milliseconds", "page": "/"}
{"count": 1, "value_min": 75.0, "timestamp": "2017-03-09T02:34:24.000Z", "value_max": 75.0, "metricType": "request/latency", "server": "www5.example.com", "http_method": "GET", "value_sum": 75.0, "http_code": "200", "unit": "milliseconds", "page": "/list"}
{"count": 1, "value_min": 143.0, "timestamp": "2017-03-09T02:38:06.000Z", "value_max": 143.0, "metricType": "request/latency", "server": "www2.example.com", "http_method": "GET", "value_sum": 143.0, "http_code": "200", "unit": "milliseconds", "page": "/"}
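One thing worth keeping in mind when replaying the sample lines above: server.json sets windowPeriod to PT10M, and Tranquility drops events whose timestamp falls outside that window around the current time, so lines stamped 2017-03-09 would be discarded if sent later. The sketch below builds a test event stamped with the current time. It is only an illustration under assumptions: the object and method names (SampleEvent, sampleEvent) are made up, and it emits a raw `value` field because that is the field the metricsSpec in server.json aggregates.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical helper, not part of the original post.
object SampleEvent {
  // Builds one JSON event stamped with `now` (second precision, ISO-8601 UTC).
  // Field values are inserted as-is, so they must not contain quotes.
  def sampleEvent(server: String, page: String, value: Double,
                  now: Instant = Instant.now()): String = {
    val ts = now.truncatedTo(ChronoUnit.SECONDS).toString
    s"""{"timestamp": "$ts", "metricType": "request/latency", "server": "$server", "http_method": "GET", "http_code": "200", "unit": "milliseconds", "page": "$page", "value": $value}"""
  }

  def main(args: Array[String]): Unit =
    println(sampleEvent("www5.example.com", "/", 74.0))
}
```

The printed line can be pasted into kafka-console-producer in place of the older sample lines.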
(3) Use Spark Streaming to consume the data from Kafka and send it to Druid over HTTP
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SparkDruid {
  val kafkaParam = Map[String, String](
    "metadata.broker.list" -> "reynold-master:9092,reynold-slave01:9092,reynold-slave02:9092,reynold-slave03:9092",
    "auto.offset.reset" -> "smallest"
  )

  def main(args: Array[String]): Unit = {
    val sparkContext = new SparkContext(new SparkConf().setMaster("local[4]").setAppName("SparkDruid"))
    val ssc = new StreamingContext(sparkContext, Seconds(3))
    val topic: String = "reynold" // name of the topic to consume
    val topics: Set[String] = Set(topic) // set of topic names used when creating the stream
    val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParam, topics)
    // POST each message value (a JSON string); the last URL segment must name a dataSource from server.json
    kafkaStream.map(msg => msg._2).foreachRDD { rdd =>
      rdd.foreach(strJson => Https.post("http://reynold-master:8200/v1/post/zcx_metrics", strJson))
    }
    ssc.start()
    ssc.awaitTermination()
  }
}
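The job above makes one HTTP request per Kafka message, which can get slow at volume. A possible variation is to group messages per partition and send each group in a single POST. The sketch below shows only the grouping step. It assumes Tranquility Server accepts several newline-delimited JSON objects in one application/json body (worth confirming against the server.md page linked earlier); the BatchPost object, the payloads method and the batch sizes are hypothetical names and values chosen for illustration.

```scala
// Hypothetical helper, not part of the original post.
object BatchPost {
  // Groups messages into newline-delimited payloads of at most `size` messages each.
  def payloads(messages: Iterator[String], size: Int): Iterator[String] =
    messages.grouped(size).map(_.mkString("\n"))

  def main(args: Array[String]): Unit = {
    val msgs = Iterator("""{"a":1}""", """{"a":2}""", """{"a":3}""")
    payloads(msgs, 2).foreach { body =>
      // In the streaming job this is where Https.post(url, body) would go.
      println(body)
      println("--")
    }
  }
}
```

Inside the streaming job this would replace rdd.foreach with something like rdd.foreachPartition(part => BatchPost.payloads(part, 500).foreach(body => Https.post(url, body))), where 500 is an arbitrary batch size.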
The Https class is as follows:
import java.io.InputStreamReader
import java.nio.charset.StandardCharsets

import com.donews.util.Mapper
import com.fasterxml.jackson.core.`type`.TypeReference
import com.google.common.io.CharStreams
import org.apache.http.client.methods.{HttpGet, HttpPost}
import org.apache.http.entity.StringEntity
import org.apache.http.impl.client.HttpClients

/**
 * Using HTTP requests, this object can
 * 1. send data to Druid
 * 2. provide some methods for querying Druid
 * 3. as a side feature, query HBase data
 */
object Https {
  private val httpClient = HttpClients.createDefault()

  def get(url: String): String = {
    val _get = new HttpGet(url)
    val resp = httpClient.execute(_get)
    try {
      if (resp.getStatusLine.getStatusCode != 200) {
        throw new RuntimeException("error: " + resp.getStatusLine)
      }
      CharStreams.toString(new InputStreamReader(resp.getEntity.getContent, StandardCharsets.UTF_8))
    } finally {
      resp.close()
    }
  }

  // Can both send data and request data (the response body is returned)
  def post(url: String, content: String): String = {
    val _post = new HttpPost(url)
    _post.setHeader("Content-Type", "application/json")
    _post.setEntity(new StringEntity(content, "utf-8"))
    val resp = httpClient.execute(_post)
    try {
      CharStreams.toString(new InputStreamReader(resp.getEntity.getContent, StandardCharsets.UTF_8))
    } finally {
      resp.close()
    }
  }

  object MapTypeRef extends TypeReference[Map[String, Any]]
  object ListMapTypeRef extends TypeReference[List[Map[String, Any]]]
  // /datasources returns a plain JSON array of strings, so it needs its own type reference
  object ListStringTypeRef extends TypeReference[List[String]]

  def queryHBase(sql: String): List[Map[String, Any]] = {
    // Serialize the request as JSON
    val request = new String(Mapper.mapper.writeValueAsBytes(Map(
      "action" -> "query",
      "sql" -> sql
    )))
    // Send the JSON request
    val resp = post("http://reynold-master:8209/api", request)
    val rs = Mapper.mapper.readValue[Map[String, Any]](resp, MapTypeRef)
    // val rs = Mapper.mapper.readValue[Map[String, Any]](resp, classOf[Map[String, Any]]) also works
    rs("result").asInstanceOf[List[Map[String, Any]]]
  }

  def queryDruid(json: String): String = {
    post("http://reynold-master:18082/druid/v2", json)
  }

  private def getDruid(path: String): String = {
    get("http://reynold-master:18082/druid/v2" + path)
  }

  def druidDataSources(): List[String] = {
    Mapper.mapper.readValue[List[String]](getDruid("/datasources"), ListStringTypeRef)
  }

  def druidDimension(datasource: String): String = {
    getDruid(s"/datasources/$datasource/dimensions")
  }

  def druidMetrics(datasource: String): String = {
    getDruid(s"/datasources/$datasource/metrics")
  }

  def main(args: Array[String]): Unit = {
    println(queryHBase("select * from user_tags limit 2"))
  }
}
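queryDruid takes a native Druid query as a JSON string, but the post does not show one. As a rough illustration, the sketch below builds an hourly timeseries query over the metrics declared in server.json. Since the data is rolled up at ingestion, the row count is summed with longSum over the stored count column. The DruidQueries object, the hourlyTimeseries method and the interval shown are made-up names and values for this example, not something from the original setup.

```scala
// Hypothetical helper, not part of the original post.
object DruidQueries {
  // Builds a native timeseries query over the rolled-up metrics from server.json.
  // `interval` is an ISO-8601 interval such as "2017-03-09/2017-03-10".
  def hourlyTimeseries(dataSource: String, interval: String): String =
    s"""{
       |  "queryType": "timeseries",
       |  "dataSource": "$dataSource",
       |  "granularity": "hour",
       |  "intervals": ["$interval"],
       |  "aggregations": [
       |    {"type": "longSum", "name": "count", "fieldName": "count"},
       |    {"type": "doubleSum", "name": "value_sum", "fieldName": "value_sum"},
       |    {"type": "doubleMin", "name": "value_min", "fieldName": "value_min"},
       |    {"type": "doubleMax", "name": "value_max", "fieldName": "value_max"}
       |  ]
       |}""".stripMargin

  def main(args: Array[String]): Unit =
    println(hourlyTimeseries("reynold_metrics", "2017-03-09/2017-03-10"))
}
```

The resulting string could then be passed to Https.queryDruid, which returns the broker's response body unchanged.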
Mapper.scala
package com.donews.util

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

/**
 * Created by reynold
 */
object Mapper {
  val mapper = new ObjectMapper()
  mapper.registerModule(DefaultScalaModule)
}
Note: add the following dependencies to the pom file:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.3.3</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.11</artifactId>
<version>2.4.5</version>
</dependency>