本文介绍Druid查询数据的方式,首先我们保证数据已经成功载入。

Druid查询基于HTTP,Druid提供了查询视图,并对结果进行了格式化。

Druid提供了三种查询方式,SQL,原生JSON,CURL。

一、SQL查询

我们用wiki的数据为例

查询10条最多的页面编辑

SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10

我们在Query视图中操作

会有提示

选择Smart query limit会自动限制行数

Druid还提供了命令行查询sql 可以运行bin/dsql进行操作

Welcome to dsql, the command-line client for Druid SQL.
Type "\h" for help.
dsql>

提交sql

dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung │ 33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
│ Jeremy Corbyn │ 27 │
│ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
│ Flavia Pennetta │ 20 │
│ Total Drama Presents: The Ridonculous Race │ 18 │
│ User talk:Dudeperson176123 │ 18 │
│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
│ Wikipedia:In the news/Candidates │ 17 │
│ Wikipedia:Requests for page protection │ 17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.

还可以通过Http发送SQL

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql

可以得到如下结果

[
{
"page": "Wikipedia:Vandalismusmeldung",
"Edits": 33
},
{
"page": "User:Cyde/List of candidates for speedy deletion/Subpage",
"Edits": 28
},
{
"page": "Jeremy Corbyn",
"Edits": 27
},
{
"page": "Wikipedia:Administrators' noticeboard/Incidents",
"Edits": 21
},
{
"page": "Flavia Pennetta",
"Edits": 20
},
{
"page": "Total Drama Presents: The Ridonculous Race",
"Edits": 18
},
{
"page": "User talk:Dudeperson176123",
"Edits": 18
},
{
"page": "Wikipédia:Le Bistro/12 septembre 2015",
"Edits": 18
},
{
"page": "Wikipedia:In the news/Candidates",
"Edits": 17
},
{
"page": "Wikipedia:Requests for page protection",
"Edits": 17
}
]

更多SQL示例

时间查询

SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1

分组查询

SELECT channel, page, SUM(added)
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page
ORDER BY SUM(added) DESC

查询原始数据

SELECT user, page
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5

定时查询

也可以在dsql里操作

dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;

│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Retrieved 1 row in 0.03s.

二、原生JSON查询

Druid支持基于Json的查询

{
"queryType" : "topN",
"dataSource" : "wikipedia",
"intervals" : ["2015-09-12/2015-09-13"],
"granularity" : "all",
"dimension" : "page",
"metric" : "count",
"threshold" : 10,
"aggregations" : [
{
"type" : "count",
"name" : "count"
}
]
}

把json粘贴到json 查询模式窗口

Json查询是通过向router和broker发送请求

curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>

Druid提供了丰富的查询方式

Aggregation查询

Timeseries查询
{
"queryType": "timeseries",
"dataSource": "sample_datasource",
"granularity": "day",
"descending": "true",
"filter": {
"type": "and",
"fields": [
{ "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
{ "type": "or",
"fields": [
{ "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
{ "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
]
}
]
},
"aggregations": [
{ "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
{ "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
],
"postAggregations": [
{ "type": "arithmetic",
"name": "sample_divide",
"fn": "/",
"fields": [
{ "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },
{ "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
]
}
],
"intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}
TopN查询
{
"queryType": "topN",
"dataSource": "sample_data",
"dimension": "sample_dim",
"threshold": 5,
"metric": "count",
"granularity": "all",
"filter": {
"type": "and",
"fields": [
{
"type": "selector",
"dimension": "dim1",
"value": "some_value"
},
{
"type": "selector",
"dimension": "dim2",
"value": "some_other_val"
}
]
},
"aggregations": [
{
"type": "longSum",
"name": "count",
"fieldName": "count"
},
{
"type": "doubleSum",
"name": "some_metric",
"fieldName": "some_metric"
}
],
"postAggregations": [
{
"type": "arithmetic",
"name": "average",
"fn": "/",
"fields": [
{
"type": "fieldAccess",
"name": "some_metric",
"fieldName": "some_metric"
},
{
"type": "fieldAccess",
"name": "count",
"fieldName": "count"
}
]
}
],
"intervals": [
"2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"
]
}
GroupBy查询
{
"queryType": "groupBy",
"dataSource": "sample_datasource",
"granularity": "day",
"dimensions": ["country", "device"],
"limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
"filter": {
"type": "and",
"fields": [
{ "type": "selector", "dimension": "carrier", "value": "AT&T" },
{ "type": "or",
"fields": [
{ "type": "selector", "dimension": "make", "value": "Apple" },
{ "type": "selector", "dimension": "make", "value": "Samsung" }
]
}
]
},
"aggregations": [
{ "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
{ "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
],
"postAggregations": [
{ "type": "arithmetic",
"name": "avg_usage",
"fn": "/",
"fields": [
{ "type": "fieldAccess", "fieldName": "data_transfer" },
{ "type": "fieldAccess", "fieldName": "total_usage" }
]
}
],
"intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
"having": {
"type": "greaterThan",
"aggregation": "total_usage",
"value": 100
}
}

Metadata查询

TimeBoundary 查询
{
"queryType" : "timeBoundary",
"dataSource": "sample_datasource",
"bound" : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set
"filter" : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional
}
SegmentMetadata查询
{
"queryType":"segmentMetadata",
"dataSource":"sample_datasource",
"intervals":["2013-01-01/2014-01-01"]
}
DatasourceMetadata查询
{
"queryType" : "dataSourceMetadata",
"dataSource": "sample_datasource"
}

Search查询

{
"queryType": "search",
"dataSource": "sample_datasource",
"granularity": "day",
"searchDimensions": [
"dim1",
"dim2"
],
"query": {
"type": "insensitive_contains",
"value": "Ke"
},
"sort" : {
"type": "lexicographic"
},
"intervals": [
"2013-01-01T00:00:00.000/2013-01-03T00:00:00.000"
]
}

查询建议

用Timeseries和TopN替代GroupBy

取消查询

DELETE /druid/v2/{queryId}
curl -X DELETE "http://host:port/druid/v2/abc123"

查询失败

{
"error" : "Query timeout",
"errorMessage" : "Timeout waiting for task.",
"errorClass" : "java.util.concurrent.TimeoutException",
"host" : "druid1.example.com:8083"
}

三、CURL

基于Http的查询

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty

四、客户端查询

客户端查询是基于json的

具体查看 https://druid.apache.org/libraries.html

比如python查询的pydruid

from pydruid.client import *
from pylab import plt query = PyDruid(druid_url_goes_here, 'druid/v2') ts = query.timeseries(
datasource='twitterstream',
granularity='day',
intervals='2014-02-02/p4w',
aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')},
post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))},
filter=Dimension('first_hashtag') == 'sochi2014'
)
df = query.export_pandas()
df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0])
df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20,
title='Sochi 2014')
plt.ylabel('avg tweet length (chars)')
plt.show()

实时流式计算整理了Druid入门指南

持续更新中~

更多实时数据分析相关博文与科技资讯,欢迎关注 “实时流式计算”

获取《Druid实时大数据分析》电子书,请在公号后台回复 “Druid”

Druid 0.17入门(4)—— 数据查询方式大全的更多相关文章

  1. Druid 0.17 入门(3)—— 数据接入指南

    在快速开始中,我们演示了接入本地示例数据方式,但Druid其实支持非常丰富的数据接入方式.比如批处理数据的接入和实时流数据的接入.本文我们将介绍这几种数据接入方式. 文件数据接入:从文件中加载批处理数 ...

  2. Druid 0.17 入门(2)—— 安装与部署

    在Druid快速入门其实已经简单的介绍过最简化配置的单节点部署,本文我们将详细描述Druid的多种部署方式,对于测试开发环境可以选用轻量的单机部署方式,而生产环境我们最好选用集群部署的方式,确保系统的 ...

  3. 8种json数据查询方式

    你有没有对“在复杂的JSON数据结构中查找匹配内容”而烦恼.这里有8种不同的方式可以做到: JsonSQL JsonSQL实现了使用SQL select语句在json数据结构中查询的功能. 例子: ? ...

  4. Dynamics CRM 2015/2016 Web API:新的数据查询方式

    今天我们来看看Web API的数据查询功能,尽管之前介绍CRUD的文章里面提到过怎么去Read数据,可是并没有详细的去深究那些细节,今天我们就来详细看看吧.事实上呢,Web API的数据查询接口也是基 ...

  5. Django之ORM数据查询方式练习

    单表查询 单表查询简单示例 # 字段 models.DateField(auto_now_add) models.DateField(auto_now) # auto_now 和auto_now_ad ...

  6. XML教程、语法手册、数据读取方式大全

    XML简单易懂教程 本文提供全流程,中文翻译.Chinar坚持将简单的生活方式,带给世人!(拥有更好的阅读体验 -- 高分辨率用户请根据需求调整网页缩放比例) 一 XML --数据格式的写法 二 Re ...

  7. [转帖]Druid介绍及入门

    Druid介绍及入门 2018-09-19 19:38:36 拿着核武器的程序员 阅读数 22552更多 分类专栏: Druid   版权声明:本文为博主原创文章,遵循CC 4.0 BY-SA版权协议 ...

  8. Hibernate 查询方式(HQL/QBC/QBE)汇总

    作为老牌的 ORM 框架,Hibernate 在推动数据库持久化层所做出的贡献有目共睹. 它所提供的数据查询方式也越来越丰富,从 SQL 到自创的 HQL,再到面向对象的标准化查询. 虽然查询方式有点 ...

  9. hibernate框架学习之数据查询(HQL)

    lHibernate共提供5种查询方式 •OID数据查询方式 •HQL数据查询方式 •QBC数据查询方式 •本地SQL查询方式 •OGN数据查询方式 OID数据查询方式 l前提:已经获取到了对象的OI ...

随机推荐

  1. google无法播放mp4 chrome无法播放h264

    写在前面 我在chrome上无法播放h264+Acc的mp4,在firefox.ie都可以播放,而且此mp4在vlc终可以正常播放. 视频链接:http://106.14.221.185:7001/p ...

  2. bypass安全狗测试学习

    搭建简单的sql注入环境 在test数据库中创建sqltest表,插入字段数据 编写存在注入的php文件 <?php $id = $_REQUEST['uid']; echo "您当前 ...

  3. php微信公众号开发curl返回false

    最近刚接触温馨公众号开发,在自定义菜单用curl请求时,碰到了一个小坑.一时半会没有解决,便去问度娘,谷歌.发现都是说$url里面有空格导致的失败. 然而我的并没有空格,一直返回false,这个时候我 ...

  4. Asp.Net Core 3.1 的启动过程5

    前言 本文主要讲的是Asp.Net Core的启动过程,帮助大家掌握应用程序的关键配置点. 1.创建项目 1.1.用Visual Studio 2019 创建WebApi项目. 这里面可以看到有两个关 ...

  5. php header() 常用content-type

    //定义编码 header( 'Content-Type:text/html;charset=utf-8 '); //Atom header('Content-type: application/at ...

  6. python学习06循环

    '''while''''''while 布尔表达式:冒号不能省略''''''1+2+3+...+10'''i=1sum1=0while i<=10: sum1+=i i+=1print(sum1 ...

  7. Libra教程之:Libra协议的关键概念

    文章目录 Libra协议 交易和状态 交易详解 账本状态详解 版本数据库 账户 账户地址 Proof 验证节点 存储 Libra协议 Libra协议是Libra区块链的基础,本文主要讲解Libra协议 ...

  8. 使用VSCode连接到IBM Cloud区块链网络

    文章目录 从IBM Cloud控制面板导出连接信息 在VSCode中创建gateway和wallet 在VSCode中提交transaction 上篇文章我们讲到怎么在IBM Cloud搭建区块链环境 ...

  9. 使用IBM Blockchain Platform extension开发你的第一个fabric智能合约

    文章目录 安装IBM Blockchain Platform extension for VS Code 创建一个智能合约项目 理解智能合约 打包智能合约 Local Fabric Ops 安装智能合 ...

  10. MYSQL 索引汇总

    1.MySQL索引类型 先分以下类,MYQL有两大类索引:聚集索引和非聚集索引(只考虑mysql innodb) 聚集索引:在有主键的情况下,主键为聚集索引,其他都是非聚集索引             ...