
WeChat official account: 苏言论

Linking theory with practice, and talking freely about technology and life.

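LLAP is a feature introduced in Hive 2.0.0. The Hive project expands the name as "Live Long and Process", while Hortonworks' HDP calls it "Low-Latency Analytical Processing"; the two are the same thing. Both prefetch and cache data in daemons that run on YARN, which reduces system I/O and interaction with the HDFS DataNodes. For feature details, see the official Hive LLAP documentation. Because releases are frequent and the official documentation is poorly maintained, many concepts and usage details are unclear, and it is hard to tell what is correct from what is not. This is especially true with integrated distributions such as CDH, which hide many of the details. This article goes through these issues one by one and summarizes them.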

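1 How to deploy Hive LLAP

There are two cases:
1. If the Hadoop YARN version is below 3.1.0 (exclusive), LLAP has to be deployed with Apache Slider. Before Hadoop 3.1.0, YARN itself did not support long-running services, whereas Slider can package, manage, and deploy long-running services onto YARN.
2. If the Hadoop YARN version is 3.1.0 or later, Slider is not needed at all. Starting with Hadoop 3.1.0, YARN has built-in support for long-running services, and the Slider project is no longer updated.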
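
The version rule above can be sketched as a small helper. This is only an illustration: the function name `llap_deploy_method` and the labels it returns are hypothetical, and it assumes a plain dotted Hadoop version string such as `3.1.0`.

```python
def llap_deploy_method(hadoop_version: str) -> str:
    """Return which mechanism would host the long-running LLAP daemons.

    Hypothetical helper: it only encodes the 3.1.0 cut-off described above.
    """
    # Compare numerically so that "3.10.0" sorts after "3.9.0",
    # and pad short versions ("3.1" is treated as "3.1.0").
    numbers = [int(p) for p in hadoop_version.split(".")[:3]]
    major, minor, patch = (numbers + [0, 0, 0])[:3]
    if (major, minor, patch) < (3, 1, 0):
        return "apache-slider"  # YARN has no native long-running services yet
    return "yarn-service"       # native YARN services, Slider not needed


for version in ("2.7.3", "3.0.3", "3.1.0", "3.2.1"):
    print(version, "->", llap_deploy_method(version))
```

Vendor version suffixes (for example a distribution build tag) are not handled here and would need to be stripped first.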

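So check the component versions you run before settling on a deployment plan. With open-source projects, the version and environment matter a great deal. If a component already provides a feature and is still actively maintained, avoid substituting another component for it where possible: the cost of the substitution and the problems it brings are far higher than expected.

Of course, if you use an integrated distribution such as CDH, the distribution has already packaged all of this and each release provides the corresponding support, so none of it needs to concern you.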

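2 Caveats

  1. LLAP currently supports only the Tez engine, so Hive and Tez must be deployed first.
  2. LLAP depends on ZooKeeper and the Hadoop components. If the cluster has security enabled (for example Kerberos), LLAP needs the matching security configuration as well, using parameters such as: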
<property>
  <name>hive.llap.daemon.keytab.file</name>
  <value>/etc/security/keytabs/demo.keytab</value>
</property>
<property>
  <name>hive.llap.daemon.service.principal</name>
  <value>demo/sywu@sywukeb</value>
</property>
<property>
  <name>hive.llap.task.scheduler.am.registry.keytab.file</name>
  <value>/etc/security/keytabs/demo.service.keytab</value>
</property>
<property>
  <name>hive.llap.task.scheduler.am.registry.principal</name>
  <value>demo/sywu@sywukeb</value>
</property>

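In addition, as the code evolves, the parameters in the official documentation are inconsistent and incomplete; some can only be found by reading the source code.

  3. Some online material and the CDH documentation run the LLAP service as the hive user with its permissions. The hive user is highly privileged, so on a large cluster with many users and fine-grained permission requirements this approach is unsuitable. Consider running several LLAP services instead, each opened to a different user or project team.
  4. Given LLAP's strengths (prefetching and caching), I think it is reasonable on a large cluster to run separate LLAP services for different workloads and users, which raises the cache hit rate and improves performance.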
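
For the Kerberos-related parameters in item 2, a quick check before launching can catch a forgotten entry. The sketch below is an assumption-laden illustration, not part of Hive: the helper names are hypothetical, the property list is just the four names shown in this article, and the sample XML is a deliberately incomplete fragment.

```python
import xml.etree.ElementTree as ET

# The four security properties listed in this article (not an exhaustive list).
LISTED_KERBEROS_PROPS = (
    "hive.llap.daemon.keytab.file",
    "hive.llap.daemon.service.principal",
    "hive.llap.task.scheduler.am.registry.keytab.file",
    "hive.llap.task.scheduler.am.registry.principal",
)


def read_properties(xml_text: str) -> dict:
    """Parse Hadoop-style <property><name/><value/></property> entries."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_kerberos_props(props: dict) -> list:
    """Return the listed properties that are absent or empty."""
    return [key for key in LISTED_KERBEROS_PROPS if not props.get(key)]


# Hypothetical, deliberately incomplete hive-site.xml fragment.
SAMPLE = """
<configuration>
  <property>
    <name>hive.llap.daemon.keytab.file</name>
    <value>/etc/security/keytabs/demo.keytab</value>
  </property>
  <property>
    <name>hive.llap.daemon.service.principal</name>
    <value>demo/sywu@sywukeb</value>
  </property>
</configuration>
"""

print(missing_kerberos_props(read_properties(SAMPLE)))
```

With this sample, the two `hive.llap.task.scheduler.am.registry.*` entries are reported as missing.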

3 Initializing LLAP

The following uses Hadoop 3.1.0, Hive 3.1.0, and Tez 0.9.1 on a cluster without security enabled. First, configure LLAP:

<property>
  <name>hive.llap.execution.mode</name>
  <value>all</value>
</property>
<property>
  <name>hive.execution.mode</name>
  <value>llap</value>
</property>
<property>
  <name>hive.llap.daemon.service.hosts</name>
  <value>@sywu-llap01</value>
</property>
<property>
  <name>hive.llap.daemon.memory.per.instance.mb</name>
  <value>25600</value>
</property>
<property>
  <name>hive.llap.daemon.num.executors</name>
  <value>8</value>
</property>
<property>
  <name>hive.llap.zk.sm.connectionString</name>
  <value>sywu01:2181</value>
</property>
<property>
  <name>hive.llap.zk.registry.namespace</name>
  <value>hive_sywu01</value>
</property>
<property>
  <name>hive.llap.zk.registry.user</name>
  <value>sywu</value>
</property>

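hive.llap.daemon.service.hosts sets the LLAP instance name (written with a leading @), and it must be the same as the name used at startup. Next, package and prepare the files for deploying the LLAP service: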

hive --service llap --name sywu-llap01 --instances 4 --size 60g --loglevel info --cache 30g --executors 10 --iothreads 10 --args " -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA -XX:-ResizePLAB"

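This command creates the LLAP service folder in the current directory, containing the script that starts LLAP, the LLAP configuration, and the jar files: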

$ ll
total 184M
-rwxr-xr-x 1 sywu01 sywu01 184M Oct 20 15:28 llap-20Oct2020.tar.gz
-rwxr-xr-x 1 sywu01 sywu01 273 Oct 20 15:28 run.sh
-rwxr-xr-x 1 sywu01 sywu01 2.0K Oct 20 15:29 Yarnfile
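
The naming rule from the configuration step (hive.llap.daemon.service.hosts must match the `--name` given to the packaging command) is easy to get wrong, so a check like the following may help. It is a sketch only: both function names are hypothetical, and it assumes, as in this article's example, that the configured value is simply `@` followed by the instance name.

```python
import shlex


def build_llap_command(name, instances, size, cache, executors, iothreads, jvm_args):
    """Assemble the packaging command used in this article as an argv list."""
    return [
        "hive", "--service", "llap",
        "--name", name,
        "--instances", str(instances),
        "--size", size,
        "--loglevel", "info",
        "--cache", cache,
        "--executors", str(executors),
        "--iothreads", str(iothreads),
        "--args", jvm_args,
    ]


def hosts_matches_name(service_hosts: str, name: str) -> bool:
    """True when the configured value is '@' followed by the --name value."""
    return service_hosts == "@" + name


argv = build_llap_command(
    "sywu-llap01", 4, "60g", "30g", 10, 10,
    " -XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA -XX:-ResizePLAB",
)
print(shlex.join(argv))  # shell-quoted form of the command (Python 3.8+)
print(hosts_matches_name("@sywu-llap01", argv[argv.index("--name") + 1]))
```

The script only builds and prints the command; it does not run Hive.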

Run run.sh to start the LLAP service. LLAP is now deployed on YARN and running.

LLAPSTATUS
--------------------------------------------------------------------------------
LLAP Application running with ApplicationId=application_1602234006497_0592
--------------------------------------------------------------------------------
LLAP Application running with ApplicationId=application_1602234006497_0592
--------------------------------------------------------------------------------
{
"amInfo" : {
"appName" : "sywu-llap01",
"appType" : "yarn-service",
"appId" : "application_1602234006497_0592"
},
"state" : "RUNNING_ALL",
"desiredInstances" : 4,
"liveInstances" : 4,
"launchingInstances" : 0,
"appStartTime" : 0,
"runningThresholdAchieved" : false,
"runningInstances" : [ {
"hostname" : "sywu01",
"containerId" : "container_e48_1602234006497_0592_01_000013",
"statusUrl" : "http://sywu01:15002/status",
"webUrl" : "http://sywu01:15002",
"rpcPort" : 45795,
"mgmtPort" : 15004,
"shufflePort" : 15551,
"yarnContainerExitStatus" : 0
}, {
"hostname" : "sywu02",
"containerId" : "container_e48_1602234006497_0592_01_000005",
"statusUrl" : "http://sywu02:15002/status",
"webUrl" : "http://sywu02:15002",
"rpcPort" : 46845,
"mgmtPort" : 15004,
"shufflePort" : 15551,
"yarnContainerExitStatus" : 0
}, {
"hostname" : "sywu01",
"containerId" : "container_e48_1602234006497_0592_01_000008",
"statusUrl" : "http://sywu01:15002/status",
"webUrl" : "http://sywu01:15002",
"rpcPort" : 33382,
"mgmtPort" : 15004,
"shufflePort" : 15551,
"yarnContainerExitStatus" : 0
}, {
"hostname" : "sywu03",
"containerId" : "container_e48_1602234006497_0592_01_000010",
"statusUrl" : "http://sywu03:15002/status",
"webUrl" : "http://sywu03:15002",
"rpcPort" : 43520,
"mgmtPort" : 15004,
"shufflePort" : 15551,
"yarnContainerExitStatus" : 0
} ]
}
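
The JSON part of this status output can be condensed into a few health facts. The following is a minimal sketch under stated assumptions: `summarize_llap_status` is a hypothetical helper, and the sample is a trimmed copy of the output above that keeps only the fields the helper reads.

```python
import json
from collections import Counter


def summarize_llap_status(status: dict) -> dict:
    """Condense an llapstatus-style JSON document into a few health facts."""
    running = status.get("runningInstances", [])
    return {
        "state": status.get("state"),
        "all_live": status.get("liveInstances") == status.get("desiredInstances"),
        "daemons_per_host": dict(Counter(inst["hostname"] for inst in running)),
    }


# Trimmed copy of the status output above (only the fields used here).
SAMPLE = json.loads("""
{
  "state": "RUNNING_ALL",
  "desiredInstances": 4,
  "liveInstances": 4,
  "runningInstances": [
    {"hostname": "sywu01"}, {"hostname": "sywu02"},
    {"hostname": "sywu01"}, {"hostname": "sywu03"}
  ]
}
""")

print(summarize_llap_status(SAMPLE))
```

In this sample all four requested daemons are live, and sywu01 hosts two of them.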

4 Performance test

At this point Hive has the MR and Tez engines and supports LLAP. Use the hive-testbench project open-sourced by Hortonworks to generate 1 TB of data:

 $ ./tpcds-setup.sh 1000

Test with the join query from query10.sql:

select
cd_gender,cd_marital_status,cd_education_status,count(*) cnt1,cd_purchase_estimate,count(*) cnt2,cd_credit_rating,count(*) cnt3,cd_dep_count,count(*) cnt4,cd_dep_employed_count,count(*) cnt5,cd_dep_college_count,count(*) cnt6
from
customer c,customer_address ca,customer_demographics
where
c.c_current_addr_sk = ca.ca_address_sk and
ca_county in ('Fillmore County','McPherson County','Bonneville County','Boone County','Brown County') and
cd_demo_sk = c.c_current_cdemo_sk and
exists (select *
from store_sales,date_dim
where c.c_customer_sk = ss_customer_sk and
ss_sold_date_sk = d_date_sk and
d_year = 2000 and
d_moy between 3 and 3+3) and
(exists (select *
from web_sales,date_dim
where c.c_customer_sk = ws_bill_customer_sk and
ws_sold_date_sk = d_date_sk and
d_year = 2000 and
d_moy between 3 and 3+3) or
exists (select *
from catalog_sales,date_dim
where c.c_customer_sk = cs_ship_customer_sk and
cs_sold_date_sk = d_date_sk and
d_year = 2000 and
d_moy between 3 and 3+3))
group by cd_gender,
cd_marital_status,
cd_education_status,
cd_purchase_estimate,
cd_credit_rating,
cd_dep_count,
cd_dep_employed_count,
cd_dep_college_count
order by cd_gender,
cd_marital_status,
cd_education_status,
cd_purchase_estimate,
cd_credit_rating,
cd_dep_count,
cd_dep_employed_count,
cd_dep_college_count
limit 100;

Execution with the MR engine:

INFO  : Query ID = sywu01_20201023113411_add04434-7382-4376-8883-26ab298b1c6f
INFO : Total jobs = 8
INFO : Starting task [Stage-24:MAPREDLOCAL] in parallel
INFO : Starting task [Stage-25:MAPREDLOCAL] in parallel
INFO : Starting task [Stage-26:MAPREDLOCAL] in parallel
INFO : Starting task [Stage-27:MAPREDLOCAL] in parallel
INFO : Launching Job 1 out of 8
INFO : Starting task [Stage-20:MAPRED] in parallel
INFO : Launching Job 2 out of 8
INFO : Starting task [Stage-14:MAPRED] in parallel
INFO : Launching Job 3 out of 8
INFO : Starting task [Stage-11:MAPRED] in parallel
INFO : Launching Job 4 out of 8
INFO : Starting task [Stage-18:MAPRED] in parallel
INFO : Starting task [Stage-17:CONDITIONAL] in parallel
INFO : Launching Job 5 out of 8
INFO : Starting task [Stage-3:MAPRED] in parallel
INFO : Launching Job 6 out of 8
INFO : Starting task [Stage-4:MAPRED] in parallel
INFO : Launching Job 7 out of 8
INFO : Starting task [Stage-5:MAPRED] in parallel
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-18: Map: 3 Cumulative CPU: 779.28 sec HDFS Read: 77140427 HDFS Write: 5298244 SUCCESS
INFO : Stage-Stage-20: Map: 350 Cumulative CPU: 4134.07 sec HDFS Read: 3203667384 HDFS Write: 193140638 SUCCESS
INFO : Stage-Stage-11: Map: 153 Reduce: 151 Cumulative CPU: 4631.57 sec HDFS Read: 886558268 HDFS Write: 46326191 SUCCESS
INFO : Stage-Stage-14: Map: 257 Reduce: 271 Cumulative CPU: 6646.95 sec HDFS Read: 1049371661 HDFS Write: 106287345 SUCCESS
INFO : Stage-Stage-3: Map: 19 Reduce: 2 Cumulative CPU: 394.45 sec HDFS Read: 351370942 HDFS Write: 1399528 SUCCESS
INFO : Stage-Stage-4: Map: 2 Reduce: 1 Cumulative CPU: 15.71 sec HDFS Read: 1415039 HDFS Write: 12296 SUCCESS
INFO : Stage-Stage-5: Map: 1 Reduce: 1 Cumulative CPU: 8.31 sec HDFS Read: 23606 HDFS Write: 7168 SUCCESS
INFO : Total MapReduce CPU Time Spent: 0 days 4 hours 36 minutes 50 seconds 340 msec
INFO : Completed executing command(queryId=sywu01_20201023113411_add04434-7382-4376-8883-26ab298b1c6f); Time taken: 838.106 seconds
INFO : OK
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
| cd_gender | cd_marital_status | cd_education_status | cnt1 | cd_purchase_estimate | cnt2 | cd_credit_rating | cnt3 | cd_dep_count | cnt4 | cd_dep_employed_count | cnt5 | cd_dep_college_count | cnt6 |
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 0 | 1 | 0 | 1 | 4 | 1 |
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 1 | 1 | 0 | 1 | 5 | 1 |
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
....
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 0 | 1 | 4 | 1 | 4 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 3 | 1 | 2 | 1 | 1 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 5 | 1 | 4 | 1 | 0 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 5 | 1 | 6 | 1 | 6 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | Low Risk | 1 | 0 | 1 | 3 | 1 | 4 | 1 |
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
100 rows selected (850.686 seconds)
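
As a cross-check on the MR log, the per-stage "Cumulative CPU" values can be summed and compared with the "Total MapReduce CPU Time Spent" line. The sketch below copies the seven stage lines verbatim from the log above; the helper names are hypothetical.

```python
import re

# Per-stage lines copied verbatim from the MR log above.
MR_LOG = """\
INFO : Stage-Stage-18: Map: 3 Cumulative CPU: 779.28 sec HDFS Read: 77140427 HDFS Write: 5298244 SUCCESS
INFO : Stage-Stage-20: Map: 350 Cumulative CPU: 4134.07 sec HDFS Read: 3203667384 HDFS Write: 193140638 SUCCESS
INFO : Stage-Stage-11: Map: 153 Reduce: 151 Cumulative CPU: 4631.57 sec HDFS Read: 886558268 HDFS Write: 46326191 SUCCESS
INFO : Stage-Stage-14: Map: 257 Reduce: 271 Cumulative CPU: 6646.95 sec HDFS Read: 1049371661 HDFS Write: 106287345 SUCCESS
INFO : Stage-Stage-3: Map: 19 Reduce: 2 Cumulative CPU: 394.45 sec HDFS Read: 351370942 HDFS Write: 1399528 SUCCESS
INFO : Stage-Stage-4: Map: 2 Reduce: 1 Cumulative CPU: 15.71 sec HDFS Read: 1415039 HDFS Write: 12296 SUCCESS
INFO : Stage-Stage-5: Map: 1 Reduce: 1 Cumulative CPU: 8.31 sec HDFS Read: 23606 HDFS Write: 7168 SUCCESS
"""

CPU_RE = re.compile(r"Cumulative CPU: ([\d.]+) sec")


def total_cpu_seconds(log_text: str) -> float:
    """Sum every 'Cumulative CPU: N sec' value found in the log text."""
    return sum(float(value) for value in CPU_RE.findall(log_text))


def as_hms(seconds: float) -> str:
    """Format whole seconds as hours, minutes, seconds (fraction dropped)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


total = total_cpu_seconds(MR_LOG)
print(round(total, 2), as_hms(total))
```

The sum is 16610.34 CPU seconds, which agrees with the 4 hours 36 minutes 50 seconds 340 msec reported in the log, against a wall-clock time of about 838 seconds.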

Execution with the Tez engine:

INFO  : Query ID = sywu01_20201023175253_265d5780-7be4-47ad-ad4b-ef8154bb3842
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in parallel
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 6 .......... container SUCCEEDED 14 14 0 0 0 10
Map 7 .......... container SUCCEEDED 4 4 0 0 3 3
Map 1 .......... container SUCCEEDED 6 6 0 0 0 1
Map 9 .......... container SUCCEEDED 1 1 0 0 0 1
Reducer 5 ...... container SUCCEEDED 1 1 0 0 0 1
Map 8 .......... container SUCCEEDED 20 20 0 0 0 0
Map 10 ......... container SUCCEEDED 6 6 0 0 0 0
Reducer 11 ..... container SUCCEEDED 234 234 0 0 0 0
Map 12 ......... container SUCCEEDED 10 10 0 0 0 0
Reducer 13 ..... container SUCCEEDED 234 234 0 0 0 0
Reducer 2 ...... container SUCCEEDED 234 234 0 0 0 0
Reducer 3 ...... container SUCCEEDED 145 145 0 0 0 0
Reducer 4 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 13/13 [==========================>>] 100% ELAPSED TIME: 36.01 s
----------------------------------------------------------------------------------------------
INFO : Completed executing command(queryId=sywu01_20201023175253_265d5780-7be4-47ad-ad4b-ef8154bb3842); Time taken: 54.039 seconds
INFO : OK
- Query Execution Summary
- ----------------------------------------------------------------------------------------------
- OPERATION DURATION
- ----------------------------------------------------------------------------------------------
- Compile Query 0.00s
- Prepare Plan 0.00s
- Get Query Coordinator (AM) 0.00s
- Submit Plan 1603446798.35s
- Start DAG 1.05s
- Run DAG 34.93s
- ----------------------------------------------------------------------------------------------
-
- Task Execution Summary
- ----------------------------------------------------------------------------------------------
- VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
- ----------------------------------------------------------------------------------------------
- Map 1 16546.00 130,790 1,748 13,963,497 82,778
- Map 10 4568.00 49,540 484 27,755,681 15,784
- Map 12 3554.00 75,240 987 55,261,069 41,887
- Map 6 14520.00 131,700 5,584 6,000,000 42,697
- Map 7 13490.00 34,760 780 1,920,800 1,920,800
- Map 8 4073.00 80,900 472 106,067,119 73,854
- Map 9 3146.00 10,200 375 10,000 366
- Reducer 11 1531.00 27,450 623 15,784 157,859
- Reducer 13 1026.00 32,250 697 41,887 261,474
- Reducer 2 4577.00 267,060 4,652 575,965 22,170
- Reducer 3 4046.00 263,850 5,876 22,170 61,923
- Reducer 4 503.00 1,430 20 61,923 0
- Reducer 5 12877.00 2,360 0 82,778 3
- ----------------------------------------------------------------------------------------------
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
| cd_gender | cd_marital_status | cd_education_status | cnt1 | cd_purchase_estimate | cnt2 | cd_credit_rating | cnt3 | cd_dep_count | cnt4 | cd_dep_employed_count | cnt5 | cd_dep_college_count | cnt6 |
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 0 | 1 | 0 | 1 | 4 | 1 |
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 1 | 1 | 0 | 1 | 5 | 1 |
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
....
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 0 | 1 | 4 | 1 | 4 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 3 | 1 | 2 | 1 | 1 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 5 | 1 | 4 | 1 | 0 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 5 | 1 | 6 | 1 | 6 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | Low Risk | 1 | 0 | 1 | 3 | 1 | 4 | 1 |
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
100 rows selected (61.765 seconds)

Execution with the Tez engine + LLAP:

INFO  : Query ID = sywu01_20201023174916_f4c7d891-7395-4720-9eb1-5bf1fd7c024c
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in parallel
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 6 .......... llap SUCCEEDED 3 3 0 0 0 0
Map 7 .......... llap SUCCEEDED 4 4 0 0 0 0
Map 1 .......... llap SUCCEEDED 6 6 0 0 0 0
Map 9 .......... llap SUCCEEDED 1 1 0 0 0 0
Reducer 5 ...... llap SUCCEEDED 1 1 0 0 0 0
Map 8 .......... llap SUCCEEDED 6 6 0 0 0 0
Map 10 ......... llap SUCCEEDED 6 6 0 0 0 0
Reducer 11 ..... llap SUCCEEDED 234 234 0 0 0 0
Map 12 ......... llap SUCCEEDED 7 7 0 0 0 0
Reducer 13 ..... llap SUCCEEDED 234 234 0 0 0 0
Reducer 2 ...... llap SUCCEEDED 234 234 0 0 0 2
Reducer 3 ...... llap SUCCEEDED 145 145 0 0 0 3
Reducer 4 ...... llap SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 13/13 [==========================>>] 100% ELAPSED TIME: 11.34 s
----------------------------------------------------------------------------------------------
INFO : Completed executing command(queryId=sywu01_20201023174916_f4c7d891-7395-4720-9eb1-5bf1fd7c024c); Time taken: 30.035 seconds
INFO : OK
- Query Execution Summary
- ----------------------------------------------------------------------------------------------
- OPERATION DURATION
- ----------------------------------------------------------------------------------------------
- Compile Query 0.00s
- Prepare Plan 0.00s
- Get Query Coordinator (AM) 0.00s
- Submit Plan 1603446581.98s
- Start DAG 1.00s
- Run DAG 11.02s
- ----------------------------------------------------------------------------------------------
-
- Task Execution Summary
- ----------------------------------------------------------------------------------------------
- VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
- ----------------------------------------------------------------------------------------------
- Map 1 2042.00 0 0 13,963,497 82,778
- Map 10 1527.00 0 0 27,755,681 15,767
- Map 12 1530.00 0 0 55,261,069 40,935
- Map 6 514.00 0 0 6,000,000 42,697
- Map 7 514.00 0 0 1,920,800 1,920,800
- Map 8 3063.00 0 0 106,067,119 59,399
- Map 9 0.00 0 0 10,000 366
- Reducer 11 1536.00 0 0 15,767 14,579
- Reducer 13 1019.00 0 0 40,935 33,694
- Reducer 2 3059.00 0 0 190,450 22,170
- Reducer 3 2540.00 0 0 22,170 14,423
- Reducer 4 278.00 0 0 14,423 0
- Reducer 5 2041.00 0 0 82,778 3
- ----------------------------------------------------------------------------------------------
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
| cd_gender | cd_marital_status | cd_education_status | cnt1 | cd_purchase_estimate | cnt2 | cd_credit_rating | cnt3 | cd_dep_count | cnt4 | cd_dep_employed_count | cnt5 | cd_dep_college_count | cnt6 |
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 0 | 1 | 0 | 1 | 4 | 1 |
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 1 | 1 | 0 | 1 | 5 | 1 |
| F | D | 2 yr Degree | 1 | 500 | 1 | Good | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
....
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 0 | 1 | 4 | 1 | 4 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 3 | 1 | 2 | 1 | 1 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 5 | 1 | 4 | 1 | 0 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | High Risk | 1 | 5 | 1 | 6 | 1 | 6 | 1 |
| F | D | 2 yr Degree | 1 | 3500 | 1 | Low Risk | 1 | 0 | 1 | 3 | 1 | 4 | 1 |
+------------+--------------------+-----------------------+-------+-----------------------+-------+-------------------+-------+---------------+-------+------------------------+-------+-----------------------+-------+
100 rows selected (37.668 seconds)

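5 Summary

As the results show, the MR engine (850.686 seconds) took nearly 14 times as long as the Tez engine (61.765 seconds) and nearly 23 times as long as Tez + LLAP (37.668 seconds), and its resource usage was far higher than either. Tez and Tez + LLAP really do improve query performance dramatically and move Hive a big step forward. The only price is understanding the architecture and internals, and upgrading and updating the components.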
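
The ratios behind this comparison can be recomputed directly from the three client-reported times. This is a small worked calculation; the dictionary keys and the `speedup` helper are just illustrative names.

```python
# End-to-end times reported by the client for the same query, in seconds.
ELAPSED = {"mr": 850.686, "tez": 61.765, "tez+llap": 37.668}


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` finished than `baseline`."""
    return ELAPSED[baseline] / ELAPSED[candidate]


for base, cand in (("mr", "tez"), ("mr", "tez+llap"), ("tez", "tez+llap")):
    print(f"{cand} vs {base}: {speedup(base, cand):.1f}x")
```

These are single runs of one query, so the ratios describe this test only and not LLAP's behaviour in general.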
