【原创】大叔经验分享(18)hive2.0以后通过beeline执行sql没有进度信息
一 问题
在hive1.2中使用hive或者beeline执行sql都有进度信息,但是升级到hive2.0以后,只有hive执行sql还有进度信息,beeline执行sql完全silence,在等待结果的过程中完全不知道执行到哪了
1 hive执行sql过程(有进度信息)
hive> select count(1) from test_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20181227162003_bd82e3e2-2736-42b4-b1da-4270ead87e4d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1544593827645_22873, Tracking URL = http://rm1:8088/proxy/application_1544593827645_22873/
Kill Command = /export/App/hadoop-2.6.1/bin/hadoop job -kill job_1544593827645_22873
2018-12-27 16:20:27,650 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 116.9 sec
MapReduce Total cumulative CPU time: 1 minutes 56 seconds 900 msec
Ended Job = job_1544593827645_22873
MapReduce Jobs Launched:
Stage-Stage-1: Map: 29 Reduce: 1 Cumulative CPU: 116.9 sec HDFS Read: 518497 HDFS Write: 197 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 56 seconds 900 msec
OK
104
Time taken: 24.437 seconds, Fetched: 1 row(s)
2 beeline执行sql过程(无进度信息)
0: jdbc:hive2://thrift1:10000> select count(1) from test_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
+------+--+
| c0 |
+------+--+
| 104 |
+------+--+
1 row selected (23.965 seconds)
二 代码分析
hive执行sql的详细过程详见:https://www.cnblogs.com/barneywill/p/10185168.html
hive中执行sql最终都会调用到Driver.run,run会调用execute,下面直接看execute代码:
org.apache.hadoop.hive.ql.Driver
public int execute(boolean deferClose) throws CommandNeedRetryException {
...
if (jobs > 0) {
logMrWarning(mrJobs);
console.printInfo("Query ID = " + queryId);
console.printInfo("Total jobs = " + jobs);
}
...
private void logMrWarning(int mrJobs) {
if (mrJobs <= 0 || !("mr".equals(HiveConf.getVar(conf, ConfVars.HIVE_EXECUTION_ENGINE)))) {
return;
}
String warning = HiveConf.generateMrDeprecationWarning();
LOG.warn(warning);
warning = "WARNING: " + warning;
console.printInfo(warning);
// Propagate warning to beeline via operation log.
OperationLog ol = OperationLog.getCurrentOperationLog();
if (ol != null) {
ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
}
}
可见在hive命令中看到的进度信息是通过console.printInfo输出的;
注意到一个细节,在beeline中虽然没有进度信息,但是有一个warning信息:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
这个warning信息是通过如下代码输出的:
OperationLog ol = OperationLog.getCurrentOperationLog();
if (ol != null) {
ol.writeOperationLog(LoggingLevel.EXECUTION, warning + "\n");
}
所以如果让beeline执行sql也有进度信息,就要通过相同的方式输出;
三 hive进度信息位置
熟悉的进度信息在这里:
org.apache.hadoop.hive.ql.Driver
public int execute(boolean deferClose) throws CommandNeedRetryException {
...
console.printInfo("Query ID = " + queryId);
console.printInfo("Total jobs = " + jobs); private TaskRunner launchTask(Task<? extends Serializable> tsk, String queryId, boolean noName,
String jobname, int jobs, DriverContext cxt) throws HiveException {
...
console.printInfo("Launching Job " + cxt.getCurJobNo() + " out of " + jobs);
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
private void setNumberOfReducers() throws IOException {
ReduceWork rWork = work.getReduceWork();
// this is a temporary hack to fix things that are not fixed in the compiler
Integer numReducersFromWork = rWork == null ? 0 : rWork.getNumReduceTasks(); if (rWork == null) {
console
.printInfo("Number of reduce tasks is set to 0 since there's no reduce operator");
} else {
if (numReducersFromWork >= 0) {
console.printInfo("Number of reduce tasks determined at compile time: "
+ rWork.getNumReduceTasks());
} else if (job.getNumReduceTasks() > 0) {
int reducers = job.getNumReduceTasks();
rWork.setNumReduceTasks(reducers);
console
.printInfo("Number of reduce tasks not specified. Defaulting to jobconf value of: "
+ reducers);
} else {
if (inputSummary == null) {
inputSummary = Utilities.getInputSummary(driverContext.getCtx(), work.getMapWork(), null);
}
int reducers = Utilities.estimateNumberOfReducers(conf, inputSummary, work.getMapWork(),
work.isFinalMapRed());
rWork.setNumReduceTasks(reducers);
console
.printInfo("Number of reduce tasks not specified. Estimated from input data size: "
+ reducers); }
console
.printInfo("In order to change the average load for a reducer (in bytes):");
console.printInfo(" set " + HiveConf.ConfVars.BYTESPERREDUCER.varname
+ "=<number>");
console.printInfo("In order to limit the maximum number of reducers:");
console.printInfo(" set " + HiveConf.ConfVars.MAXREDUCERS.varname
+ "=<number>");
console.printInfo("In order to set a constant number of reducers:");
console.printInfo(" set " + HiveConf.ConfVars.HADOOPNUMREDUCERS
+ "=<number>");
}
}
大部分都在下边这个类里:
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper
public void jobInfo(RunningJob rj) {
if (ShimLoader.getHadoopShims().isLocalMode(job)) {
console.printInfo("Job running in-process (local Hadoop)");
} else {
if (SessionState.get() != null) {
SessionState.get().getHiveHistory().setTaskProperty(queryState.getQueryId(),
getId(), Keys.TASK_HADOOP_ID, rj.getID().toString());
}
console.printInfo(getJobStartMsg(rj.getID()) + ", Tracking URL = "
+ rj.getTrackingURL());
console.printInfo("Kill Command = " + HiveConf.getVar(job, HiveConf.ConfVars.HADOOPBIN)
+ " job -kill " + rj.getID());
}
} private MapRedStats progress(ExecDriverTaskHandle th) throws IOException, LockException {
...
StringBuilder report = new StringBuilder();
report.append(dateFormat.format(Calendar.getInstance().getTime())); report.append(' ').append(getId());
report.append(" map = ").append(mapProgress).append("%, ");
report.append(" reduce = ").append(reduceProgress).append('%');
...
String output = report.toString();
...
console.printInfo(output);
... public static String getJobEndMsg(JobID jobId) {
return "Ended Job = " + jobId;
}
看起来改动工作量不小,哈哈
【原创】大叔经验分享(18)hive2.0以后通过beeline执行sql没有进度信息的更多相关文章
- 【原创】大叔经验分享(89)docker启动openjdk执行jmap报错
docker启动openjdk后,可以查看进程 # docker exec -it XXX jps 10 XXX.jar 可见启动的java进程id一直为10,然后可以执行jvm命令,比如 # doc ...
- 【原创】经验分享:一个小小emoji尽然牵扯出来这么多东西?
前言 之前也分享过很多工作中踩坑的经验: 一个线上问题的思考:Eureka注册中心集群如何实现客户端请求负载及故障转移? [原创]经验分享:一个Content-Length引发的血案(almost.. ...
- 【原创】大叔经验分享(12)如何程序化kill提交到spark thrift上的sql
spark 2.1.1 hive正在执行中的sql可以很容易的中止,因为可以从console输出中拿到当前在yarn上的application id,然后就可以kill任务, WARNING: Hiv ...
- 【原创】大叔经验分享(20)spark job之间会停顿几分钟
今天遇到一个问题,spark应用中在一个循环里执行sql,每个sql都会向一张表写入数据,比如 insert overwrite table test_table partition(dt) sele ...
- 【原创】大叔经验分享(13)spark运行报错WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
本地运行spark报错 18/12/18 12:56:55 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting ...
- 【原创】大叔经验分享(33)hive select count为0
hive建表后直接将数据文件拷贝到table目录下,select * 可以查到数据,但是select count(1) 一直返回0,这个是因为hive中有个配置 hive.stats.autogath ...
- 【原创】大叔经验分享(24)hive metastore的几种部署方式
hive及其他组件(比如spark.impala等)都会依赖hive metastore,依赖的配置文件位于hive-site.xml hive metastore重要配置 hive.metastor ...
- 【原创】大叔经验分享(11)python引入模块报错ImportError: No module named pandas numpy
python应用通常需要一些库,比如numpy.pandas等,安装也很简单,直接通过pip # pip install numpyRequirement already satisfied: num ...
- 【原创】大叔经验分享(7)创建hive表时格式如何选择
常用格式 textfile 需要定义分隔符,占用空间大,读写效率最低,非常容易发生冲突(分隔符)的一种格式,基本上只有需要导入数据的时候才会使用,比如导入csv文件: ROW FORMAT DELIM ...
随机推荐
- Redis入门之增删改查等常用命令总结
Redis是用C语言实现的,一般来说C语言实现的程序"距离"操作系统更近,执行速度相对会更快. Redis使用了单线程架构,预防了多线程可能产生的竞争问题. 作者对于Redis源代 ...
- ORM框架的前世今生
目录 一.ORM简介二.ORM的工作原理三.ORM的优缺点四.常见的ORM框架 一.ORM简介 ORM(Object Relational Mapping)对象关系映射,一般指持久化数据和实体对象的映 ...
- 爬虫之selenium模块
Selenium 简介 selenium最初是一个自动化测试工具,而爬虫中使用它主要是为了解决requests无法直接执行JavaScript代码的问题 selenium本质是通过驱动浏览器,完全模拟 ...
- openstack基础:网络
Neutron 功能 Neutron 为整个 OpenStack 环境提供网络支持,包括二层交换,三层路由,负载均衡,防火墙和 *** 等.Neutron 提供了一个灵活的框架,通过配置,无论是开源还 ...
- 好久好久没写,,百度API逆地址解析以及删除指定marker
百度地图Api中 除覆盖物有两个方法:map.removeOverlay()或者 map.clearOverlays(),其中 clearOverlays()方法一次移除所有的覆盖物removeOve ...
- rsync用法详细解释
提要 熟悉 rsync 的功能及其特点 掌握 rsync 语法及常用选项的功能 掌握 rsync 命令的三种基本使用方法 掌握如何筛选 rsync 的传输目标 掌握使用 rsync 进行镜像和增量备份 ...
- Flutter之Simulation
Simulation 可以理解成动画进行的函数. Flutter中自带了有下面几种. BouncingScrollSimulationBounce弹性的滚动模拟 ClampedSimulation C ...
- CodeForces 461B Appleman and T
题目链接:http://codeforces.com/contest/461/problem/B 题目大意: 给定一课树,树上的节点有黑的也有白的,有这样一种分割树的方案,分割后每个子图只含有一个黑色 ...
- DAY17、常用模块
一.time模块 1.时间戳(timestamp):time.time() #可以作为数据的唯一标识 是相对于1970-1-1-0:0:0时间插值 2.延迟线程的运行:time.sleep ...
- gitignore的使用
gitignore的作用是忽略文件的提交,被加入到gitignore中的文件不会被提交到文件服务器 通常需要添加到.gitignore的文件有: (1)缓存相关文件,编译相关文件,运行时相关文件 (2 ...