Our cluster runs on Amazon EMR, and a Hive query failed with FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. The parameters below turned out to be quite helpful. In my case I set this one: set hive.tez.auto.reducer.parallelism=true;

Tez memory tuning

1. AM and container size

tez.am.resource.memory.mb

Description: Set tez.am.resource.memory.mb to be the same as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.

hive.tez.container.size

Description: Set hive.tez.container.size to be the same as, or a small multiple (1 or 2 times) of, the YARN container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb.
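The two sizing rules above boil down to simple arithmetic. Below is a minimal Python sketch, assuming you already know your cluster's yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb values; the helper name suggest_tez_memory and the 4096/8192 MB figures are hypothetical, for illustration only.

```python
def suggest_tez_memory(yarn_min_mb: int, yarn_max_mb: int, multiple: int = 1) -> dict:
    """Hypothetical helper applying the AM/container sizing rules.

    multiple: 1 or 2, i.e. the "small multiple" of the YARN minimum container size.
    """
    if multiple not in (1, 2):
        raise ValueError("multiple should be 1 or 2")
    # AM size: same as the YARN minimum container size.
    am_mb = yarn_min_mb
    # Container size: 1x or 2x the YARN minimum, but never above the YARN maximum.
    container_mb = min(yarn_min_mb * multiple, yarn_max_mb)
    return {
        "tez.am.resource.memory.mb": am_mb,
        "hive.tez.container.size": container_mb,
    }


# Hypothetical cluster: 4 GB minimum allocation, 8 GB maximum allocation.
for key, value in suggest_tez_memory(4096, 8192, multiple=2).items():
    print(f"set {key}={value};")
```

The printed set statements can be pasted into a Hive session; the min() call is what enforces the "never more than the YARN maximum" rule.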

2. AM and container JVM settings

tez.am.launch.cmd-opts

Default: heap of 80% × tez.am.resource.memory.mb

Description: usually does not need tuning.

hive.tez.java.opts

Default: heap of 80% × hive.tez.container.size

Description: Hortonworks recommends "-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC".

tez.container.max.java.heap.fraction

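Default: 0.8

Description: the fraction of the task/AM container memory given to the JVM heap (Xmx). Adjusting it is recommended, based on your specific workload.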
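To see how the 80% fraction turns into an actual heap size, here is a small Python sketch. It assumes you want to pin -Xmx explicitly in hive.tez.java.opts next to the Hortonworks-recommended flags rather than rely on tez.container.max.java.heap.fraction; the helper names are hypothetical.

```python
# Flags recommended by Hortonworks for hive.tez.java.opts (quoted in the text).
RECOMMENDED_OPTS = (
    "-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 "
    "-XX:+UseNUMA -XX:+UseG1GC"
)


def heap_size_mb(container_mb: int, heap_fraction: float = 0.8) -> int:
    """Heap (Xmx) in MB: container memory times the heap fraction, rounded down."""
    if not 0 < heap_fraction <= 1:
        raise ValueError("heap_fraction must be in (0, 1]")
    return int(container_mb * heap_fraction)


def build_java_opts(container_mb: int, heap_fraction: float = 0.8) -> str:
    """Hypothetical helper: recommended flags plus an explicit -Xmx."""
    return f"{RECOMMENDED_OPTS} -Xmx{heap_size_mb(container_mb, heap_fraction)}m"


# Hypothetical 4096 MB container with the default 0.8 fraction.
print(f"set hive.tez.java.opts={build_java_opts(4096)};")
```

The roughly 20% left outside the heap is headroom for the JVM's off-heap memory (metaspace, thread stacks, direct buffers); a container that grows past its YARN allocation may be killed by YARN.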

3. Hive memory and map join settings

tez.runtime.io.sort.mb

Default: 100

Description: memory (in MB) used for sorting output. Recommended: 40% × hive.tez.container.size, generally no more than 2 GB.

hive.auto.convert.join.noconditionaltask

Default: true

Description: whether to merge multiple map joins into one. Keep the default.

hive.auto.convert.join.noconditionaltask.size

Default: 10000000 (10 MB)

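Description: when multiple map joins are merged into one, the maximum total file size of all the small tables. This only limits the size of the input table files; it does not reflect the actual size of the hash table built during the map join. Recommended: 1/3 × hive.tez.container.size.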

tez.runtime.unordered.output.buffer.size-mb

Default: 100

Description: Size of the buffer to use if not writing directly to disk. Recommended: 10% × hive.tez.container.size.
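The three recommendations above are all ratios of hive.tez.container.size, so they can be computed together. A minimal Python sketch, assuming "2 GB" means 2048 MB for the sort-buffer cap; note that hive.auto.convert.join.noconditionaltask.size is in bytes while the other two values are in MB. The function name is hypothetical.

```python
MB = 1024 * 1024


def mapjoin_settings(container_mb: int, sort_cap_mb: int = 2048) -> dict:
    """Hypothetical helper: derive the map join memory settings from the container size."""
    return {
        # 40% of the container, capped at about 2 GB (cap value is an assumption).
        "tez.runtime.io.sort.mb": min(int(container_mb * 0.4), sort_cap_mb),
        # 1/3 of the container; this parameter is in bytes, not MB.
        "hive.auto.convert.join.noconditionaltask.size": container_mb * MB // 3,
        # 10% of the container, in MB.
        "tez.runtime.unordered.output.buffer.size-mb": int(container_mb * 0.1),
    }


for key, value in mapjoin_settings(4096).items():
    print(f"set {key}={value};")
```

Keep in mind the text's caveat: the noconditionaltask size limits the small tables' file size, and the in-memory hash table can be larger than that.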

4. Container reuse

tez.am.container.reuse.enabled

Default: true

Description: switch for container reuse.

Mapper/Reducer tuning

1. Number of mappers

tez.grouping.min-size

Default: 50*1024*1024 (50 MB)

Description: Lower bound on the size (in bytes) of a grouped split, to avoid generating too many small splits.

tez.grouping.max-size

Default: 1024*1024*1024 (1 GB)

Description: Upper bound on the size (in bytes) of a grouped split, to avoid generating excessively large splits.
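How the two bounds interact can be illustrated with a simplified Python model: divide the input into a desired number of groups, then clamp each group's size into [tez.grouping.min-size, tez.grouping.max-size]. This is an assumption-laden sketch, not Tez's actual grouping code, which also factors in available cluster capacity (tez.grouping.split-waves) and data locality; desired_groups is a hypothetical input.

```python
import math

GB = 1024 * 1024 * 1024
DEFAULT_MIN_SIZE = 50 * 1024 * 1024    # tez.grouping.min-size default
DEFAULT_MAX_SIZE = 1024 * 1024 * 1024  # tez.grouping.max-size default


def estimate_mappers(total_bytes: int, desired_groups: int,
                     min_size: int = DEFAULT_MIN_SIZE,
                     max_size: int = DEFAULT_MAX_SIZE) -> int:
    """Simplified model: number of grouped splits (roughly the number of mappers)."""
    if total_bytes <= 0:
        return 0
    group_size = total_bytes / max(desired_groups, 1)
    # Too small -> raised to min-size (fewer, larger splits);
    # too large -> capped at max-size (more, smaller splits).
    group_size = max(min_size, min(group_size, max_size))
    return max(1, math.ceil(total_bytes / group_size))


print(estimate_mappers(10 * GB, desired_groups=1000))  # min-size kicks in
print(estimate_mappers(100 * GB, desired_groups=10))   # max-size kicks in
```

With the defaults, 10 GB asked to split 1000 ways ends up as 205 groups of about 50 MB, while 100 GB asked to split 10 ways ends up as 100 groups of 1 GB. So raising min-size reduces the mapper count, and lowering max-size increases it.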


2. Number of reducers

hive.tez.auto.reducer.parallelism

Default: false

Description: Turn on Tez's auto reducer parallelism feature. When enabled, Hive will still estimate data sizes and set parallelism estimates. Tez will sample source vertices' output sizes and adjust the estimates at runtime as necessary.

Recommended: set to true.

hive.tez.min.partition.factor

Default: 0.25

Description: When auto reducer parallelism is enabled, this factor is used to put a lower limit on the number of reducers that Tez specifies.

hive.tez.max.partition.factor

Default: 2.0

Description: When auto reducer parallelism is enabled, this factor is used to over-partition data in shuffle edges.

hive.exec.reducers.bytes.per.reducer

Default: 256,000,000

Description: Size per reducer. Before Hive 0.14.0 the default is 1 GB; that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB; that is, if the input size is 1 GB then 4 reducers will be used.

The number of reducers is determined by the following formula (values in brackets are the defaults):

Max(1, Min(hive.exec.reducers.max [1009], ReducerStage estimate / hive.exec.reducers.bytes.per.reducer)) × hive.tez.max.partition.factor [2]
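The reducer formula above, translated into Python as a sketch. Two assumptions not stated in the formula: the division is rounded up (a partial chunk still gets a reducer), and the product with the partition factor is also rounded up. "ReducerStage estimate" is taken to be the stage's estimated input size in bytes; parameter defaults match the values quoted above.

```python
import math


def estimate_reducers(stage_bytes: int,
                      bytes_per_reducer: int = 256_000_000,  # hive.exec.reducers.bytes.per.reducer
                      reducers_max: int = 1009,              # hive.exec.reducers.max
                      max_partition_factor: float = 2.0,     # hive.tez.max.partition.factor
                      ) -> int:
    """Max(1, Min(reducers_max, stage_bytes / bytes_per_reducer)) x max_partition_factor."""
    per_size = math.ceil(stage_bytes / bytes_per_reducer)  # rounding up is an assumption
    base = max(1, min(reducers_max, per_size))
    return math.ceil(base * max_partition_factor)


# From the text: 1 GB of input at 256 MB per reducer gives 4 before the factor.
print(estimate_reducers(1_000_000_000, max_partition_factor=1.0))
print(estimate_reducers(1_000_000_000))
```

With auto reducer parallelism on, this over-partitioned count is only the starting point: Tez samples the real output sizes and can scale the reducers down at runtime, no lower than hive.tez.min.partition.factor allows.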

3. Shuffle settings

tez.shuffle-vertex-manager.min-src-fraction

Default: 0.25

Description: the fraction of source tasks which should complete before tasks for the current vertex are scheduled.

tez.shuffle-vertex-manager.max-src-fraction

Default: 0.75

Description: once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. The number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction.
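The linear ramp between min-src-fraction and max-src-fraction can be modeled in a few lines of Python. This is a simplified sketch of the behavior described above, not the actual Tez ShuffleVertexManager code (which may round differently or apply extra conditions); the function name is hypothetical.

```python
def schedulable_tasks(completed_src: int, total_src: int, total_tasks: int,
                      min_frac: float = 0.25, max_frac: float = 0.75) -> int:
    """Hypothetical model: how many current-vertex tasks may be scheduled so far."""
    if total_src == 0:
        return total_tasks
    done = completed_src / total_src
    if done >= max_frac:
        # max-src-fraction reached: everything can be scheduled.
        return total_tasks
    if done < min_frac:
        # min-src-fraction not reached yet: nothing is scheduled.
        return 0
    # In between, scale linearly from 0 to all tasks.
    progress = (done - min_frac) / (max_frac - min_frac)
    return int(total_tasks * progress)


# 40 downstream tasks, half of 100 source tasks finished -> halfway along the ramp.
print(schedulable_tasks(50, 100, 40))
```

Raising min-src-fraction delays the start of the downstream vertex (fewer idle reducers holding containers); lowering max-src-fraction lets all of them start earlier.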

Example:

hive.exec.reducers.bytes.per.reducer=1073741824;  // 1 GB

tez.shuffle-vertex-manager.min-src-fraction=0.25;

tez.shuffle-vertex-manager.max-src-fraction=0.75;

This indicates that the decision will be made between 25% of mappers finishing and 75% of mappers finishing, provided there's at least 1 GB of data being output (i.e., if 25% of mappers don't send 1 GB of data, we will wait until at least 1 GB is sent out).

Hope this helps, young fellow!
