Fixing "Out of memory due to hash maps used in map-side aggregation"

Running a SQL query with a GROUP BY failed with the following error:

Task with the most failures(4):

-----
Task ID:
task_201411191723_723592_m_000004

URL:
http://DDS0204.dratio:50030/taskdetails.jsp?jobid=job_201411191723_723592&tipid=task_201411191723_723592_m_000004

Possible error:
Out of memory due to hash maps used in map-side aggregation.

Solution:
Currently hive.map.aggr.hash.percentmemory is set to 0.25. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.125;'
-----
Diagnostic Messages for this Task:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 12 Reduce: 1 Cumulative CPU: 164.04 sec HDFS Read: 0 HDFS Write: 0 FAIL

Total MapReduce CPU Time Spent: 2 minutes 44 seconds 40 msec

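The cause: the query aggregates on the map side, and the in-memory hash map used for that aggregation outgrew the memory available to it.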

The ultimate fix: set hive.map.aggr=false, or rewrite the query to use a subquery, or try tuning the parameters below.
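A minimal sketch of the first two options. The table `page_views` and the column `user_id` are hypothetical, since the original query is not shown. The source also does not say which subquery rewrite it has in mind, so the COUNT(DISTINCT ...) rewrite below is only one plausible reading; whether it helps depends on the query.

```sql
-- Option 1: turn map-side aggregation off for this session, then rerun the query.
-- All grouping work then happens on the reduce side.
set hive.map.aggr=false;

SELECT user_id, COUNT(*) AS views
FROM page_views            -- hypothetical table
GROUP BY user_id;

-- Option 2 (assumed rewrite): move the grouping into a subquery.
-- Instead of: SELECT COUNT(DISTINCT user_id) FROM page_views;
SELECT COUNT(*) AS distinct_users
FROM (
  SELECT user_id
  FROM page_views
  GROUP BY user_id
) t;
```

Option 1 trades more shuffle traffic for a predictable map-side memory footprint, so it is worth setting per session rather than cluster-wide.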

Notes:

Tuning parameters related to map join and map-side aggregation:

①. hive.map.aggr: whether map-side aggregation is enabled. set hive.map.aggr=false turns map-side aggregation off.

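②. hive.map.aggr.hash.min.reduction: map-side hash aggregation is switched off when it is not shrinking the data enough, that is, when the ratio of hash map entries to input rows processed exceeds this value. For example: set hive.map.aggr.hash.min.reduction=0.5;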

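③. hive.map.aggr.hash.percentmemory: the fraction of the map task's JVM memory that the aggregation hash map may use. With set hive.map.aggr.hash.percentmemory = 0.25 (the default is 0.5, i.e. 50%), once the hash map reaches 25% of that memory its contents are flushed to the reducers to free up space.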

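④. hive.groupby.skewindata: balances the load when the data is skewed. With hive.groupby.skewindata=true, the generated query plan contains two MR jobs. In the first job, each map's output is distributed randomly across the reducers, which perform a partial aggregation. The second job then distributes the first job's results to the reducers by GROUP BY key and completes the final aggregation.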
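The parameters above could be combined in a session like this when map-side aggregation should stay on. This is a sketch only: the values 0.5 and 0.125 come from the examples in this post and from the error message's own suggestion, not from measurement, and the 1024 MB heap used in the comment is an assumed figure for illustration.

```sql
-- Keep map-side aggregation on, but constrain it.
set hive.map.aggr=true;

-- Give up on hash aggregation if it is not reducing the row count enough.
set hive.map.aggr.hash.min.reduction=0.5;

-- Shrink the hash map's share of map task memory, as the error message suggests.
-- Assuming a 1024 MB map heap: 0.25 allows about 256 MB, 0.125 about 128 MB.
set hive.map.aggr.hash.percentmemory=0.125;

-- Spread skewed GROUP BY keys over two MR jobs.
set hive.groupby.skewindata=true;
```

If the job still fails with these settings, falling back to set hive.map.aggr=false remains the simplest option.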

Reference:

http://dev.bizo.com/2013/02/map-side-aggregations-in-apache-hive.html
