hiveserver using too much memory

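To track down why hiveserver uses so much memory, I joined the Apache Hive mailing list today and spent a good while discussing the problem there. The people on the list were genuinely welcoming and thorough, and their replies were very detailed.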

Here are some excerpts from the thread:

Did you update your JDK in last time? A java-dev told me that could be

a  issue in JDK _26

(https://forums.oracle.com/forums/thread.jspa?threadID=2309872), some

devs report a memory decrease when they use GC - flags. I'm quite not

sure, sounds for me to far away.

The stacks have a lot waitings, but I see nothing special.

- Alex

2011/12/12 王锋 <wfeng1982@163.com>:

>

> The hive log:

>

> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121840_767713480.txt

> 8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)]

> 9102425K->7176256K(9867648K), 0.0765670 secs] [Times: user=0.36 sys=0.00,

> real=0.08 secs]

> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121841_451939518.txt

> 8219.455: [GC [PSYoungGen: 1823477K->608K(2106752K)]

> 8999046K->7176707K(9786752K), 0.0719450 secs] [Times: user=0.66 sys=0.01,

> real=0.07 secs]

> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201112121842_1930999319.txt

>

> Now we have 3 hiveservers and I set the concurrent job num to 4,but the Mem

> still be so large .I'm  mad, God

>

> have other suggestions ?

>

> At 2011-12-12 17:59:52, "alo alt" <wget.null@googlemail.com> wrote:

>>When you start a high-load hive query can you watch the stack-traces?

>>Its possible over the webinterface:

>>http://jobtracker:50030/stacks

>>

>>- Alex

>>

>>

>>2011/12/12 王锋 <wfeng1982@163.com>

>>>

>>> hiveserver will throw oom after several hours .

>>>

>>>

>>> At 2011-12-12 17:39:21,"alo alt" <wget.null@googlemail.com> wrote:

>>>

>>> what happen when you set xmx=2048m or similar? Did that have any negative effects for running queries?

>>>

>>> 2011/12/12 王锋 <wfeng1982@163.com>

>>>>

>>>> I have modify hive jvm args.

>>>>  the new args is -Xmx15000m -XX:NewRatio=1 -Xms2000m .

>>>>

>>>> but the memory  used by hiveserver  is still large.

>>>>

>>>>

>>>>

>>>>

>>>>

>>>> At 2011-12-12 16:20:54,"Aaron Sun" <aaron.sun82@gmail.com> wrote:

>>>>

>>>> Not from the running jobs, what I am saying is the heap size of the Hadoop really depends on the number of files, directories on the HDFS. Remove old files periodically or merge small files would bring in some performance boost.

>>>>

>>>> On the Hive end, the memory consumed also depends on the queries that are executed. Monitor the reducers of the Hadoop job, and my experiences are that reduce part could be the bottleneck here.

>>>>

>>>> It's totally okay to host multiple Hive servers on one machine.

>>>>

>>>> 2011/12/12 王锋 <wfeng1982@163.com>

>>>>>

>>>>> is the files you said  the files from runned jobs  of our system? and them  can't be so much large.

>>>>>

>>>>> why is the cause of namenode.  what are hiveserver doing   when it use so large memory?

>>>>>

>>>>> how  do you use hive? our method using hiveserver is correct?

>>>>>

>>>>> Thanks.

>>>>>

>>>>> At 2011-12-12 14:27:09, "Aaron Sun" <aaron.sun82@gmail.com> wrote:

>>>>>

>>>>> Not sure if this is because of the number of files, since the namenode would track each of the file and directory, and blocks.

>>>>> See this one. http://www.cloudera.com/blog/2009/02/the-small-files-problem/

>>>>>

>>>>> Please correct me if I am wrong, because this seems to be more like a hdfs problem which is actually irrelevant to Hive.

>>>>>

>>>>> Thanks

>>>>> Aaron

>>>>>

>>>>> 2011/12/11 王锋 <wfeng1982@163.com>

>>>>>>

>>>>>>

>>>>>> I want to know why the hiveserver use so large memory,and where the memory has been used ?

>>>>>>

>>>>>> At 2011-12-12 10:02:44, "王锋" <wfeng1982@163.com> wrote:

>>>>>>

>>>>>>

>>>>>> The namenode summary:

>>>>>>

>>>>>>

>>>>>>

>>>>>> the mr summary

>>>>>>

>>>>>>

>>>>>> and hiveserver:

>>>>>>

>>>>>>

>>>>>> hiveserver jvm args:

>>>>>> export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=1 -Xms15000m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

>>>>>>

>>>>>> now we  using 3 hiveservers in the same machine.

>>>>>>

>>>>>>

>>>>>> At 2011-12-12 09:54:29, "Aaron Sun" <aaron.sun82@gmail.com> wrote:

>>>>>>

>>>>>> how's the data look like? and what's the size of the cluster?

>>>>>>

>>>>>> 2011/12/11 王锋 <wfeng1982@163.com>

>>>>>>>

>>>>>>> Hi,

>>>>>>>

>>>>>>>     I'm one of engieer of sina.com.  We have used hive ,hiveserver several months. We have our own tasks schedule system .The system can schedule tasks running with hiveserver by jdbc.

>>>>>>>

>>>>>>>     But The hiveserver use mem very large, usally  large than 10g.   we have 5min tasks which will be  running every 5 minutes.,and have hourly tasks .total num of tasks  is 40. And we start 3 hiveserver in one linux server,and be cycle connected .

>>>>>>>

>>>>>>>     so why Memory of  hiveserver  using so large and how we do or some suggestion from you ?

>>>>>>>

>>>>>>> Thanks and Best Regards!

>>>>>>>

>>>>>>> Royce Wang
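
The GC lines quoted in the thread come from -XX:+PrintGCDetails with the parallel collector. Each one reports the young generation before and after the collection (with its capacity in parentheses), followed by the same three numbers for the whole heap. As a rough sketch of how to read them, the function below pulls out the "after" figures and derives the old-generation occupancy as heap minus young. The function name summarize_gc_line is made up for illustration, and it assumes the wrapped log line has been rejoined into a single line.

```shell
# Summarize one ParallelGC young-collection line from -XX:+PrintGCDetails.
# Prints: young_after_kb heap_after_kb old_after_kb heap_capacity_kb
# (summarize_gc_line is a hypothetical helper, not part of any JDK tool.)
summarize_gc_line() {
  # Capture: young gen after GC, whole heap after GC, whole heap capacity.
  fields=$(printf '%s\n' "$1" | sed -n \
    's/.*PSYoungGen: [0-9]*K->\([0-9]*\)K([0-9]*K)] [0-9]*K->\([0-9]*\)K(\([0-9]*\)K).*/\1 \2 \3/p')
  [ -n "$fields" ] || return 1
  set -- $fields
  # Old generation after GC = heap after GC - young generation after GC.
  echo "$1 $2 $(( $2 - $1 )) $3"
}

summarize_gc_line '8159.581: [GC [PSYoungGen: 1927208K->688K(2187648K)] 9102425K->7176256K(9867648K), 0.0765670 secs]'
# prints: 688 7176256 7175568 9867648
```

Read this way, the young collection empties the young generation almost completely (688K left), while roughly 7.1 million KB stay behind in the old generation. That suggests the memory is held by long-lived objects rather than by short-lived garbage.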

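At the top of the thread, Alex pointed to a forum post,

https://forums.oracle.com/forums/thread.jspa?threadID=2309872, which suggests that JDK 6 update 26 (1.6.0_26) may leak memory. We are running exactly that version. Judging from the post, nobody can confirm the leak, and Oracle naturally says it is not responsible.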

I tried with java6u29 and java7 and they work great. Actually on the production server we are running for almost 4 days with java7 and it's stable, no crash, no slowdown, no restart in this period, and with less maximum memory. If it's going to last for a week then I trust it will go on fine.

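In the end, someone in that forum thread reported that java6u29 and java7 ran stably, java7 in particular. Tomorrow I will try switching the hiveserver machine to java7.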

Update:

Today I switched to JDK 7 and the behavior was essentially the same, so the JVM version does not appear to be the cause.

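Running jmap -heap showed that hiveserver's young generation was not being sized according to the configured ratio: its maximum capacity was still the default of about 800m, which is far too small for data analysis work. I set the young generation size explicitly with -Xmn, set the maximum young generation size as well, and switched the collector to CMS. Memory usage is now stable at around 2.3g.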
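
For comparison, this is the young-generation size one would expect from the ratio setting. -XX:NewRatio is the old:young ratio, so the young generation should get heap / (NewRatio + 1). The sketch below works that out for the 15000m heap and NewRatio=1 quoted in the thread. The helper name young_from_ratio_mb and the HIVESERVER_PID variable are made up for illustration.

```shell
# Expected young-generation size in MB for a given heap size and -XX:NewRatio.
# NewRatio is old:young, so young = heap / (NewRatio + 1).
# (young_from_ratio_mb is a hypothetical helper name.)
young_from_ratio_mb() {
  heap_mb=$1
  new_ratio=$2
  echo $(( heap_mb / (new_ratio + 1) ))
}

young_from_ratio_mb 15000 1   # prints 7500

# To compare against the running process, inspect the heap layout with jmap.
# HIVESERVER_PID is an assumed variable holding the hiveserver process id:
#   jmap -heap "$HIVESERVER_PID"
```

An expected 7500m against an observed maximum of about 800m is the gap that led me to set the size explicitly with -Xmn instead of relying on the ratio.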

The final settings:

export HADOOP_OPTS="$HADOOP_OPTS -Xms5000m -Xmn4000m -XX:MaxNewSize=4000m -Xss128k -XX:MaxHeapFreeRatio=80 -XX:MinHeapFreeRatio=40 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:-UseGCOverheadLimit -XX:MaxTenuringThreshold=8 -XX:PermSize=800M -XX:MaxPermSize=800M -XX:GCTimeRatio=19 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
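
As a quick sanity check on these numbers, the sketch below pulls -Xms and -Xmn out of an options string and prints how much of the initial heap is left for the old generation. The helper flag_mb is made up for illustration and only understands values written with an m or M suffix, as they are here.

```shell
# Extract the numeric value (in MB) of a flag such as -Xms or -Xmn from an
# options string. Only handles values with an "m"/"M" suffix.
# (flag_mb is a hypothetical helper name.)
flag_mb() {
  printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1\([0-9][0-9]*\)[mM]\$/\1/p" | head -n 1
}

opts='-Xms5000m -Xmn4000m -XX:MaxNewSize=4000m'
xms=$(flag_mb -Xms "$opts")
xmn=$(flag_mb -Xmn "$opts")

# Old generation at the initial heap size = initial heap - young generation.
echo "old generation at initial heap size: $(( xms - xmn )) MB"
# prints: old generation at initial heap size: 1000 MB
```

With these settings most of the initial heap goes to the young generation, which matches the earlier observation that the workload creates a lot of short-lived data.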
