Source: https://www.mapr.com/blog/how-to-avoid-java-heap-space-errors-understanding-and-managing-task-attempt-memory#.VMWvNDGUfXY

Keeping these five steps in mind can save you a lot of headaches and help you avoid Java heap space errors.

  1. Calculate memory needed.
  2. Check that the JVMs spawned by the TaskTracker have enough memory for your tasks.
  3. Check that the JVM settings are suitable for your tasks.
  4. Limit your nodes' use of swap space and paged memory.
  5. Set the task attempt slots to a number that's lower than the number currently shown in the JobTracker web GUI.

In this blog, I will cover each of these steps, giving you a much better understanding of how to manage task attempt memory.

It is important to understand how to manage task attempt memory, so that you can avoid Java heap space errors. When running map/reduce jobs, you may observe task attempts that fail like this:
 
13/09/20 08:50:56 INFO mapred.JobClient: Task Id : attempt_201309200652_0003_m_000000_0, Status : FAILED on node node1
 
Error: Java heap space
 
This error occurs when the task attempt tries to use more memory than the maximum limit set for the Java Virtual Machine (JVM) used to run it.
 
The first step in avoiding Java heap space errors is to understand memory requirements for your map and reduce tasks, so that you can start the JVM with an appropriate memory limit.
 
For example, in the wordcount job shipped in hadoop-0.20.2-dev-examples.jar, map tasks do not need much memory, regardless of what data is being processed. The only significant amount of memory a map task needs is for loading the libraries required for execution. When using the default wordcount included with the MapR packages, 512MB of memory should be more than enough for the map attempt JVMs. If you plan to run the Hadoop examples we provide, you can plan to spawn your map task attempt JVMs with a 512MB memory limit.
 
If you know how much memory your map task attempts should receive (in this case it’s 512MB), the next step is to launch the JVMs with that amount of memory. The task attempt JVM memory is set by the TaskTracker as it spawns JVMs to process data for a Map/Reduce job. The limit set by the TaskTracker comes from one of two possible sources: either the user specifies the desired amount of memory at the time they submit the job as part of their job conf object, or the TaskTracker will spawn the JVM with a default amount of memory.
 
The mapred.map.child.java.opts property is used to provide arguments that the TaskTracker uses to start JVMs to run map tasks (there is a similar property for reduce tasks). If the mapred.map.child.java.opts property is set to "-Xmx512m", the map task attempt JVMs will have a memory limit of 512MB. Alternatively, when no -Xmx value is specified via the conf property, each TaskTracker will calculate a default memory limit for the JVMs that it launches. The limit is based on the number of map/reduce task slots a TaskTracker presents, as well as the total map/reduce memory the TaskTracker has been instructed to use as a system-wide limit.
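 
As a sketch, this property can be set in mapred-site.xml as a cluster-wide default, or in the job conf at submission time; the -Xmx512m value below is illustrative, matching the wordcount estimate above:
 
<!-- Sketch: heap limit for map task attempt JVMs; the value is illustrative -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx512m</value>
</property>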
 
The number of map/reduce task attempt slots presented by a TaskTracker is set at the time the TaskTracker starts, and is controlled via two settings in mapred-site.xml on each node:
 
mapred.tasktracker.map.tasks.maximum mapred.tasktracker.reduce.tasks.maximum
 
The default values for these settings are formulas based on the number of CPU cores on the node. However, you can override them by 1) modifying mapred-site.xml and setting a hard-coded number of slots, or 2) substituting your own formula.
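 
For instance, a minimal mapred-site.xml override that hard-codes the slot counts might look like this (the values are illustrative, not a recommendation):
 
<!-- Sketch: hard-coded task attempt slot counts; values are illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>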
 
The system-wide TaskTracker map/reduce task attempt memory limit is set when a TaskTracker process starts. There are two possible sources for the memory limit that the TaskTracker will impose. First, a limit can be set explicitly in the hadoop-env.sh script in the hadoop conf directory. You can specify a memory limit by adding this line:
 
export HADOOP_HEAPSIZE=2000
 
This instructs the TaskTracker to limit the total memory that all task attempt JVMs on the node can use concurrently to 2000MB. If no HADOOP_HEAPSIZE parameter is specified in the hadoop-env.sh file, then the MapR warden service will provide the memory limit when it starts the TaskTracker. The warden service calculates the limit based on the amount of physical memory on the node, minus the amount already committed to the services configured to run on the node. If you look in warden.conf you will see properties such as these:
 
service.command.mfs.heapsize.percent=20
service.command.mfs.heapsize.min=512
 
This particular example indicates the warden should instruct the MFS service to use 20% of the physical memory on the node, or a minimum of 512MB (in the case that 512MB is larger than 20% of physical memory). If you look at all the services configured to run on a node, as well as the memory specifications in warden.conf, you should be able to determine how much memory is being committed to the configured services (as well as how much is reserved for general OS/administration). The leftover memory is what the TaskTracker will set as the memory limit for concurrently running task attempts.
 
For example, imagine that you have a single-node install where you run ZooKeeper, CLDB, MFS, JobTracker, TaskTracker, NFS, the GUI, HBase Master and HBase RegionServer. That's a lot of services on one node, and each one requires memory, so the warden will allocate percentages of the memory to each service, chipping away at what's left for map/reduce task attempts on the node. If the percentages used by those services add up to 60%, and there is a 5% OS reservation, you're left with 35% of the memory on the node for map/reduce task attempts. If you had 10GB of memory on the node, you would have 3.5GB for your map/reduce tasks. If you have 6 map slots and 4 reduce slots set on the node, then you would have 10 slots total. If the memory is divided evenly, you'd end up with a JVM memory limit of 350MB for each JVM. If you need 512MB of memory to run your map tasks, then this default config won't work, and you will encounter Java heap space errors.
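 
Restating that arithmetic compactly:
 
10GB total - (60% services + 5% OS) = 35% of 10GB = 3.5GB for map/reduce
3.5GB / (6 map slots + 4 reduce slots) = 350MB per task attempt JVM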
 
There are a few additional issues to be aware of when managing task attempt memory. Do not force your nodes to use any significant amount of swap space, or cause memory to be paged to/from disk frequently. If you change how you submit your job by explicitly setting "-Xmx500m" in mapred.map.child.java.opts, you've overridden the safe memory limits, but you don't actually have additional physical memory. So while the map/reduce task attempts may launch, you may force massive amounts of swap to be used, and you can't rely on the kernel OOM killer or anything else to prevent this from occurring. You also can't expect a node that starts paging heavily to recover quickly, if at all. If you increase the task attempt JVM memory while continuing to launch the same number of task attempts concurrently on the node, you are committing more memory, and you need to be careful not to oversubscribe. If you oversubscribe significantly, you might cause heavy paging, and your nodes might freeze up and never come back online until power cycled.
 
So if you increase the amount of memory given to each task attempt JVM, you will also need to reduce the number of map/reduce task slots presented by TaskTrackers.
 
This is a complex situation, because if you run different jobs concurrently on a cluster, you can have task attempts from one job (job A) that need lots of memory, and task attempts from another job (job B) that need very little memory. Consequently, if you lower the number of map/reduce slots, you may find that you have plenty of free memory when a node is running tasks mostly for job B, but not much free memory available for job A. The key is to find the balance where you can allow some level of oversubscription, without being so aggressive that your nodes freeze.
 
To assist with this task, the TaskTracker will look at the amount of memory currently consumed by all running map/reduce tasks. It is not looking at the maximum limit on these tasks, but the actual amount of memory being utilized in total across all running task attempts. If that amount of memory reaches a specified level, the TaskTracker can kill some running task attempts to free up memory so that the other task attempts can finish successfully without causing a node to start paging excessively.
 
For example, if you're trying to get the wordcount example to run on a small cluster or single-node setup, and it fails with a "Java heap space" error, the easiest, fastest solution is to edit /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml on the TaskTracker node(s) and reduce the number of map/reduce task attempt slots by lowering:
 
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
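 
As a sketch, halving the slot counts from the example in the next paragraph (6 map and 4 reduce) would give a mapred-site.xml stanza like this (the values are illustrative):
 
<!-- Sketch: halved task attempt slot counts; values are illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>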
 
It's important to set the task attempt slots to a number that's lower than what's currently in effect, which you can determine by going to the JobTracker web GUI. For example, if you had one TaskTracker and it was showing 6 map slots and 4 reduce slots, then you should set those values to 3 map slots and 2 reduce slots (as in the sketch above), and restart the TaskTracker process on the node via:
 
maprcli node services -nodes <node name> -tasktracker restart
 
When it restarts with the lowered number of slots, re-submit the wordcount job. With this change, each task attempt JVM should get more memory without any additional oversubscription. It's a safe solution, since the nodes shouldn't start paging aggressively; it's an easy solution, since it doesn't require a lot of memory calculations; and it's a fast solution, since it requires editing just one config file and then running a single service restart command.

In order to avoid Java heap space errors, keep the following points in mind:
  1. Calculate how much memory your task attempts will need.
  2. Make sure that the TaskTracker launches your task attempts in JVMs that have a memory limit greater than or equal to your expected memory requirement.
  3. Remember that there are default settings that come into play when launching those JVMs, unless you have explicitly overridden them. The default settings may not be suitable based on the balance of CPU cores, physical memory and the set of services that you have configured to run on the node.
  4. Do not force your nodes to use any significant amount of swap space, or cause memory to be paged to/from disk frequently.
  5. Set the task attempt slots to a number that's lower than the number currently shown in the JobTracker web GUI.
