最近再hue 集群查询任务经常失败,经过几天的观察,终于找到原因,报错如下

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1

分析:

  taskId=task_1514128895713_0770_1_00_000006 失败了几次,失败的原因是container被高优先级的任务抢占了。而task最大的失败次数默认是4.当集群上的任务比较多时,比较容易出现这个问题。还有一个原因我认为container设置的内存太小,我这是1.5G 不过没有修改,我设置了以下参数就解决了:

解决方案:

  

  问题解决了,希望帮到你

  

hive on tez 任务失败的更多相关文章

  1. hive on tez配置

    1.Tez简介 Tez是Hontonworks开源的支持DAG作业的计算框架,它可以将多个有依赖的作业转换为一个作业从而大幅提升MapReduce作业的性能.Tez并不直接面向最终用户--事实上它允许 ...

  2. hive on tez 错误记录

    1.执行过程失败,报 Container killed on request. Exit code is 143 如下图: 分析:造成这种原因是由于总内存不多,而容器在jvm中占比过高,修改tez-s ...

  3. 配置 Hive On Tez

    配置 Hive On Tez 标签(空格分隔): hive Tez 部署底层应用 简单介绍 介绍:tez 是基于hive 之上,可以将sql翻译解析成DAG计算的引擎.基于DAG 与mr 架构本身的优 ...

  4. hive on spark VS SparkSQL VS hive on tez

    http://blog.csdn.net/wtq1993/article/details/52435563 http://blog.csdn.net/yeruby/article/details/51 ...

  5. Hive on Tez 中 Map 任务的数量计算

    Hive on Tez Mapper 数量计算 在Hive 中执行一个query时,我们可以发现Hive 的执行引擎在使用 Tez 与 MR时,两者生成mapper数量差异较大.主要原因在于 Tez ...

  6. hive on tez

    hive运行模式 hive on mapreduce 离线计算(默认) hive on tez  YARN之上支持DAG作业的计算框架 hive on spark 内存计算 hive on tez T ...

  7. Hive执行count函数失败,Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException)

    Hive执行count函数失败 1.现象: 0: jdbc:hive2://192.168.137.12:10000> select count(*) from emp; INFO : Numb ...

  8. hive中创建表失败

    使用create table命令创建表失败,如下错误信息: hive> create table test(id int,name string,age int,sex string); FAI ...

  9. Hive配置Tez引擎踩坑

    框架版本 Hadoop 2.7.7 Hive 2.3.7 Tez 0.9.2 保证hadoop集群启动,hive元数据服务启动 上传tez到HDFS tar -zxvf apache-tez-0.9. ...

随机推荐

  1. 解决WordPress百度分享图标不显示问题

    最近在帮朋友维护博客时,发现他的百度分享居然不能使用了,首先很多人会认为,百度分享挂在那里就是一种摆设,又没有几个人去分享,有什么含义呢?其实挂百度分享的含义是非常重要的,网站增加一个百度分享是可以增 ...

  2. mysql的索引为什么要使用B+树而不是其他树?

    总结 1.InnoDB存储引擎的最小存储单元是页,页可以用于存放数据也可以用于存放键值+指针,在B+树中叶子节点存放数据,非叶子节点存放键值+指针. 2.索引组织表通过非叶子节点的二分查找法以及指针确 ...

  3. 磁盘(disk)结构

  4. 100、神器的 routing mesh (Swarm07)

    参考https://www.cnblogs.com/CloudMan6/p/7930321.html   上一节我们提到了 swarm 的 routing mesh .当外部访问任意节点的8080端口 ...

  5. 优秀java博客

    https://www.jianshu.com/p/efb58b7115bf?utm_source=tuicool https://www.nowcoder.com/discuss/110317 ht ...

  6. JAVA核心技术--继承(1)

    1.继承:向上追溯,对同一批类的抽象,延续和扩展父类的一切信息! 1)关键字:extends      例如,父类是Animal,子类是Dog;   eg: public class Dog exte ...

  7. mybatis查询返回的对象不为null,但是属性值为null

    返回的对象不为null,但是属性值为null 代码如下: <resultMap id="BaseResultMap" type="com.trhui.ebook.d ...

  8. Java LinkedHashMap学习

    以前一直使用HashMap,今天学习一下LinkedHashMap JavaDoc 注解: Hash table and linked list implementation of the Map i ...

  9. CRM项目讲解和django知识点回顾

    今天想把之前写的CRM项目梳理下,顺便回顾一下djiango的部分重要知识. 1.登录页面(包含简单验证码) 首先来看下CRM的登录页面,样式啥的不重要,大家可以去jquery ui的网站上或者其他地 ...

  10. gcc -DDEBUG

    编译方法: gcc -D(DEBUGNAME) -o execution_name execution_source_code.c 例如: gcc -DDEBUG -o quick_sort quic ...