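Recently, query jobs submitted through Hue on our cluster have been failing frequently. After several days of observation I finally tracked down the cause. The error is as follows: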

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1514128895713_0770_1_00, diagnostics=[Task failed, taskId=task_1514128895713_0770_1_00_000006, diagnostics=[TaskAttempt 0 failed, info=[Container container_1514128895713_0770_01_000008 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 1 failed, info=[Container container_1514128895713_0770_01_000026 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 2 failed, info=[Container container_1514128895713_0770_01_000036 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]], TaskAttempt 3 failed, info=[Container container_1514128895713_0770_01_000042 finished with diagnostics set to [Container failed, exitCode=-100. Container released on a *lost* node]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1514128895713_0770_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1514128895713_0770_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:6, Vertex vertex_1514128895713_0770_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
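
To make the log easier to read, here is a minimal Python sketch that pulls each failed attempt out of the diagnostics text and tallies the exit codes. The container IDs, exit code, and message are copied from the log above. The names `summarize_attempts`, `ATTEMPT_RE`, and `MAX_FAILED_ATTEMPTS` are hypothetical helpers introduced only for illustration, and the limit of 4 is the default cited in the analysis, not a value read from this cluster.

```python
import re
from collections import Counter

# Container IDs of the four failed attempts, copied from the log above.
CONTAINERS = [
    "container_1514128895713_0770_01_000008",
    "container_1514128895713_0770_01_000026",
    "container_1514128895713_0770_01_000036",
    "container_1514128895713_0770_01_000042",
]

# Rebuild the per-attempt part of the diagnostics in the same format as the log.
DIAGNOSTICS = ", ".join(
    f"TaskAttempt {i} failed, info=[Container {cid} finished with diagnostics "
    f"set to [Container failed, exitCode=-100. Container released on a *lost* node]]"
    for i, cid in enumerate(CONTAINERS)
)

# Assumption: default per-task failure limit, as stated in the analysis (4).
MAX_FAILED_ATTEMPTS = 4

# Hypothetical pattern matching one "TaskAttempt N failed" entry.
ATTEMPT_RE = re.compile(
    r"TaskAttempt (\d+) failed, info=\[Container (container_\w+) "
    r"finished with diagnostics set to "
    r"\[Container failed, exitCode=(-?\d+)\. ([^\]]*)\]"
)


def summarize_attempts(text):
    """Return (attempt, container_id, exit_code, reason) for each failed attempt."""
    return [
        (int(attempt), container, int(code), reason.strip())
        for attempt, container, code, reason in ATTEMPT_RE.findall(text)
    ]


attempts = summarize_attempts(DIAGNOSTICS)
exit_codes = Counter(code for _, _, code, _ in attempts)
limit_reached = len(attempts) >= MAX_FAILED_ATTEMPTS

print(f"failed attempts: {len(attempts)}, exit codes: {dict(exit_codes)}")
print(f"failure limit reached: {limit_reached}")
```

Every attempt ended with the same exit code and the same message, and the number of failed attempts equals the limit, which is why the whole vertex (and then the DAG) was marked as failed.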

Analysis:

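  The task taskId=task_1514128895713_0770_1_00_000006 failed four times. I attribute the failures to its containers being preempted by higher-priority jobs (the log itself reports each container as released on a lost node, with exitCode=-100). By default a task may fail at most 4 times before the job is given up, so this problem shows up easily when many jobs are running on the cluster. I also suspect the container memory is set too low (1.5 GB in my case), although I did not change it. Setting the following parameters solved the problem: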

Solution:

  

  That resolved the problem. I hope this helps.

  
