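The anti-spam R&D team has an HQL job that exits with an error during execution, reporting a java.io.IOException: Broken pipe exception. The HQL calls a Python script. Nobody had changed the HQL or the Python script recently, and the job still ran normally on Oct 1, but since Oct 4 it has kept failing with the same error. The error occurs in the reduce phase of stage-2. The error shown on the gateway is as follows: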

2014-10-10 15:05:32,724 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_201406171104_4019895 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Job error message on the JobTracker page:

2014-10-10 15:00:29,614 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"1000390355","reducesinkkey1":"14"},"value":{"_col0":"1000390355","_col1":25,"_col2":"Infinity","_col3":"14","_col4":17},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:518)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:419)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1061)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":"1000390355","reducesinkkey1":"14"},"value":{"_col0":"1000390355","_col1":25,"_col2":"Infinity","_col3":"14","_col4":17},"alias":0}
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Broken pipe
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:348)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
... 7 more
Caused by: java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:43)
at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:331)
... 15 more

stderr logs:

Traceback (most recent call last):
  File "/data10/hadoop/local/taskTracker/liangjun/jobcache/job_201406171104_4019895/attempt_201406171104_4019895_r_000000_0/work/./pranalysis.py", line 86, in <module>
    pranalysis(cols[0],pr,cols[1],cols[4],prnum)
  File "/data10/hadoop/local/taskTracker/liangjun/jobcache/job_201406171104_4019895/attempt_201406171104_4019895_r_000000_0/work/./pranalysis.py", line 60, in pranalysis
    print '%s\t%d\t%d\t%d'%(uid,v[14]-20,type,rank)
TypeError: %d format: a number is required, not float

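From the job errors above, the preliminary conclusion is that something is wrong with the data produced after Oct 1, which makes the Python script exit while it is running. When the script exits, the pipe that carries data to it is closed. The ExecReducer.reduce() method does not know that the pipe to the Python process has been closed by the exception and keeps writing to it, which is when the java.io.IOException: Broken pipe exception appears.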
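The mechanism can be reproduced outside Hive. The following is a minimal sketch, assuming a POSIX system and Python 3; the child process is a hypothetical stand-in for the transform script that exits without reading its input, and the function name `write_to_dead_child` is made up for illustration. It is not how Hive itself spawns the script.

```python
import subprocess
import sys


def write_to_dead_child():
    """Return the error raised when writing to a child that already exited."""
    # Hypothetical stand-in for the failing transform script:
    # it exits immediately without reading stdin.
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(1)"],
        stdin=subprocess.PIPE,
    )
    child.wait()  # the read end of the pipe is now closed
    try:
        # The parent keeps writing rows, just like the reducer does.
        child.stdin.write(b"1000390355\t25\tInfinity\t14\t17\n")
        child.stdin.flush()
    except OSError as exc:  # BrokenPipeError (EPIPE) on POSIX
        return exc
    finally:
        try:
            child.stdin.close()  # may hit the broken pipe again
        except OSError:
            pass
    return None


if __name__ == "__main__":
    print(repr(write_to_dead_child()))
```

The writer only learns that the reader is gone when it tries to write, which is why the Java stack trace points at TextRecordWriter.write rather than at the real cause in the script.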

The analysis follows:

1. The HQL and the Python script

The HQL is as follows:

add file /usr/home/wbdata_anti/shell/sass_offline/pranalysis.py;
select transform(BS.*) using 'pranalysis.py' as uid,prvalue,trend,prlevel
from
(
select B1.uid,B1.flws,B1.pr,iter,B2.alivefans from tmp_anti_user_pagerank1 B1
join
mds_anti_user_flwpr B2
on B1.uid=B2.uid
where iter>'00' and iter<='14' and dt='lowrlfans20141001'
distribute by uid sort by uid,iter
)BS;

The Python script is as follows:

#!/usr/bin/python
#coding=utf-8
import sys,time
import re,math
from optparse import OptionParser
import ConfigParser

reload(sys)
sys.setdefaultencoding('utf-8')

parser = OptionParser(usage="usage:%prog [optinos] filepath")
parser.add_option("-i", "--iter",action = "store",type = 'string', dest = "iter", default = '14',
                  help="how many iterators" )
(options, args) = parser.parse_args()

def pranalysis(uid,prs,flw,fans,prnum):
    tasc=tdesc=0

    try:
        v=[float(pr)*100000000000 for pr in prs]
        fans=int(fans)
        interval=fans/100
    except:
        #rst=sys.exc_info()
        #sys.excepthook(rst[0],rst[1],rst[2])
        return
    for i in range(1,prnum-1) :
        if i==1:
            if v[i+1]-v[i]>interval and v>fans: tasc += 1
            elif v[i]-v[i+1]>interval and v[i+1]<fans: tdesc += 1
            continue
        if v[i+1]-v[i]>interval: tasc += 1
        elif v[i]-v[i+1]>interval: tdesc += 1

    # rank indicate the rate between pr and fans. higher rank(big number) mean more possible negative user
    rate=v[prnum-1]/fans
    rank=4
    if rate>3.0: rank=0
    elif rate>2.0: rank=1
    elif rate>1.3: rank=2
    elif rate>0.7: rank=3
    elif rate>0.5: rank=4
    elif rate>0.3: rank=5
    elif rate>0.2: rank=6
    else: rank=7

    # 0 for stable trend. 1 for round trend, 2, for positive user, 3 for negative user.
    type=0
    if tasc>0 and tdesc>0:
        type=1
    elif tasc>0:
        type=2
    elif tdesc>0:
        type=3
    else: # tdesc=0 and tasc=0
        type=0
    #if fans<60:
    #    type=0

    print '%s\t%d\t%d\t%d'%(uid,v[14]-20,type,rank)

#format sort by uid, iter
#uid follow pr iter fans
#1642909335 919 0.00070398898 04 68399779

prnum=int(options.iter)+1
pr=[0]*prnum
idx=1
lastiter='00'
lastuid=''
for line in sys.stdin:
    line=line.rstrip('\n')
    cols=line.split('\t')
    if len(cols)<5: continue
    if cols[3]>options.iter or cols[3]=='00': continue
    if cols[3]<=lastiter:
        print '%s\t%d\t%d\t%d'%(lastuid,2,0,7)
        pr=[0]*prnum
        idx=1
    lastiter=cols[3]
    lastuid=cols[0]
    pr[idx]=cols[2]
    idx+=1
    if cols[3]==options.iter:
        pranalysis(cols[0],pr,cols[1],cols[4],prnum)
        pr=[0]*prnum
        lastiter='00'
        idx=1

2. Execution plan for the stage-2 reduce phase:

      Reduce Operator Tree:
        Extract
          Select Operator
            expressions:
                  expr: _col0
                  type: string
                  expr: _col1
                  type: bigint
                  expr: _col2
                  type: string
                  expr: _col3
                  type: string
                  expr: _col4
                  type: bigint
            outputColumnNames: _col0, _col1, _col2, _col3, _col4
            Transform Operator
              command: pranalysis.py
              output info:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

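The execution plan shows that the stage-2 reduce phase is actually very simple: it passes the data from the map phase through the pranalysis.py script, which turns 5 input columns into 4 output columns. The Python output has to match this format: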

print '%s\t%d\t%d\t%d'%(uid,v[14]-20,type,rank)
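To see how `%d` in this format string treats float values, here is a small sketch written for Python 3 (the job itself runs Python 2). The helper `try_format` is hypothetical and only reports the outcome instead of raising. The exception class depends on the interpreter version: the job's traceback shows a TypeError, while Python 3 raises OverflowError or ValueError, so the sketch catches all three.

```python
def try_format(value):
    """Format value with %d; return the exception class name on failure."""
    try:
        return "%d" % value
    except (TypeError, ValueError, OverflowError) as exc:
        return type(exc).__name__


# Same scaling as in pranalysis.py, applied to the pr value "Infinity"
# that appears in the failing row of the JobTracker log.
scaled = float("Infinity") * 100000000000

print(try_format(3.7))           # a finite float is truncated to an integer
print(try_format(scaled - 20))   # an infinite float cannot be converted
print(try_format(float("nan")))  # neither can NaN
```

A finite float is therefore not a problem for `%d`; only a non-finite value such as infinity or NaN makes the format fail.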

Combining what the execution plan points to with the job's stderr logs:

Traceback (most recent call last):
  File "/data10/hadoop/local/taskTracker/liangjun/jobcache/job_201406171104_4019895/attempt_201406171104_4019895_r_000000_0/work/./pranalysis.py", line 86, in <module>
    pranalysis(cols[0],pr,cols[1],cols[4],prnum)
  File "/data10/hadoop/local/taskTracker/liangjun/jobcache/job_201406171104_4019895/attempt_201406171104_4019895_r_000000_0/work/./pranalysis.py", line 60, in pranalysis
    print '%s\t%d\t%d\t%d'%(uid,v[14]-20,type,rank)
TypeError: %d format: a number is required, not float

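This confirms that the HQL fails while running the Python script because of bad data. One of the values the script computes, v[14]-20, is a float that %d cannot format as an integer. The row in the JobTracker error has "_col2":"Infinity" at iteration 14, and float("Infinity") stays infinite after scaling, so this value is the likely trigger. The script exits abnormally and the pipe to it is closed on exit. The ExecReducer.reduce() method does not know that the pipe to the Python process has been closed by the exception and keeps writing to it, which produces the java.io.IOException: Broken pipe exception.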
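One way to keep the script from dying on such a row is to check the value before formatting it. This is only a sketch under assumptions: the helper name `format_row` is hypothetical, and whether a non-finite pr should be skipped or replaced by a default row is a decision for the script's owners, not something the original job defines.

```python
import math
import sys


def format_row(uid, value, trend, rank):
    """Return the tab-separated output row, or None if value is not finite."""
    if math.isinf(value) or math.isnan(value):
        return None  # caller decides: skip the row or emit a default
    return "%s\t%d\t%d\t%d" % (uid, int(value), trend, rank)


# Example with the uid and pr value from the failing row in the log.
row = format_row("1000390355", float("Infinity") * 100000000000 - 20, 0, 7)
if row is None:
    # Report on stderr so the task log records the bad input.
    sys.stderr.write("skipping uid with non-finite pr value\n")
else:
    print(row)
```

With a guard like this the script keeps reading stdin, so the reducer never writes into a closed pipe and the bad row shows up in the task's stderr instead of failing the job.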

References:

http://fgh2011.iteye.com/blog/1684544

http://blog.csdn.net/churylin/article/details/11969925
