问题复现:

G:\bigdata\spark-2.3.3-bin-hadoop2.7\bin>spark-shell
2020-12-26 10:20:48 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://DESKTOP-01KN1P4:4040
Spark context available as 'sc' (master = local[*], app id = local-1608949256544).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.3
/_/ Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information. scala> sql("create table empty_orc(a int) stored as orc location '/tmp/empty_orc'").show
++
||
++
++ (其他窗口新建一个空文件) touch /tmp/empty_orc/zero.orc scala> sql("select * from empty_orc").show java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:340)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
... 49 elided
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1010)
... 99 more

该问题的主要原因是在读取orc表时,遇到有空文件时报错,bug记录地址:

SPARK-19809:NullPointerException on zero-size ORC file(https://issues.apache.org/jira/browse/SPARK-19809)

SPARK-29773:Unable to process empty ORC files in Hive Table using Spark SQL(https://issues.apache.org/jira/browse/SPARK-29773)

解决办法:使用参数spark.sql.hive.convertMetastoreOrc=true

G:\bigdata\spark-2.3.3-bin-hadoop2.7\bin>spark-shell --conf spark.sql.hive.convertMetastoreOrc=true
2020-12-26 10:29:06 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://DESKTOP-01KN1P4:4040
Spark context available as 'sc' (master = local[*], app id = local-1608949754291).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.3
/_/ Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information. scala> sql("select * from empty_orc").show +---+
| a|
+---+
+---+

spark的帮助文档种介绍如下:

ORC Files

Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files. To do that, the following configurations are newly added. The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause USING ORC) when spark.sql.orc.impl is set to native and spark.sql.orc.enableVectorizedReader is set to true. For the Hive ORC serde tables (e.g., the ones created using the clause USING HIVE OPTIONS (fileFormat 'ORC')), the vectorized reader is used when spark.sql.hive.convertMetastoreOrc is also set to true.

https://spark.apache.org/docs/2.3.3/sql-programming-guide.html#orc-files

spark读取空orc文件时报错java.lang.RuntimeException: serious problem at OrcInputFormat.generateSplitsInfo的更多相关文章

  1. sparksql读取hive数据报错:java.lang.RuntimeException: serious problem

    问题: Caused by: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: ...

  2. shiro使用redis作为缓存,出现要清除缓存时报错 java.lang.Exception: Failed to deserialize at org.crazycake.shiro.SerializeUtils.deserialize(SerializeUtils.java:41) ~[shiro-redis-2.4.2.1-RELEASE.jar:na]

    shiro使用redis作为缓存,出现要清除缓存时报错 java.lang.Exception: Failed to deserialize at org.crazycake.shiro.Serial ...

  3. 使用RestTemplate时报错java.lang.IllegalStateException: No instances available for 127.0.0.1

    我在RestTemplate的配置类里使用了 @LoadBalanced@Componentpublic class RestTemplateConfig { @Bean @LoadBalanced ...

  4. 云笔记项目- 上传文件报错"java.lang.IllegalStateException: File has been moved - cannot be read again"

    在做文件上传时,当写入上传的文件到文件时,会报错“java.lang.IllegalStateException: File has been moved - cannot be read again ...

  5. hive启动时报错 java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D at org.apache.hadoop.fs.Path.initialize

    错误提示信息如下 错误信息如下 [root@node1 bin]# ./hive Logging initialized -bin/lib/hive-common-.jar!/hive-log4j.p ...

  6. 运用反射时报错java.lang.NoSuchMethodException,以解决,记录一下

    问题:想调用service类中的私有方法时, Method target=clz.getMethod("say", String.class);用Class的getMethod报错 ...

  7. storm supervisor启动报错java.lang.RuntimeException: java.io.EOFException

    storm因机器断电或其他异常导致的supervisor意外终止,再次启动时报错: 1. 2013-09-24 09:15:44,361 INFO [main] daemon.supervisor ( ...

  8. Android Studio 首次安装报错 Java.lang.RuntimeException:java.lang.NullPointerException...错

    下次安装报:Java.lang.RuntimeException: java.lang.NullPointerException......错 只需在文件..\Android Studio\bin\i ...

  9. 我的Android进阶之旅------>Android中MediaRecorder.stop()报错 java.lang.RuntimeException: stop failed.【转】

    本文转载自:http://blog.csdn.net/ouyang_peng/article/details/48048975 今天在调用MediaRecorder.stop(),报错了,java.l ...

  10. maven报错 java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, see the following errors

    2 errors java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, ...

随机推荐

  1. MB01 BAPI_GOODSMVT_CREATE退货

    "-----------------------------------------@斌将军--------------------------------------------DATA: ...

  2. 题解 CF1388A 【Captain Flint and Crew Recruitment】(思维、贪心)

    AC代码: #include<bits/stdc++.h> using namespace std; void solve() { int n; cin >> n; if (n ...

  3. vivo悟空活动中台-打造 Nodejs 版本的MyBatis

    经典的架构设计可以跨越时间和语言,得以传承. -- 题记 一.背景 悟空活动中台技术文章系列又和大家见面了,天气渐冷,注意保暖. 在往期的系列技术文章中我们主要集中分享了前端技术的方方面面,如微组件的 ...

  4. 无需代码绘制人工神经网络ANN模型结构图的方法

      本文介绍几种基于在线网页或软件的.不用代码的神经网络模型结构可视化绘图方法.   之前向大家介绍了一种基于Python第三方ann_visualizer模块的神经网络结构可视化方法,大家可以直接点 ...

  5. 14、SpringBoot-easyexcel导出excle

    系列导航 springBoot项目打jar包 1.springboot工程新建(单模块) 2.springboot创建多模块工程 3.springboot连接数据库 4.SpringBoot连接数据库 ...

  6. Linux mknod命令详解

    Linux一切皆文件,系统与设备通信之前,要建立一个存放在/dev目录下的设备文件,默认情况下就已经生成了很多设备文件,有时候自己手动新建一些设备文件,这就会用到mknod. 语法格式:mknod[选 ...

  7. 概率图模型 · 蒙特卡洛采样 · MCMC | 非常好的教学视频

    https://www.bilibili.com/video/BV17D4y1o7J2?p=1 非常感谢!感觉学会一点了,应该能写作业了

  8. spring--@Autowired @Qualifier @Resource @Value 四者的区别

    @Autowired,@Qualifier,@Resource,和 @Value 是 Spring 框架中用于依赖注入的注解,它们各有特点和用途: @Autowired: @Autowired 注解用 ...

  9. CSS3之transition

    随着css3不断地发展,越来越多的页面特效可以被实现. 例如当我们鼠标悬浮在某个tab上的时候,给它以1s的渐进变化增加一个背景颜色.渐进的变化可以让css样式变化得不那么突兀,也显得交互更加柔和. ...

  10. 部署开源项目管理工具focalboard

    前言 focalboard是一款开源项目管理工具,类似Jira.Trello.官网地址 组件 版本 说明 Debian 12.1 操作系统 docker 20.10.7 容器运行时 docker-co ...