ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...
: jdbc:hive2://master01.hadoop.dtmobile.cn:1> select * from cell_random_grid_tmp2 limit 1;
INFO : Compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:grid_row_id, type:,), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.045 seconds
INFO : Executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Completed executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.001 seconds
INFO : OK
Error: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/capacity.db/cell_random_grid_tmp2/part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet (state=,code=0)
: jdbc:hive2://master01.hadoop.dtmobile.cn:1>
The data was written to Hive from Spark 2.3 via Spark SQL's saveAsTable(); when Spark SQL writes to Hive this way, it saves the data as Parquet with Snappy compression by default. After the write completed, querying the table through hive beeline failed with the error above, while the same query through Spark ran fine.
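A minimal sketch of the write path described above (the DataFrame and column are illustrative stand-ins, not taken from the original job; a decimal column with precision <= 18 is what exposes the problem, as explained below):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("write-parquet-table")
  .enableHiveSupport()
  .getOrCreate()

// Illustrative decimal column; any decimal with 1 <= precision <= 18 behaves the same way.
val df = spark.sql("select cast(1234.56 as decimal(16,2)) as grid_val")

// In Spark 2.3, saveAsTable() writes Parquet with Snappy compression by default.
df.write.mode("overwrite").saveAsTable("capacity.cell_random_grid_tmp2")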
The same problem turned up on Stack Overflow.
The root cause is as follows:
This issue is caused by the different Parquet conventions used in Hive and Spark. In Hive, the decimal datatype is represented as fixed bytes (INT 32). In Spark 1.4 and later, the default convention is to use the standard Parquet representation for the decimal data type, under which the underlying representation changes with the precision of the column datatype.
e.g. DECIMAL can be used to annotate the following types:
- int32: for 1 <= precision <= 9
- int64: for 1 <= precision <= 18; precision < 10 will produce a warning
Hence this issue happens only with datatypes that have different representations in the two Parquet conventions. If the datatype is DECIMAL(10,3), both conventions represent it as INT32, so we won't face an issue. If you are not aware of the internal representation of the datatypes, it is safest to read with the same convention that was used for writing. With Hive, you do not have the flexibility to choose the Parquet convention, but with Spark, you do.
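To confirm which convention an existing part file uses, one option (a sketch using the parquet-hadoop API that Spark bundles; the path is the part file from the error above) is to print the schema stored in the file footer — a decimal column with physical type int32/int64 indicates the standard convention, fixed_len_byte_array the legacy/Hive-compatible one:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.format.converter.ParquetMetadataConverter
import org.apache.parquet.hadoop.ParquetFileReader

// Footer-only read: no row data is decoded, so this works even when Hive cannot read the file.
val path = new Path("hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/capacity.db/" +
  "cell_random_grid_tmp2/part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet")
val footer = ParquetFileReader.readFooter(new Configuration(), path, ParquetMetadataConverter.NO_FILTER)
println(footer.getFileMetaData.getSchema)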
Solution: The convention used by Spark to write Parquet data is configurable and is determined by the property spark.sql.parquet.writeLegacyFormat. The default value is false. If it is set to "true", Spark will use the same convention as Hive for writing Parquet data, which resolves the issue.
So we tried setting spark.sql.parquet.writeLegacyFormat = true, and the problem was solved.
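The fix in code is the earlier write sketch with one added config line (again a sketch with illustrative names; the setting must be in place before the write, and files already written in the standard format are not converted, so the table has to be regenerated):

import org.apache.spark.sql.SparkSession

// Enable the legacy (Hive-compatible) Parquet convention for this session's writes.
val spark = SparkSession.builder()
  .appName("write-legacy-parquet")
  .config("spark.sql.parquet.writeLegacyFormat", "true")
  .enableHiveSupport()
  .getOrCreate()

// Rewrite the table so Hive can read the regenerated files.
val df = spark.sql("select cast(1234.56 as decimal(16,2)) as grid_val")
df.write.mode("overwrite").saveAsTable("capacity.cell_random_grid_tmp2")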
Looking the parameter up in the Spark 2.3 source code (spark.sql.parquet.writeLegacyFormat):
In package org.apache.spark.sql.internal, the Spark SQL default configuration in SQLConf.scala describes it as follows:
val PARQUET_WRITE_LEGACY_FORMAT = buildConf("spark.sql.parquet.writeLegacyFormat")
  .doc("Whether to be compatible with the legacy Parquet format adopted by Spark 1.4 and prior " +
    "versions, when converting Parquet schema to Spark SQL schema and vice versa.")
  .booleanConf
  .createWithDefault(false)
As you can see, the default value is false.
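Because it is a Spark SQL conf, the value can also be inspected or changed on a live session (a sketch; spark here is assumed to be an active SparkSession):

// Returns "false" unless it has been overridden.
spark.conf.get("spark.sql.parquet.writeLegacyFormat")

// Affects subsequent Parquet writes in this session.
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")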
In package org.apache.spark.sql.execution.datasources.parquet, the Scaladoc of ParquetWriteSupport.scala reads:
/**
 * A Parquet [[WriteSupport]] implementation that writes Catalyst [[InternalRow]]s as Parquet
 * messages. This class can write Parquet data in two modes:
 *
 *  - Standard mode: Parquet data are written in standard format defined in parquet-format spec.
 *  - Legacy mode: Parquet data are written in legacy format compatible with Spark 1.4 and prior.
 *
 * This behavior can be controlled by SQL option `spark.sql.parquet.writeLegacyFormat`. The value
 * of this option is propagated to this class by the `init()` method and its Hadoop configuration
 * argument.
 */