ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...
: jdbc:hive2://master01.hadoop.dtmobile.cn:1> select * from cell_random_grid_tmp2 limit 1;
INFO : Compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:grid_row_id, type:,), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.045 seconds
INFO : Executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Completed executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.001 seconds
INFO : OK
Error: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/capacity.db/cell_random_grid_tmp2/part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet (state=,code=0)
: jdbc:hive2://master01.hadoop.dtmobile.cn:1>
The data was written to Hive with Spark 2.3 Spark SQL via saveAsTable(); when Spark SQL writes to Hive this way, it saves the data as Parquet with Snappy compression by default. After the write completed, querying the table through Hive beeline failed with the error above, while the same query through Spark ran fine.
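For context, here is a minimal sketch of the kind of write involved. The column name and table path are taken from the error log above; the literal value and the DECIMAL(18,3) type are illustrative assumptions, not from the original job:

import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming a Hive-enabled Spark 2.3 session.
val spark = SparkSession.builder()
  .appName("parquet-write-demo")
  .enableHiveSupport()
  .getOrCreate()

// A DECIMAL column is the trigger: its physical Parquet type differs
// between Spark's standard convention and Hive's legacy convention.
spark.sql("SELECT CAST(12345.678 AS DECIMAL(18,3)) AS grid_row_id")
  .write
  .saveAsTable("capacity.cell_random_grid_tmp2")  // Parquet + Snappy by default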
I found the same question on Stack Overflow:
The root cause is as follows:
This issue is caused by the different Parquet conventions used in Hive and Spark. In Hive, the decimal datatype is represented as fixed bytes (INT 32). In Spark 1.4 or later the default convention is to use the standard Parquet representation for the decimal data type. Under the standard Parquet representation, the underlying physical type changes with the precision of the column:

e.g. DECIMAL can be used to annotate the following types:
  int32: for 1 <= precision <= 9
  int64: for 1 <= precision <= 18; precision < 10 will produce a warning

Hence this issue happens only with datatypes which have different representations in the different Parquet conventions. If the datatype is DECIMAL(10,3), both conventions represent it as INT32, so we won't face an issue. If you are not aware of the internal representation of the datatypes, it is safe to read with the same convention that was used for writing. With Hive, you do not have the flexibility to choose the Parquet convention, but with Spark, you do.
Solution: The convention Spark uses to write Parquet data is configurable, determined by the property spark.sql.parquet.writeLegacyFormat. The default value is false. If set to "true", Spark will use the same convention as Hive for writing Parquet data, which solves the issue.
So I tried setting spark.sql.parquet.writeLegacyFormat = true, and the problem was solved.
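For reference, a sketch of where the flag goes (these are the standard ways of passing Spark SQL configs; note it only affects new writes, so files already written in the standard convention still have to be regenerated):

import org.apache.spark.sql.SparkSession

// Sketch: enable the legacy (Hive-compatible) Parquet convention.
val spark = SparkSession.builder()
  .appName("parquet-legacy-write")
  .config("spark.sql.parquet.writeLegacyFormat", "true")
  .enableHiveSupport()
  .getOrCreate()

// Equivalent runtime toggle, set before the write:
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

The same flag can also be passed at submission time with --conf spark.sql.parquet.writeLegacyFormat=true.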
Looking up this parameter (spark.sql.parquet.writeLegacyFormat) in the Spark 2.3 source code:
In package org.apache.spark.sql.internal, SQLConf.scala (which holds the Spark SQL default configuration) describes it as follows:
val PARQUET_WRITE_LEGACY_FORMAT = buildConf("spark.sql.parquet.writeLegacyFormat")
  .doc("Whether to be compatible with the legacy Parquet format adopted by Spark 1.4 and prior " +
    "versions, when converting Parquet schema to Spark SQL schema and vice versa.")
  .booleanConf
  .createWithDefault(false)
As you can see, the default value is false.
And in package org.apache.spark.sql.execution.datasources.parquet, the class comment in ParquetWriteSupport.scala reads:
/**
 * A Parquet [[WriteSupport]] implementation that writes Catalyst [[InternalRow]]s as Parquet
 * messages. This class can write Parquet data in two modes:
 *
 *  - Standard mode: Parquet data are written in standard format defined in parquet-format spec.
 *  - Legacy mode: Parquet data are written in legacy format compatible with Spark 1.4 and prior.
 *
 * This behavior can be controlled by SQL option `spark.sql.parquet.writeLegacyFormat`. The value
 * of this option is propagated to this class by the `init()` method and its Hadoop configuration
 * argument.
 */
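To make the two modes concrete for the DECIMAL case, here is a paraphrased sketch of how the writer chooses the physical type (simplified to a standalone function following the parquet-format rules quoted above; not the literal Spark 2.3 source):

// Paraphrased sketch: which physical Parquet type a decimal column gets,
// depending on the flag and the column precision. Not a verbatim copy of
// ParquetWriteSupport, just the dispatch it implements.
def decimalPhysicalType(precision: Int, writeLegacyFormat: Boolean): String =
  if (!writeLegacyFormat && precision <= 9) "INT32"        // standard mode only
  else if (!writeLegacyFormat && precision <= 18) "INT64"  // standard mode only
  else "FIXED_LEN_BYTE_ARRAY"                              // legacy mode, or precision > 18

So with the default false, a DECIMAL(18,3) column comes out as INT64, which the Hive Parquet reader here does not expect, hence the ParquetDecodingException; with true it comes out as FIXED_LEN_BYTE_ARRAY, matching what Hive itself writes and reads.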