ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...

: jdbc:hive2://master01.hadoop.dtmobile.cn:1> select * from cell_random_grid_tmp2 limit 1;
INFO : Compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:grid_row_id, type:,), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.045 seconds
INFO : Executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Completed executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.001 seconds
INFO : OK
Error: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at   in file hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/capacity.db/cell_random_grid_tmp2/part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet (state=,code=0)
: jdbc:hive2://master01.hadoop.dtmobile.cn:1>

通过spark2.3 sparksql执行写数据到hive，saveAsTable()，sparksql写数据到hive时候，默认是保存为parquet+snappy的数据。在数据保存完成之后，通过hive beeline查询，报错如上。但是通过spark查询，执行正常。

在stackoverflow上找到同样的问题：

根本原因如下：

This issue is caused because of different parquet conventions used in Hive and Spark. In Hive, the decimal datatype is represented as fixed bytes (INT 32). In Spark 1.4 or later the default convention is to use the Standard Parquet representation for decimal data type. As per the Standard Parquet representation based on the precision of the column datatype, the underlying representation changes.
eg: DECIMAL can be used to annotate the following types: int32: for 1 <= precision <= 9 int64: for 1 <= precision <= 18; precision < 10 will produce a warning

Hence this issue happens only with the usage of datatypes which have different representations in the different Parquet conventions. If the datatype is DECIMAL (10,3), both the conventions represent it as INT32, hence we won't face an issue. If you are not aware of the internal representation of the datatypes it is safe to use the same convention used for writing while reading. With Hive, you do not have the flexibility to choose the Parquet convention. But with Spark, you do.

Solution: The convention used by Spark to write Parquet data is configurable. This is determined by the property spark.sql.parquet.writeLegacyFormat The default value is false. If set to "true", Spark will use the same convention as Hive for writing the Parquet data. This will help to solve the issue.

所以尝试调整参数 spark.sql.parquet.writeLegacyFormat = true，问题解决。

到spark2.3源代码中查找该参数(spark.sql.parquet.writeLegacyFormat)：

package org.apache.spark.sql.internal 中关于sparksql的默认配置 SQLConf.scala中相关描述如下

  val PARQUET_WRITE_LEGACY_FORMAT = buildConf("spark.sql.parquet.writeLegacyFormat")
    .doc("Whether to be compatible with the legacy Parquet format adopted by Spark 1.4 and prior " +
      "versions, when converting Parquet schema to Spark SQL schema and vice versa.")
    .booleanConf
    .createWithDefault(false)

可以看到默认值为false

在 package org.apache.spark.sql.execution.datasources.parquet 的关于ParquetWriteSupport.scala 的描述如下：

/**
 * A Parquet [[WriteSupport]] implementation that writes Catalyst [[InternalRow]]s as Parquet
 * messages.  This class can write Parquet data in two modes:
 *
 *  - Standard mode: Parquet data are written in standard format defined in parquet-format spec.
 *  - Legacy mode: Parquet data are written in legacy format compatible with Spark 1.4 and prior.
 *
 * This behavior can be controlled by SQL option `spark.sql.parquet.writeLegacyFormat`.  The value
 * of this option is propagated to this class by the `init()` method and its Hadoop configuration
 * argument.
 */

ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...的更多相关文章

python3: error while loading shared libraries: libpython3.5m.so.1.0: cannot open shared object file: No such file or directory
安装python3遇到报错: wget https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz ./configure --prefix=/u ...
svnadmin:error while loading shared libraries: libaprutil-1.so.0:cannot open shared object file: No such file or directory
wdcp下安装svn后一直提示 svnadmin:error while loading shared libraries: libaprutil-1.so.0:cannot open shared ...
动态链接库找不到： error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory
问题: 运行gsl(GNU scientific Library)的函数库,用 gcc erf.c -I/usr/local/include -L/usr/local/lib64 -L/usr/loc ...
解决libpython2.6.so.1.0: cannot open shared object file
文章解决的问题:安装nginx中需要Python2.6的支持,下面介绍如何安装Python2.6,并建立lib的连接. 问题展示:error while loading shared librarie ...
./filezilla: error while loading shared libraries: libpng12.so.0: cannot open shared object file: No such file or directory
opensuse系统在filezilla官网下载压缩文件解压运行后报 ./filezilla: error while loading shared libraries: libpng12.so.0 ...
error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file
安装rac10g,出现例如以下错误: [root@rac2 oracle]# /u01/product/crs/root.sh WARNING: directory '/u01/product' is ...
tensorflow-gpu版本出现libcublas.so.8.0:cannot open shared object file
文章主要参考以下博客https://www.aliyun.com/zixun/wenji/1289957.html 在利用GPU加速tensorflow时,出现了libcublas.so.8.0:ca ...
ubuntu下tensorflow 报错 libcusolver.so.8.0: cannot open shared object file: No such file or directory
解决方法1. 在终端执行: export LD_LIBRARY_PATH=”$LD_LIBRARY_PATH:/usr/local/cuda/lib64” export CUDA_HOME=/usr/ ...
〖Android〗arm-linux-androideabi-gdb报 libpython2.6.so.1.0: cannot open shared object file错误的解决方法
执行: prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.6/bin/arm-linux-androideabi-gdb out/target/p ...

随机推荐

Servlet高级应用
会话只是指一段指定的时间间隔. 会话跟踪是维护用户状态(数据)的一种方式.它也被称为servlet中的会话管理. Http协议是一个无状态的,所以我们需要使用会话跟踪技术来维护用户状态. 每次用户请求 ...
yum源使用报错
CentOS系统yum源使用报错:Error: Cannot retrieve repository metadata (repomd.xml) for repository: rpmforge. 服 ...
K8S 部署 Web UI
在早期的版本中 Kubernetes可以在 Dashboard 中看到 heapster 提供的一些图表信息, 在后续的版本中会陆续移除掉 heapster,现在更加流行的监控工具是 promethe ...
【原创】NES第一波：如何用通用型6502宏汇编器，制用NES/FC游戏。
在163的博客关了呀.在这边重新开张了. 以后若网友有什么要长篇解答的问题,也在这儿作答. 作为第一波原创文章,我打算做一次小白示范.那就是一步一步的展示某个汇编编译器的用法. 一.科普很多人认为程 ...
java高并发系列 - 第21天：java中的CAS操作，java并发的基石
这是java高并发系列第21篇文章. 本文主要内容从网站计数器实现中一步步引出CAS操作介绍java中的CAS及CAS可能存在的问题悲观锁和乐观锁的一些介绍及数据库乐观锁的一个常见示例使用ja ...
String常量池和intern方法
String s1 = "Hello"; String s2 = "Hello"; String s3 = "Hel" + "lo ...
10分钟了解一致性hash算法
应用场景当我们的数据表超过500万条或更多时,我们就会考虑到采用分库分表:当我们的系统使用了一台缓存服务器还是不能满足的时候,我们会使用多台缓存服务器,那我们如何去访问背后的库表或缓存服务器呢,我们 ...
Spring cloud Feign不支持对象传参解决办法[完美解决]
spring cloud 使用 Feign 进行服务调用时,不支持对象参数. 通常解决方法是,要么把对象每一个参数平行展开,并使用 @RequestParam 标识出每一个参数,要么用 @Reques ...
佳木斯集训Day3
D3是我的巅峰 D3的出题人毒瘤!!!T3放了一道莫队,我们全体爆炸,到现在只有一个奆老A掉了T3 据说lkh被晓姐姐D了 T1是个26进制数,当时在考场上想了好久才想到(太次了)注意需要处理一下溢出 ...
喜大普奔 | 微信小程序支持PC端打开了
微信小程序可以在PC端打开啦微信PC版发布了v2.7.0测试版,其中一个重磅的功能就是:支持打开聊天中分享的小程序咖啡君这么喜欢尝鲜的人自然是在第一时间下载进行了体验安装成功,会有功能更新说明 ...

ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...

ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs:...的更多相关文章

随机推荐

热门专题