: jdbc:hive2://master01.hadoop.dtmobile.cn:1> select * from cell_random_grid_tmp2 limit 1;
INFO : Compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:grid_row_id, type:, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.045 seconds
INFO : Executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5):
INFO : Completed executing command(queryId=hive_20190904113737_49bb8821-f8a1-4e49-a32e-12e3b45c6af5); Time taken: 0.001 seconds
INFO : OK
Error: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file hdfs://master01.hadoop.dtmobile.cn:8020/user/hive/warehouse/capacity.db/cell_random_grid_tmp2/part-00000-82a689a5-7c2a-48a0-ab17-8bf04c963ea6-c000.snappy.parquet (state=,code=0)
: jdbc:hive2://master01.hadoop.dtmobile.cn:1>

The data was written to Hive with Spark 2.3 SparkSQL via saveAsTable(). When SparkSQL writes to Hive this way, the data is stored as Snappy-compressed Parquet by default. After the write completed, querying the table through Hive beeline failed with the error above, yet the same query run through Spark worked fine.
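For reference, a minimal sketch of the kind of write that produces such a table (the source DataFrame here is hypothetical; only the target table name comes from the log above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("write-cell-grid")   // hypothetical app name
  .enableHiveSupport()
  .getOrCreate()

// Hypothetical source table; assume it contains decimal columns.
val df = spark.table("capacity.some_source_table")

// saveAsTable() with no explicit format uses Spark's default data source
// (parquet) with the default compression codec (snappy).
df.write.mode("overwrite").saveAsTable("capacity.cell_random_grid_tmp2")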

I found the same problem on Stack Overflow:

The root cause is as follows:

This issue is caused by the different Parquet conventions used in Hive and in Spark. In Hive, the decimal datatype is represented as fixed-length bytes (FIXED_LEN_BYTE_ARRAY). Since Spark 1.5, the default convention has been the standard Parquet representation for the decimal data type, in which the underlying physical type changes with the precision of the column.
e.g. DECIMAL can be used to annotate the following types: int32 for 1 <= precision <= 9; int64 for 1 <= precision <= 18 (precision < 10 will produce a warning).

Hence this issue happens only with datatypes that have different representations under the two Parquet conventions. A decimal whose precision exceeds 18, e.g. DECIMAL(38,3), is stored as fixed-length bytes under both conventions, so it causes no issue. If you are not aware of the internal representation of the datatypes, it is safest to read with the same convention that was used for writing. With Hive you cannot choose the Parquet convention, but with Spark you can.
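To make the difference concrete, here is a sketch (output paths are hypothetical) that writes the same decimal column under both conventions. As of Spark 2.3's Parquet schema converter, a DECIMAL(15,3) becomes an int64-backed column in standard mode but fixed-length bytes in legacy mode:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val df = spark.sql("SELECT CAST(1234.567 AS DECIMAL(15,3)) AS v")

// Standard convention: precision 15 fits in 64 bits -> int64-backed decimal,
// which Hive's Parquet reader may fail to decode.
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "false")
df.write.mode("overwrite").parquet("/tmp/decimal_standard")

// Legacy convention: decimals are always written as fixed-length byte
// arrays, the representation Hive expects.
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
df.write.mode("overwrite").parquet("/tmp/decimal_legacy")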

Solution: The convention Spark uses to write Parquet data is configurable, determined by the property spark.sql.parquet.writeLegacyFormat. The default value is false. If set to "true", Spark will use the same convention as Hive for writing the Parquet data, which resolves the issue.

So I tried setting spark.sql.parquet.writeLegacyFormat = true, and the problem was solved.
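The setting can be applied in several ways; a sketch, assuming the same session that performs the write:

// At session level, before calling saveAsTable():
spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

// Or when constructing the session:
val spark = SparkSession.builder()
  .config("spark.sql.parquet.writeLegacyFormat", "true")
  .enableHiveSupport()
  .getOrCreate()

// Or at submit time:
// spark-submit --conf spark.sql.parquet.writeLegacyFormat=true ...

Note that the option only affects new writes: files already written in the standard format have to be rewritten with the legacy convention before Hive can read them.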

Searching the Spark 2.3 source code for this parameter (spark.sql.parquet.writeLegacyFormat):

In package org.apache.spark.sql.internal, SQLConf.scala (which holds SparkSQL's default configuration) defines it as follows:

  val PARQUET_WRITE_LEGACY_FORMAT = buildConf("spark.sql.parquet.writeLegacyFormat")
    .doc("Whether to be compatible with the legacy Parquet format adopted by Spark 1.4 and prior " +
      "versions, when converting Parquet schema to Spark SQL schema and vice versa.")
    .booleanConf
    .createWithDefault(false)

As shown, the default value is false.

In package org.apache.spark.sql.execution.datasources.parquet, the doc comment on ParquetWriteSupport.scala reads:

/**
 * A Parquet [[WriteSupport]] implementation that writes Catalyst [[InternalRow]]s as Parquet
 * messages.  This class can write Parquet data in two modes:
 *
 *  - Standard mode: Parquet data are written in standard format defined in parquet-format spec.
 *  - Legacy mode: Parquet data are written in legacy format compatible with Spark 1.4 and prior.
 *
 * This behavior can be controlled by SQL option `spark.sql.parquet.writeLegacyFormat`.  The value
 * of this option is propagated to this class by the `init()` method and its Hadoop configuration
 * argument.
 */
