[Original] Uncle's Troubleshooting Notes (15): Spark fails writing Parquet data with ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead

More articles related to "[Original] Uncle's Troubleshooting Notes (15): Spark fails writing Parquet data with ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead"

[Original] Uncle's Troubleshooting Notes (15): Spark fails writing Parquet data with ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead

spark 2.1.1 Running the following SQL in Spark fails: insert overwrite table test_parquet_table select * from dummy The error is: org.apache.spark.SparkException: Task failed while writing rows. at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWrite…

[Original] Uncle's Troubleshooting Notes (16): Spark fails writing to a Hive external table with ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat

spark 2.1.1 When Spark writes data to a Hive external table (whose underlying data is stored in HBase), it fails with: Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat at org.apache.spark.sql.hive.SparkHiveWrit…

[Original] Uncle's Troubleshooting Notes (2): Spark jobs intermittently fail with java.lang.NoSuchFieldError: HIVE_MOVE_FILES_THREAD_COUNT

Recently, Spark jobs submitted in yarn cluster mode have been failing some of the time, roughly 40% of submissions, with the following error: 18/03/15 21:50:36 116 ERROR ApplicationMaster91: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.NoSuchFieldError: HIVE_MOVE_FILES_THREAD_COUNT; org.apache.spark.sql.Analy…

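[Original] Uncle's Troubleshooting Notes (12): Why Spark saves text-type files (text, csv, json, etc.) to HDFS in compressed form

Reproducing the problem: rdd.repartition(1).write.csv(outPath) After the write, the output files turn out to be compressed. When writing, Spark first obtains hadoopConf, then reads from it whether to compress and which compression format to use: org.apache.spark.sql.execution.datasources.DataSource def write( org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand val hadoopC…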
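
The lookup described in the excerpt above can be sketched roughly as follows. This is a minimal model, not Spark's or Hadoop's actual code: a plain `Map` stands in for the Hadoop `Configuration`, and the object name `OutputCompression`, the helper `resolveCodec`, and its `defaultCodec` parameter are hypothetical names introduced for illustration. The two property names are the standard Hadoop keys for file output compression.

```scala
object OutputCompression {
  // Standard Hadoop property names for file output compression.
  val CompressKey = "mapreduce.output.fileoutputformat.compress"
  val CodecKey = "mapreduce.output.fileoutputformat.compress.codec"

  // Hypothetical helper: returns Some(codec class name) when the configuration
  // asks for compressed output, and None otherwise.
  // `conf` is a plain Map standing in for the Hadoop Configuration (assumption).
  // `defaultCodec` is the fallback used when no codec is configured (assumption).
  def resolveCodec(conf: Map[String, String], defaultCodec: String): Option[String] = {
    val enabled = conf.get(CompressKey).exists(_.trim.equalsIgnoreCase("true"))
    if (enabled) Some(conf.getOrElse(CodecKey, defaultCodec)) else None
  }
}
```

With only the compress flag set to `true`, `resolveCodec` falls back to `defaultCodec`, which illustrates how a cluster-wide Hadoop setting can compress the output even when the job itself never asks for compression. Spark's CSV, JSON, and text writers also document a per-write `compression` option (with values such as `none`).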

[Original] Uncle's Troubleshooting Notes (8): Submitting a Spark job fails with Caused by: java.lang.ClassNotFoundException: org.I0Itec.zkclient.exception.ZkNoNodeException

spark 2.1.1 1. Reproducing the problem: spark-submit --master local[*] --class app.package.AppClass --jars /jarpath/zkclient-0.3.jar --driver-memory 1g app.jar The error: Java HotSpot(TM) 64-Bit Server VM warning: Setting CompressedClassSpaceSize has no effect when compressed cl…

[Original] Uncle's Troubleshooting Notes (27): rdd.cache in Spark

spark 2.1.1 Some tasks in a Spark application are very slow, running for 10 hours. The log of one such task: 2019-01-24 21:38:56,024 [dispatcher-event-loop-22] INFO org.apache.spark.executor.CoarseGrainedExecutorBackend - Got assigned task 4031 2019-01-24 21:38:56,024 [Executor task launch worker for task 4…

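[Original] Uncle's Troubleshooting Notes (21): Spark runs insert overwrite very slowly, even slower than Hive

Recently I moved some SQL execution from Hive to Spark and found that it ran slower. The SQL consists mainly of insert overwrite operations, and the execution plan shows that InsertIntoHiveTable is used: spark-sql> explain insert overwrite table test2 select * from test1; == Physical Plan == InsertIntoHiveTable MetastoreRelation temp, test2, true, false +- HiveTableSc…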

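[Original] Uncle's Troubleshooting Notes (19): Spark tasks are unevenly distributed across executors

Recently, after submitting a Spark application, I found that it ran very slowly. The Spark web UI showed it stuck on one stage of one job. That stage had 100000 tasks, but the vast majority of them were assigned to two executors while the other executors sat nearly idle. What happened? Looking into Spark's task assignment logic, I found a feature called data locality; see https://www.cnblogs.com/barneywill/p/10152497.html for details. In short, tasks are assigned in priority order of locality level…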
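
The idea of assigning tasks by locality priority can be illustrated with a deliberately simplified model. It is a sketch under stated assumptions, not Spark's scheduler: it uses a single wait for every level (Spark has per-level settings under `spark.locality.wait`), it ignores which levels actually have pending tasks, and the names `LocalitySketch` and `allowedLevel` are hypothetical. The level names themselves are Spark's locality levels, listed from most to least local.

```scala
object LocalitySketch {
  // Spark's locality levels, most local first.
  val Levels: Vector[String] =
    Vector("PROCESS_LOCAL", "NODE_LOCAL", "NO_PREF", "RACK_LOCAL", "ANY")

  // Simplified model (assumption): the scheduler relaxes the allowed locality
  // by one level each time `waitMs` elapses without launching a task.
  // `idleMs` is the time since the last task launch.
  def allowedLevel(idleMs: Long, waitMs: Long): String = {
    require(idleMs >= 0 && waitMs >= 0, "times must be non-negative")
    if (waitMs == 0) Levels.last // no waiting: any executor is acceptable at once
    else Levels(math.min(idleMs / waitMs, (Levels.size - 1).toLong).toInt)
  }
}
```

In this model, as long as tasks keep launching at a local level within the wait, the scheduler never relaxes to less local levels, which is one way the work can end up concentrated on the few executors that hold the data.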

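[Original] Uncle's Troubleshooting Notes (18): beeline sometimes hangs when connected to Spark Thrift

spark 2.1.1 After beeline connects to Spark Thrift, executing use database sometimes hangs. On the server side, use database corresponds to setCurrentDatabase. Investigation showed that Spark Thrift was executing an insert operation at the time: org.apache.spark.sql.hive.execution.InsertIntoHiveTable protected override def doExecute(): RDD[InternalRow] = {…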

[Original] Uncle's Troubleshooting Notes (17): Spark occasionally throws NullPointerException when querying ORC data

Querying ORC-format data with Spark sometimes fails with this error: Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat…