【原创】大数据基础之Drill(2)Drill1.14+Hive2.1.1运行
问题
Drill最新版本是1.14,从1.13开始Drill支持hive的版本升级到2.3.2,详见1.13的release notes
The Hive client for Drill is updated to version 2.3.2. With the update, Drill supports queries on transactional (ACID) and non-transactional Hive bucketed ORC tables. The updated libraries are backward compatible with earlier versions of the Hive server and metastore. (DRILL-5978)
强行使用Drill1.14连接Hive2.1.1会由于metastore thrift接口变化导致问题,具体体现为 show tables是空,具体报错如下:
2018-10-10 13:03:54,355 [244277c5-ba8c-b6c8-8f99-2cdde9f3c4d8:frag:0:0] WARN o.a.d.e.s.h.DrillHiveMetaStoreClient - Failure while attempting to get hive table. Retries once.
org.apache.thrift.TApplicationException: Invalid method name: 'get_table_req'
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[drill-hive-exec-shaded-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$TableLoader.load(DrillHiveMetaStoreClient.java:531) [drill-storage-hive-core-1.14.0.jar:1.14.0]
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) [guava-18.0.jar:na]
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) [guava-18.0.jar:na]
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) [guava-18.0.jar:na]
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) [guava-18.0.jar:na]
at com.google.common.cache.LocalCache.get(LocalCache.java:3937) [guava-18.0.jar:na]
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) [guava-18.0.jar:na]
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) [guava-18.0.jar:na]
at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithCaching.getHiveReadEntry(DrillHiveMetaStoreClient.java:495) [drill-storage-hive-core-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSelectionBaseOnName(HiveSchemaFactory.java:230) [drill-storage-hive-core-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getDrillTable(HiveSchemaFactory.java:210) [drill-storage-hive-core-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.hive.schema.HiveDatabaseSchema.getTable(HiveDatabaseSchema.java:62) [drill-storage-hive-core-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.AbstractSchema.getTablesByNames(AbstractSchema.java:239) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.AbstractSchema.getTableNamesAndTypes(AbstractSchema.java:257) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator$Tables.visitTables(InfoSchemaRecordGenerator.java:301) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema(InfoSchemaRecordGenerator.java:216) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema(InfoSchemaRecordGenerator.java:209) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema(InfoSchemaRecordGenerator.java:209) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.ischema.InfoSchemaRecordGenerator.scanSchema(InfoSchemaRecordGenerator.java:196) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.ischema.InfoSchemaTableType.getRecordReader(InfoSchemaTableType.java:58) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch(InfoSchemaBatchCreator.java:34) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.store.ischema.InfoSchemaBatchCreator.getBatch(InfoSchemaBatchCreator.java:30) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:159) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:137) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:182) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:110) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:87) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:261) [drill-java-exec-1.14.0.jar:1.14.0]
at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0.jar:1.14.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
编译
于是尝试重新编译Drill1.14,将依赖的hive版本降到2.1.1,下载代码
http://mirror.bit.edu.cn/apache/drill/drill-1.14.0/apache-drill-1.14.0-src.tar.gz
POM
修改pom中的hive版本
<hive.version>2.3.2</hive.version>
修改为<hive.version>2.1.1</hive.version>
重新编译打包后发现问题依旧,经检查发现修改版本之后只有jars/3rdparty下的3个hive相关jar从2.3.2改为2.1.1
hive-contrib-2.1.1.jar
hive-hbase-handler-2.1.1.jar
hive-metastore-2.1.1.jar
报错的jar是drill-hive-exec-shaded-1.14.0.jar,这个jar包中包含包含hive-exec及依赖,
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<artifactSet>
<includes>
<include>org.apache.hive:hive-exec</include>
并且没有使用配置的hive.version
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<scope>compile</scope>
导致打进jar包中的hive-exec是2.3.2版本的,增加hive.version配置
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
<scope>compile</scope>
再打包,问题消失,show tables正常;
Hadoop Location
官方文档说明如下:
Apache Drill users must tell Drill-on-YARN the location of your Hadoop install. Set the HADOOP_HOME environment variable in $DRILL_SITE/drillenv.sh to point to your Hadoop installation:
export HADOOP_HOME= /path/to/hadoop-home
但配置之后依然存在问题:
1)报错
Diagnostics: File file:/user/drill/site.tar.gz does not exist
java.io.FileNotFoundException: File file:/user/drill/site.tar.gz does not exist
需要添加link
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml $DRILL_SITE/core-site.xml
2)在实际查询时会报错找不到hdfs_name,需要添加link
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml $DRILL_SITE/hdfs-site.xml
【原创】大数据基础之Drill(2)Drill1.14+Hive2.1.1运行的更多相关文章
- 【原创】大数据基础之Drill(1)简介、安装及使用
https://drill.apache.org/ 一 简介 Drill is an Apache open-source SQL query engine for Big Data explorat ...
- 【原创】大数据基础之Zookeeper(2)源代码解析
核心枚举 public enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING; } zookeeper服务器状态:刚启动LOOKING,f ...
- 【原创】大数据基础之Benchmark(2)TPC-DS
tpc 官方:http://www.tpc.org/ 一 简介 The TPC is a non-profit corporation founded to define transaction pr ...
- 【原创】大数据基础之词频统计Word Count
对文件进行词频统计,是一个大数据领域的hello word级别的应用,来看下实现有多简单: 1 Linux单机处理 egrep -o "\b[[:alpha:]]+\b" test ...
- 【原创】大数据基础之Impala(1)简介、安装、使用
impala2.12 官方:http://impala.apache.org/ 一 简介 Apache Impala is the open source, native analytic datab ...
- 大数据基础知识:分布式计算、服务器集群[zz]
大数据中的数据量非常巨大,达到了PB级别.而且这庞大的数据之中,不仅仅包括结构化数据(如数字.符号等数据),还包括非结构化数据(如文本.图像.声音.视频等数据).这使得大数据的存储,管理和处理很难利用 ...
- 大数据基础知识问答----spark篇,大数据生态圈
Spark相关知识点 1.Spark基础知识 1.Spark是什么? UCBerkeley AMPlab所开源的类HadoopMapReduce的通用的并行计算框架 dfsSpark基于mapredu ...
- 大数据基础知识问答----hadoop篇
handoop相关知识点 1.Hadoop是什么? Hadoop是一个由Apache基金会所开发的分布式系统基础架构.用户可以在不了解分布式底层细节的情况下,开发分布式程序.充分利用集群的威力进行高速 ...
- hadoop大数据基础框架技术详解
一.什么是大数据 进入本世纪以来,尤其是2010年之后,随着互联网特别是移动互联网的发展,数据的增长呈爆炸趋势,已经很难估计全世界的电子设备中存储的数据到底有多少,描述数据系统的数据量的计量单位从MB ...
随机推荐
- python对 if __name__=='__main__'的理解
对于学过其他编程语言的人来说都知道程序都是从main函数开始执行的,而对于python来说他并没有主函数,他不像其他语言需要需要转化为二进制文件 然后才能执行,他时通过翻译器从第一行开始逐行执行,所以 ...
- Player启动时提示 "System.InvalidOperationException:无法加载计数器名称数据
问题 播放器意外断电重启后可能导致Player启动时报错,提示如下: 原因 这个提示一般指 Universal Player 找不到或无法设置一个Windows Performance Monitor ...
- MySQL8.0-NoSQL和SQL的对比及MySQL的优势
一.SQL VS NoSQL SQL:关系型数据库,用SQL语句来操作数据 NOSQL:非关系型数据库,NoSQL的含义是不仅仅有SQL,而实际上大多数NoSQL不用SQL来操作数据 常见的关系型数据 ...
- Spring Cloud微服务Ribbon负载均衡/Zuul网关使用
客户端负载均衡,当服务节点出现问题时进行调节或是在正常情况下进行 服务调度.所谓的负载均衡,就是当服务提供的数量和调用方对服务进行 取舍的调节问题,在spring cloud中是通过Ribbon来解决 ...
- 如何去掉wordpress网站url里面的index.php(Apache服务器)
在wordpress根目录新建.htaccess文件,并拷贝以下代码保存即可. <IfModule mod_rewrite.c> RewriteEngine On RewriteBase ...
- 解决 MariaDB无密码就可以登录的问题
问题: 困扰了很久的问题,, 使用apt-get来安装mysql,安装好之后发现安装的是 MariaDB,如下,无需密码既可以登录了.即使使用mysqladmin设置好密码,用密码登录可以,不用密码登 ...
- 帝国CMS Table '***.phome_ecms_news_data_' doesn't exist
帝国CMS刷新内容页出现以下错误 1 Table 'www.536831.com.phome_ecms_news_data_' doesn't exist select keyid,dokey,n ...
- 用IntelliJ IDEA 开发Spring+SpringMVC+Mybatis框架 分步搭建四:配置springmvc
在用IntelliJ IDEA 开发Spring+SpringMVC+Mybatis框架 分步搭建三:配置spring并测试的基础上 继续进行springmvc的配置 一:配置完善web.xml文件
- pymongo 使用方法(增删改查)
#!/usr/bin/env python # -*- coding:utf-8 -*- """ MongoDB存储 在这里我们来看一下Python3下MongoDB的存 ...
- mongodb中比较级查询条件:($lt $lte $gt $gte)(大于、小于)、查找条件
查询表中学生年级大于20,如下: db.getCollection('student').find({'age':{'$gt':'20'}}) $lt < (less than ) ...