Hive 教程(九)-python with hive

本文介绍用 python 远程连接 hive，此时需要 hive 启动 hiveserver2 服务

windows 下报如下错误

thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2

不好玩，还是 linux 吧

安装依赖包

pip install sasl

pip install thrift

pip install thrift-sasl

pip install PyHive

最麻烦的是安装 sasl，出现异常参考下面的异常记录

python 操作 hive

官网方法-pyhs2

import pyhs2

## 连接 hve 和 数据库

with pyhs2.connect(host='192.168.10.10',

                   port=10000,

                   authMechanism="PLAIN",

                   user='postgres',

                   password='postgres',

                   database='hive1') as conn:

    with conn.cursor() as cur:

        #Show databases

        print cur.getDatabases()

        #Execute query

        cur.execute("show tables")      # 注意没有 分号

        #Return column info from query

        print cur.getSchema()

        cur.execute('select * from hive_01 limit 3')

        #Fetch table results

        for i in cur.fetch():

            print i

其他方法-pyhive

from pyhive import hive

conn = hive.Connection(host='192.168.10.10', port=10000, username="postgres", password='postgres', auth='CUSTOM')

cursor = conn.cursor()

cursor.execute('SHOW DATABASES')

# 打印结果

for result in cursor.fetchall():

     print(result)

auth

其他方法-impyla

其他依赖

pip install six

pip install bit_array

pip install thriftpy

demo

from impala.dbapi import connect

conn = connect(host ='****'，port = ****）

cursor = conn.cursor()

cursor.execute('SELECT * FROM mytable LIMIT 100')

print cursor.description   # 打印结果集的schema

results = cursor.fetchall()

异常记录

安装 sasl 报错如下

gcc: error trying to exec 'cc1plus': execvp: No such file or directory

error: command 'gcc' failed with exit status 1

解决方案

Ubuntu 系统的话可能需要先装好 libsasl2-dev

apt-get install libsasl2-dev

CentOS 系统需要预先装好 python-devel 和 cyrus-sasl-devel

yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64

参考资料：

https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver 官网

https://www.lfd.uci.edu/~gohlke/pythonlibs/#sasl　　第三方包下载地址

https://www.jianshu.com/p/a0ff0a4495a1　　Python 连接hive（Linux）

https://www.jianshu.com/p/99581dce8309　　本文后面有编译python脚本

https://blog.csdn.net/clany888/article/details/82989068　　pyhive and impyla 两种方式

https://www.cnblogs.com/chushiyaoyue/p/5628882.html　　pyhs2 方式，写的代码多

http://www.zhangdongshengtech.com/article-detials/267　　pyhive

https://www.cnblogs.com/free-easy0000/p/9638982.html　　异常记录

https://www.jianshu.com/p/9dd3a741a8ba　　也总结了几种方式

https://www.360kuai.com/pc/9026a62a8775d368f?cota=4&kuai_so=1&tj_url=so_rec&sign=360_57c3bbd1&refer_scene=so_1

https://blog.csdn.net/bigdataf/article/details/78479331　　Windows pyhs2下模块安装，没试过