Hadoop-Impala Study Notes: Administration

Configuration Parameter Management

To be added.

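Resource Allocation Management (Admission Control)

Impala has the concept of resource pools, which lets certain queries run in a specific pool. In DSS systems that run no batch jobs during the day and no ad hoc queries at night, however, this mechanism is rarely used (the equivalents in Oracle and cgroups are similar in nature). If you are interested, see "Admission Control and Query Queuing" in the Impala Guide.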

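Security Management (much like an ordinary RDBMS, except that authentication and authorization are handled externally, which makes it more complex)

Impala authentication is based on the Kerberos framework (see "Enabling Kerberos Authentication for Impala"). Impala authorization is based on the open-source Sentry project (see "Enabling Sentry Authorization for Impala") and was added in Impala 1.1.0. Auditing has been supported since 1.1.1.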

Kerberos installation: https://www.jianshu.com/p/fc2d2dbd510b

Introduction to Kerberos: https://www.cnblogs.com/ulysses-you/p/8107862.html

Configuring Kerberos integration with CDH: https://blog.csdn.net/qxf1374268/article/details/79321951

How to enable Kerberos authentication on a CDH 5.12 cluster: https://blog.csdn.net/cy309173854/article/details/79288491

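Optimization

Enabling short-circuit reads

This feature lets Impala read local data directly from the file system without going through the DataNodes, which improves performance. It requires libhadoop.so (the Hadoop native library). The tarball installation does not include this library; the .rpm, .deb, and parcel installations do.

This feature can be enabled by editing hdfs-site.xml or through Cloudera Manager.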
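For a cluster configured by hand, the hdfs-site.xml entries might look roughly like the fragment below. This is a sketch based on the properties the Impala documentation lists for short-circuit reads; the socket path is only an example value and is an assumption here, so use a path that actually exists on your DataNodes.

```xml
<!-- Sketch of hdfs-site.xml entries for short-circuit reads. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- Example path (assumption); it must match the DataNode setting. -->
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hdfs-sockets/dn</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.timeout.millis</name>
  <value>10000</value>
</property>
```

After changing these values, the DataNodes and the Impala daemons normally need a restart for the settings to take effect.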

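Enabling block location tracking

This feature lets Impala make better use of the underlying disks. If Impala is not managed by Cloudera Manager, block location tracking has to be enabled manually. It is also configured in hdfs-site.xml.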
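As a rough sketch, the hdfs-site.xml entry for block location tracking would look like the fragment below. The property name is the one given in the Impala documentation for this feature; check it against the documentation for your Hadoop version before relying on it.

```xml
<!-- Sketch of the hdfs-site.xml entry for block location tracking. -->
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
```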

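JDBC Access

JDBC 2.0 and later drivers can access Impala on port 21050. The default port can be changed with the impalad startup option --hs2_port.

In Impala 2.0+, access is possible through the Cloudera JDBC Connector and the Hive 0.13 JDBC driver (the Hive 0.12 and earlier drivers cannot access Impala 2.0).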

Connection string for the Cloudera JDBC Connector:

jdbc:impala://Host:Port[/Schema];Property1=Value;Property2=Value;...

Connection strings for the Hive JDBC driver:

jdbc:hive2://myhost.example.com:21050/;auth=noSasl

jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM -- Impala with Kerberos authentication
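To make the URL shapes above concrete, here is a minimal sketch that only assembles the strings; it opens no connection. The class name ImpalaJdbcUrls and its method names are hypothetical helpers introduced for illustration, and the property name passed to the Cloudera connector form is just the placeholder from the pattern above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the JDBC URL shapes listed above; performs no network I/O. */
final class ImpalaJdbcUrls {

    /** Default HiveServer2-protocol port of impalad (overridden by --hs2_port). */
    static final int DEFAULT_HS2_PORT = 21050;

    /** Hive JDBC driver URL for a cluster without Kerberos. */
    static String hiveNoSasl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/;auth=noSasl";
    }

    /** Hive JDBC driver URL for a Kerberized cluster (principal impala/host@REALM). */
    static String hiveKerberos(String host, int port, String realm) {
        return "jdbc:hive2://" + host + ":" + port
                + "/;principal=impala/" + host + "@" + realm;
    }

    /** Cloudera connector URL: jdbc:impala://Host:Port[/Schema];Key=Value;... */
    static String clouderaConnector(String host, int port, String schema,
                                    Map<String, String> properties) {
        StringBuilder url = new StringBuilder("jdbc:impala://")
                .append(host).append(':').append(port);
        if (schema != null && !schema.isEmpty()) {
            url.append('/').append(schema); // the schema segment is optional
        }
        for (Map.Entry<String, String> e : properties.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String host = "myhost.example.com";
        System.out.println(hiveNoSasl(host, DEFAULT_HS2_PORT));
        System.out.println(hiveKerberos(host, DEFAULT_HS2_PORT, "H2.EXAMPLE.COM"));

        Map<String, String> props = new LinkedHashMap<>();
        props.put("Property1", "Value"); // placeholder, as in the pattern above
        System.out.println(clouderaConnector(host, DEFAULT_HS2_PORT, "default", props));
    }
}
```

A LinkedHashMap is used so that the properties appear in the URL in insertion order.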

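With the current driver version, DML operations on Kudu tables do not report certain errors, such as unique-constraint violations. If you need these errors to be reported, use the Kudu Java API instead of JDBC.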

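The Impala JDBC driver is not published in the public Maven repositories. You have to download it yourself from https://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-43.html and install it into a local Maven repository. https://github.com/onefoursix/Cloudera-Impala-JDBC-Example contains an example. Using the driver is the same as using any ordinary JDBC driver.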
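Since the driver is used like any other JDBC driver, a query can be sketched with nothing but the standard java.sql API. The example below assumes the downloaded driver jar is on the classpath and that a working JDBC URL is passed as the first argument; the class name ImpalaQueryExample and the helper firstColumn are hypothetical names chosen for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Plain-JDBC sketch: runs a statement and collects the first column. */
final class ImpalaQueryExample {

    /** Executes sql on conn and returns column 1 of every row as a string. */
    static List<String> firstColumn(Connection conn, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        // try-with-resources closes the ResultSet, then the Statement.
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: java ImpalaQueryExample <jdbc-url>");
            return;
        }
        // The URL is supplied by the caller, e.g. one of the forms shown earlier.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            for (String db : firstColumn(conn, "SHOW DATABASES")) {
                System.out.println(db);
            }
        }
    }
}
```

Keeping the query logic in a method that takes a Connection means the same code works whichever driver produced the connection.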

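HDFS File Formats Supported by Impala

Among the compression codecs, Snappy strikes a balance between compression ratio and decompression speed and is the recommended choice. Gzip gives the best compression ratio. If the data stays in memory almost all the time, compression is not worth considering, because it saves no I/O.

By default, Impala creates tables in text file format.

Parquet is a column-oriented binary file format, well suited to queries that access only a few columns. To create a Parquet table, add the STORED AS PARQUET clause to CREATE TABLE, as follows: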

[impala-host:21000] > create table parquet_table_name (x INT, y STRING) STORED AS PARQUET;

The column definitions can also be inferred directly from an existing Parquet file:

CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
  STORED AS PARQUET
  LOCATION '/user/etl/destination';

Ports Used by Impala

Component	Service	Port	Access Requirement	Comment
Impala Daemon	Impala Daemon Frontend Port	21000	External	Used to transmit commands and receive results by `impala-shell` and some ODBC drivers.
Impala Daemon	Impala Daemon Frontend Port	21050	External	Used to transmit commands and receive results by applications, such as Business Intelligence tools, using JDBC, the Beeswax query editor in Hue, and some ODBC drivers.
Impala Daemon	Impala Daemon Backend Port	22000	Internal	Internal use only. Impala daemons use this port for Thrift based communication with each other.
Impala Daemon	StateStoreSubscriber Service Port	23000	Internal	Internal use only. Impala daemons listen on this port for updates from the statestore daemon.
Catalog Daemon	StateStoreSubscriber Service Port	23020	Internal	Internal use only. The catalog daemon listens on this port for updates from the statestore daemon.
Impala Daemon	Impala Daemon HTTP Server Port	25000	External	Impala web interface for administrators to monitor and troubleshoot.
Impala StateStore Daemon	StateStore HTTP Server Port	25010	External	StateStore web interface for administrators to monitor and troubleshoot.
Impala Catalog Daemon	Catalog HTTP Server Port	25020	External	Catalog service web interface for administrators to monitor and troubleshoot. New in Impala 1.2 and higher.
Impala StateStore Daemon	StateStore Service Port	24000	Internal	Internal use only. The statestore daemon listens on this port for registration/unregistration requests.
Impala Catalog Daemon	Catalog Service Port	26000	Internal	Internal use only. The catalog service uses this port to communicate with the Impala daemons. New in Impala 1.2 and higher.
Impala Daemon	KRPC Port	27000	Internal	Internal use only. Impala daemons use this port for KRPC based communication with each other.
Impala Daemon	Llama Callback Port	28000	Internal	Internal use only. Impala daemons use this port to communicate with Llama. New in Impala 1.3 and higher.
Impala Llama ApplicationMaster	Llama Thrift Admin Port	15002	Internal	Internal use only. New in Impala 1.3 and higher.
Impala Llama ApplicationMaster	Llama Thrift Port	15000	Internal	Internal use only. New in Impala 1.3 and higher.
Impala Llama ApplicationMaster	Llama HTTP Port	15001	External	Llama service web interface for administrators to monitor and troubleshoot. New in Impala 1.3 and higher.
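
When troubleshooting firewalls, it can help to check whether the ports marked External in the table are reachable from a client machine. The following is a minimal sketch using plain TCP connects; the class name ImpalaPortProbe, the labels in the map, and the timeout are illustrative assumptions, and a successful connect only shows that something is listening, not that it is Impala.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/** Probes the ports listed as "External" in the table above. */
final class ImpalaPortProbe {

    /** Default external ports from the table, keyed by a short label. */
    static Map<String, Integer> externalPorts() {
        Map<String, Integer> ports = new LinkedHashMap<>();
        ports.put("impalad frontend (impala-shell)", 21000);
        ports.put("impalad frontend (JDBC)", 21050);
        ports.put("impalad web UI", 25000);
        ports.put("statestored web UI", 25010);
        ports.put("catalogd web UI", 25020);
        ports.put("Llama web UI", 15001);
        return ports;
    }

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not resolvable
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        for (Map.Entry<String, Integer> e : externalPorts().entrySet()) {
            boolean open = isReachable(host, e.getValue(), 2000);
            System.out.println(e.getKey() + " " + e.getValue() + ": "
                    + (open ? "reachable" : "not reachable"));
        }
    }
}
```

The ports in the map are the defaults; a cluster started with different port options needs the map adjusted accordingly.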