[Hive] - Hive Parameters Explained in Detail

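　　Hive parameters fall into three categories. The first is system variables, which hold system-level settings (Hive exposes them under the system: prefix, and they correspond to the JVM system properties). The second is env variables, the environment variables of the current user. The third is Hive configuration variables, which are defined in hive-site.xml or set in the current Hive session. This third category in turn comprises Hadoop HDFS parameters (taken directly from Hadoop), MapReduce parameters, metastore storage parameters, metastore connection parameters, and Hive runtime parameters.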
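
The three categories can be told apart by name. The sketch below groups `name=value` lines by category, assuming the lines carry the `system:` and `env:` prefixes and that everything else is a Hive configuration variable. The function name `classify_variable` and the sample values for `user.name` and `HOME` are hypothetical; only `hive.exec.parallel=false` comes from the table in this article.

```python
def classify_variable(line):
    """Split one 'name=value' line into (category, name, value).

    Assumption: system and env variables carry a 'system:' or 'env:'
    prefix; anything else is treated as a Hive configuration variable.
    """
    name, _, value = line.partition("=")
    name = name.strip()
    if name.startswith("system:"):
        return "system", name[len("system:"):], value
    if name.startswith("env:"):
        return "env", name[len("env:"):], value
    return "hive", name, value


# Hypothetical sample lines, one per category.
sample = [
    "system:user.name=hadoop",
    "env:HOME=/home/hadoop",
    "hive.exec.parallel=false",
]

groups = {}
for line in sample:
    category, name, value = classify_variable(line)
    groups.setdefault(category, {})[name] = value

for category, variables in groups.items():
    print(category, variables)
```

Grouping by prefix keeps the third category, the largest by far, separate from the handful of system and env entries.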

Hive-0.13.1-cdh5.3.6 parameter reference
Parameter	Default	Meaning (purpose)
datanucleus.autoCreateSchema	true	Creates the necessary schema on startup if one doesn't exist. Set this to false after the schema has been created once.
datanucleus.autoStartMechanismMode	checked	Throws an exception if the metadata tables are incorrect. Allowed values: checked, unchecked.
datanucleus.cache.level2	false	Whether to use a level 2 cache. Turn this off if metadata is changed independently of the Hive metastore server.
datanucleus.cache.level2.type	SOFT	Type of level 2 cache: SOFT = soft-reference-based cache, WEAK = weak-reference-based cache, none = no cache.
datanucleus.connectionPoolingType	BoneCP	Connection pool used for the metastore database.
datanucleus.fixedDatastore	false
datanucleus.identifierFactory	datanucleus1	Name of the identifier factory to use when generating table/column names etc. in the metastore database.
datanucleus.plugin.pluginRegistryBundleCheck	LOG	Defines what happens when plugin bundles are found and are duplicated [EXCEPTION|LOG|NONE]
datanucleus.rdbms.useLegacyNativeValueStrategy	true
datanucleus.storeManagerType	rdbms	Metadata store type.
datanucleus.transactionIsolation	read-committed	Default transaction isolation level for identity generation.
datanucleus.validateColumns	false	Validates the existing schema against code. Turn this on to verify the columns of existing tables.
datanucleus.validateConstraints	false	Whether to validate the constraints of existing tables.
datanucleus.validateTables	false	Whether to validate existing tables.
dfs.block.access.key.update.interval	600
hive.archive.enabled	false	Whether archiving operations are permitted.
hive.auto.convert.join	true	Whether Hive enables the optimization that converts a common join into a map join based on the input file size.
hive.auto.convert.join.noconditionaltask	true	Whether Hive enables the optimization that converts a common join into a map join based on the input file size. If this parameter is on, and the sum of sizes for n-1 of the tables/partitions in an n-way join is smaller than the specified size, the join is converted directly to a map join (there is no conditional task).
hive.auto.convert.join.noconditionaltask.size	10000000	If hive.auto.convert.join.noconditionaltask is off, this parameter does not take effect. If it is on, and the sum of sizes for n-1 of the tables/partitions in an n-way join is smaller than this size, the join is converted directly to a map join (there is no conditional task). The default is 10 MB.
hive.auto.convert.join.use.nonstaged	false	For conditional joins, if the input stream from a small alias can be applied directly to the join operator without filtering or projection, the alias need not be pre-staged in the distributed cache via a mapred local task. Currently this does not work with vectorization or the Tez execution engine.
hive.auto.convert.sortmerge.join	false	Whether the join is automatically converted to a sort-merge join if the joined tables pass the criteria for a sort-merge join.
hive.auto.convert.sortmerge.join.bigtable.selection.policy	org.apache.hadoop.hive.ql.optimizer.AvgPartitionSizeBasedBigTableSelectorForAutoSMJ
hive.auto.convert.sortmerge.join.to.mapjoin	false	Whether to convert a sort-merge join to a map join.
hive.auto.progress.timeout	0	How long to run the autoprogressor for script/UDTF operators, in seconds. Set to 0 to run forever.
hive.autogen.columnalias.prefix.includefuncname	false	Whether the function name is included in column aliases auto-generated by Hive. Not included by default.
hive.autogen.columnalias.prefix.label	_c	Prefix used for column aliases auto-generated by Hive.
hive.binary.record.max.length	1000	Maximum length of a binary record in Hive.
hive.cache.expr.evaluation	true	If true, the evaluation result of a deterministic expression referenced twice or more is cached. For example, in a filter condition like ".. where key + 10 > 10 or key + 10 = 0", "key + 10" is evaluated and cached once, then reused for the following expression ("key + 10 = 0"). Currently this applies only to expressions in select or filter operators.
hive.cli.errors.ignore	false
hive.cli.pretty.output.num.cols	-1
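hive.cli.print.current.db	false	Whether to display the name of the current database. Not displayed by default.
hive.cli.print.header	false	Whether to print a header for query results, such as the column names. Not printed by default.
hive.cli.prompt	hive	Prompt prefix for the Hive CLI. The client must be restarted after changing it.
hive.cluster.delegation.token.store.class	org.apache.hadoop.hive.thrift.MemoryTokenStore	Class used to store delegation tokens for the Hive cluster.
hive.cluster.delegation.token.store.zookeeper.znode	/hive/cluster/delegation	ZooKeeper znode used by Hive for delegation token storage.
hive.compactor.abortedtxn.threshold	1000	Number of aborted transactions involving a table or partition that triggers a major compaction.
hive.compactor.check.interval	300	Interval between compaction checks, in seconds.
hive.compactor.delta.num.threshold	10	Number of delta directories in a table or partition that triggers a minor compaction.
hive.compactor.delta.pct.threshold	0.1	Size of the delta files relative to the base, as a fraction, that triggers a major compaction.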
hive.compactor.initiator.on	false
hive.compactor.worker.threads	0
hive.compactor.worker.timeout	86400	In seconds.
hive.compat	0.12	Compatibility version.
hive.compute.query.using.stats	false
hive.compute.splits.in.am	true
hive.conf.restricted.list	hive.security.authenticator.manager,hive.security.authorization.manager
hive.conf.validation	true
hive.convert.join.bucket.mapjoin.tez	false
hive.counters.group.name	HIVE
hive.debug.localtask	false
hive.decode.partition.name	false
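hive.default.fileformat	TextFile	Default file format. Defaults to TextFile.
hive.default.rcfile.serde	org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe	SerDe class used for RCFile.
hive.default.serde	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	Default SerDe class.
hive.display.partition.cols.separately	true	Whether partition columns are displayed separately from the other columns.
hive.downloaded.resources.dir	/tmp/${hive.session.id}_resources	Directory where resources downloaded by Hive are stored.
hive.enforce.bucketing	false	Whether bucketing is enforced. If true, bucketing is enforced when inserting into a table.
hive.enforce.bucketmapjoin	false	Whether the query should fail if a requested bucketed map join cannot be performed.
hive.enforce.sorting	false	Whether sorting is enforced. If true, sorting is enforced when inserting into a table.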
hive.enforce.sortmergebucketmapjoin	false
hive.entity.capture.transform	false
hive.entity.separator	@	Separator used to construct names of tables and partitions. For example, dbname@tablename@partitionname
hive.error.on.empty.partition	false	Whether to throw an exception if a dynamic partition insert generates empty results.
hive.exec.check.crossproducts	true	Whether to check for cross products (Cartesian products).
hive.exec.compress.intermediate	false	Whether intermediate results are compressed. The compression settings are taken from the Hadoop configuration variables mapred.output.compress*.
hive.exec.compress.output	false	Whether the final output is compressed.
hive.exec.concatenate.check.index	true
hive.exec.copyfile.maxsize	33554432
hive.exec.counters.pull.interval	1000
hive.exec.default.partition.name	__HIVE_DEFAULT_PARTITION__
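hive.exec.drop.ignorenonexistent	true	Whether to ignore the error raised when a drop targets something that does not exist. Ignored by default; if set to false, an error is reported.
hive.exec.dynamic.partition	true	Whether dynamic partitions are allowed. If so, partition values need not be specified when modifying data.
hive.exec.dynamic.partition.mode	strict	Dynamic partition mode. strict requires at least one static partition value; nonstrict allows all partitions to be dynamic.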
hive.exec.infer.bucket.sort	false
hive.exec.infer.bucket.sort.num.buckets.power.two	false
hive.exec.job.debug.capture.stacktraces	true
hive.exec.job.debug.timeout	30000
hive.exec.local.scratchdir	/tmp/hadoop
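hive.exec.max.created.files	100000	Maximum number of HDFS files that an MR job may create.
hive.exec.max.dynamic.partitions	1000	Maximum total number of dynamic partitions.
hive.exec.max.dynamic.partitions.pernode	100	Maximum number of dynamic partitions created per MR node.
hive.exec.mode.local.auto	false	Whether Hive is allowed to run in local mode.
hive.exec.mode.local.auto.input.files.max	4	Maximum number of input files for local mode.
hive.exec.mode.local.auto.inputbytes.max	134217728	Maximum input size in bytes for local mode.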
hive.exec.orc.default.block.padding	true
hive.exec.orc.default.buffer.size	262144
hive.exec.orc.default.compress	ZLIB
hive.exec.orc.default.row.index.stride	10000
hive.exec.orc.default.stripe.size	268435456
hive.exec.orc.dictionary.key.size.threshold	0.8
hive.exec.orc.memory.pool	0.5
hive.exec.orc.skip.corrupt.data	false
hive.exec.orc.zerocopy	false
hive.exec.parallel	false	Whether parallel execution is allowed. Not allowed by default.
hive.exec.parallel.thread.number	8	Number of threads for parallel execution. The default is 8.
hive.exec.perf.logger	org.apache.hadoop.hive.ql.log.PerfLogger
hive.exec.rcfile.use.explicit.header	true
hive.exec.rcfile.use.sync.cache	true
hive.exec.reducers.bytes.per.reducer	1000000000	Amount of data handled per reducer. The default is 1 GB; for example, if the input size is 10 GB, 10 reducers are used.
hive.exec.reducers.max	999	Maximum number of reducers allowed. Takes effect when mapred.reduce.tasks is set to a negative value.
hive.exec.rowoffset	false
hive.exec.scratchdir	/etc/hive-hadoop
hive.exec.script.allow.partial.consumption	false
hive.exec.script.maxerrsize	100000
hive.exec.script.trust	false
hive.exec.show.job.failure.debug.info	true
hive.exec.stagingdir	.hive-staging
hive.exec.submitviachild	false
hive.exec.tasklog.debug.timeout	20000
hive.execution.engine	mr	Execution engine: mr, or tez (Hadoop 2).
hive.exim.uri.scheme.whitelist	hdfs,pfile
hive.explain.dependency.append.tasktype	false
hive.fetch.output.serde	org.apache.hadoop.hive.serde2.DelimitedJSONSerDe
hive.fetch.task.aggr	false
hive.fetch.task.conversion	minimal
hive.fetch.task.conversion.threshold	-1
hive.file.max.footer	100
hive.fileformat.check	true
hive.groupby.mapaggr.checkinterval	100000
hive.groupby.orderby.position.alias	false
hive.groupby.skewindata	false
hive.hadoop.supports.splittable.combineinputformat	false
hive.hashtable.initialCapacity	100000
hive.hashtable.loadfactor	0.75
hive.hbase.generatehfiles	false
hive.hbase.snapshot.restoredir	/tmp
hive.hbase.wal.enabled	true
hive.heartbeat.interval	1000
hive.hmshandler.force.reload.conf	false
hive.hmshandler.retry.attempts	1
hive.hmshandler.retry.interval	1000
hive.hwi.listen.host	0.0.0.0
hive.hwi.listen.port	9999
hive.hwi.war.file	lib/hive-hwi-${version}.war
hive.ignore.mapjoin.hint	true
hive.in.test	false
hive.index.compact.binary.search	true
hive.index.compact.file.ignore.hdfs	false
hive.index.compact.query.max.entries	10000000
hive.index.compact.query.max.size	10737418240
hive.input.format	org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
hive.insert.into.external.tables	true
hive.insert.into.multilevel.dirs	false
hive.jobname.length	50
hive.join.cache.size	25000
hive.join.emit.interval	1000
hive.lazysimple.extended_boolean_literal	false
hive.limit.optimize.enable	false
hive.limit.optimize.fetch.max	50000
hive.limit.optimize.limit.file	10
hive.limit.pushdown.memory.usage	-1.0
hive.limit.query.max.table.partition	-1
hive.limit.row.max.size	100000
hive.localize.resource.num.wait.attempts	5
hive.localize.resource.wait.interval	5000
hive.lock.manager	org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
hive.mapred.partitioner	org.apache.hadoop.hive.ql.io.DefaultHivePartitioner
hive.mapred.reduce.tasks.speculative.execution	true
hive.mapred.supports.subdirectories	false
hive.metastore.uris	thrift://hh:9083
hive.metastore.warehouse.dir	/user/hive/warehouse
hive.multi.insert.move.tasks.share.dependencies	false
hive.multigroupby.singlereducer	true
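hive.zookeeper.clean.extra.nodes	false	Whether to clean up extra node data at the end of the session.
hive.zookeeper.client.port	2181	Client port.
hive.zookeeper.quorum		IP addresses of the ZooKeeper servers.
hive.zookeeper.session.timeout	600000	Session timeout for the ZooKeeper client.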
hive.zookeeper.namespace	hive_zookeeper_namespace
javax.jdo.PersistenceManagerFactoryClass	org.datanucleus.api.jdo.JDOPersistenceManagerFactory
javax.jdo.option.ConnectionDriverName	Change to: com.mysql.jdbc.Driver
javax.jdo.option.ConnectionPassword	Change to: hive
javax.jdo.option.ConnectionURL	xxx
javax.jdo.option.ConnectionUserName	xxx
javax.jdo.option.DetachAllOnCommit	true
javax.jdo.option.Multithreaded	true
javax.jdo.option.NonTransactionalRead	true
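
Two of the size-based rules in the table lend themselves to a short worked example: the reducer estimate driven by hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max, and the map join conversion test driven by hive.auto.convert.join.noconditionaltask.size. The sketch below is only an approximation of those descriptions, not Hive's actual planner code. The function names are hypothetical, the lower bound of one reducer is an assumption, and "n-1 of the tables" is read as "all inputs except the largest".

```python
import math

# Defaults copied from the table above (Hive-0.13.1-cdh5.3.6).
BYTES_PER_REDUCER = 1_000_000_000   # hive.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 999                  # hive.exec.reducers.max
NOCONDITIONAL_SIZE = 10_000_000     # hive.auto.convert.join.noconditionaltask.size


def estimate_reducers(input_bytes, bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """One reducer per bytes_per_reducer of input, capped at max_reducers.

    Assumption: at least one reducer is always used.
    """
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


def can_convert_to_mapjoin(input_sizes, threshold=NOCONDITIONAL_SIZE):
    """True if all inputs except the largest sum to less than threshold."""
    if len(input_sizes) < 2:
        return False  # not a join
    small_total = sum(sorted(input_sizes)[:-1])
    return small_total < threshold


# 10 GB of input with the default 1 GB per reducer.
print(estimate_reducers(10_000_000_000))
# A 50 GB table joined with 4 MB and 5 MB tables: 9 MB is under the 10 MB limit.
print(can_convert_to_mapjoin([50_000_000_000, 4_000_000, 5_000_000]))
```

Passing the thresholds as keyword arguments makes it easy to see how a changed setting would shift the outcome for a given input size.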
