【原创】大数据基础之Impala(3)部分调优
1)将coordinator和executor角色分离
By default, each host in the cluster that runs the impalad daemon can act as the coordinator for an Impala query, execute the fragments of the execution plan for the query, or both. During highly concurrent workloads for large-scale queries, especially on large clusters, the dual roles can cause scalability issues:
- The extra work required for a host to act as the coordinator could interfere with its capacity to perform other work for the earlier phases of the query. For example, the coordinator can experience significant network and CPU overhead during queries containing a large number of query fragments. Each coordinator caches metadata for all table partitions and data files, which can be substantial and contend with memory needed to process joins, aggregations, and other operations performed by query executors.
- Having a large number of hosts act as coordinators can cause unnecessary network overhead, or even timeout errors, as each of those hosts communicates with the statestored daemon for metadata updates.
- The “soft limits” imposed by the admission control feature are more likely to be exceeded when there are a large number of heavily loaded hosts acting as coordinators.
2)default_pool_max_requests,默认是200,要根据自己集群的内存规模以及每个查询需要的内存进行调整;
Maximum number of concurrent outstanding requests allowed to run before incoming requests are queued. Because this limit applies cluster-wide, but each Impala node makes independent decisions to run queries immediately or queue them, it is a soft limit; the overall number of concurrent queries might be slightly higher during times of heavy load. A negative value indicates no limit. Ignored if fair_scheduler_config_path and llama_site_path are set.
3)开启kerberos之后,通过jdbc访问需要做客户端load balance,因为jdbc url里需要携带对应server的principal;
【原创】大数据基础之Impala(3)部分调优的更多相关文章
- 【原创】大数据基础之Impala(1)简介、安装、使用
impala2.12 官方:http://impala.apache.org/ 一 简介 Apache Impala is the open source, native analytic datab ...
- 大数据技术 - MapReduce的Shuffle及调优
本章内容我们学习一下 MapReduce 中的 Shuffle 过程,Shuffle 发生在 map 输出到 reduce 输入的过程,它的中文解释是 “洗牌”,顾名思义该过程涉及数据的重新分配,主要 ...
- 【原创】大数据基础之Impala(2)实现细节
一 架构 Impala is a massively-parallel query execution engine, which runs on hundreds of machines in ex ...
- 【原创】大数据基础之Zookeeper(2)源代码解析
核心枚举 public enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING; } zookeeper服务器状态:刚启动LOOKING,f ...
- 【原创】大数据基础之Benchmark(2)TPC-DS
tpc 官方:http://www.tpc.org/ 一 简介 The TPC is a non-profit corporation founded to define transaction pr ...
- 【原创】大数据基础之词频统计Word Count
对文件进行词频统计,是一个大数据领域的hello word级别的应用,来看下实现有多简单: 1 Linux单机处理 egrep -o "\b[[:alpha:]]+\b" test ...
- 【原创】大数据基础之ElasticSearch(4)es数据导入过程
1 准备analyzer 内置analyzer 参考:https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis- ...
- 【原创】大数据基础之Hive(5)性能调优Performance Tuning
1 compress & mr hive默认的execution engine是mr hive> set hive.execution.engine;hive.execution.eng ...
- 【原创】大数据基础之ElasticSearch(5)重要配置及调优
Index Settings 重要索引配置 Index level settings can be set per-index. Settings may be: 1 static 静态索引配置 Th ...
随机推荐
- CentOS7使用yum安装MySQL8.0
1.yum仓库下载MySQL:sudo yum localinstall https://repo.mysql.com//mysql80-community-release-el7-1.noarch. ...
- python 字典用法
d = {key1 : value1, key2 : value2 } 1.创建 dict1 = { 'abc': 456 } 2.访问/修改 dict['Name'] 3.删除 del dict[' ...
- C#语法相比其它语言比较独特的地方
C#语法相比其它语言比较独特的地方(一) 本文讲解了switch语句可以用来测试string型的对象.多维数组.foreach语句.索引器和Property等内容 1,switch语句可以用来测试st ...
- linux软连接文件的copy
最近在做项目的时候遇到过一个问题:当copy一个工程模块时发现里面的目录文件有重复定义的情况. 最后查看源文件目录发现是存在软连接造成的. 出现这种情况的原因是:当直接copy文件目录时遇到软连接会把 ...
- Monte Carlo Method(蒙特·卡罗方法)
0-故事: 蒙特卡罗方法是计算模拟的基础,其名字来源于世界著名的赌城——摩纳哥的蒙特卡罗. 蒙特卡罗一词来源于意大利语,是为了纪念王子摩纳哥查理三世.蒙特卡罗(MonteCarlo)虽然是个赌城,但很 ...
- 【CSA49G】【XSY3315】jump DP
题目大意 有一个数轴.yww 最开始在位置 \(0\).yww 总共要跳跃很多次.每次 yww 可以往右跳 \(1\) 单位长度,或者跳到位置 \(1\). 定义位置序列为 yww 在每次跳跃之后所在 ...
- DRF 商城项目 - 用户( 登录, 注册,登出,个人中心 ) 逻辑梳理
用户登录 自定义用户登录字段处理 用户的登录时通过 手机号也可以进行登录 需要重写登录验证逻辑 from django.contrib.auth.backends import ModelBacken ...
- jmeter笔记(7)--参数化--用户定义的变量
录制的脚本里面有很多的相同的数据的时候,比如服务器ip,端口号等,当更换服务器的时候,就需要手动的修改脚本里面对应的服务器ip和端口号,比较繁琐,jmeter里面有一个用户自定义变量能很好的解决这个问 ...
- Java动态代理实现及实际应用
一.代理的概念 动态代理技术是整个java技术中最重要的一个技术,它是学习java框架的基础,不会动态代理技术,那么在学习Spring这些框架时是只知应用不懂实现. 动态代理技术就是用来产生一个对象的 ...
- Number和toString中的坑
在之前的一篇文章 JavaScript中的大数相加 中,在做大数相加时, 突然想到 数字.toString方法 会报错,但是作为函数参数传进来,直接调用 toString 方法却不会报错 上网搜了看看 ...