1)将coordinator和executor角色分离

By default, each host in the cluster that runs the impalad daemon can act as the coordinator for an Impala query, execute the fragments of the execution plan for the query, or both. During highly concurrent workloads for large-scale queries, especially on large clusters, the dual roles can cause scalability issues:

  • The extra work required for a host to act as the coordinator could interfere with its capacity to perform other work for the earlier phases of the query. For example, the coordinator can experience significant network and CPU overhead during queries containing a large number of query fragments. Each coordinator caches metadata for all table partitions and data files, which can be substantial and contend with memory needed to process joins, aggregations, and other operations performed by query executors.
  • Having a large number of hosts act as coordinators can cause unnecessary network overhead, or even timeout errors, as each of those hosts communicates with the statestored daemon for metadata updates.
  • The “soft limits” imposed by the admission control feature are more likely to be exceeded when there are a large number of heavily loaded hosts acting as coordinators.

2)default_pool_max_requests,默认是200,要根据自己集群的内存规模以及每个查询需要的内存进行调整;

Maximum number of concurrent outstanding requests allowed to run before incoming requests are queued. Because this limit applies cluster-wide, but each Impala node makes independent decisions to run queries immediately or queue them, it is a soft limit; the overall number of concurrent queries might be slightly higher during times of heavy load. A negative value indicates no limit. Ignored if fair_scheduler_config_path and llama_site_path are set.

3)开启kerberos之后,通过jdbc访问需要做客户端load balance,因为jdbc url里需要携带对应server的principal;

【原创】大数据基础之Impala(3)部分调优的更多相关文章

  1. 【原创】大数据基础之Impala(1)简介、安装、使用

    impala2.12 官方:http://impala.apache.org/ 一 简介 Apache Impala is the open source, native analytic datab ...

  2. 大数据技术 - MapReduce的Shuffle及调优

    本章内容我们学习一下 MapReduce 中的 Shuffle 过程,Shuffle 发生在 map 输出到 reduce 输入的过程,它的中文解释是 “洗牌”,顾名思义该过程涉及数据的重新分配,主要 ...

  3. 【原创】大数据基础之Impala(2)实现细节

    一 架构 Impala is a massively-parallel query execution engine, which runs on hundreds of machines in ex ...

  4. 【原创】大数据基础之Zookeeper(2)源代码解析

    核心枚举 public enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING; } zookeeper服务器状态:刚启动LOOKING,f ...

  5. 【原创】大数据基础之Benchmark(2)TPC-DS

    tpc 官方:http://www.tpc.org/ 一 简介 The TPC is a non-profit corporation founded to define transaction pr ...

  6. 【原创】大数据基础之词频统计Word Count

    对文件进行词频统计,是一个大数据领域的hello word级别的应用,来看下实现有多简单: 1 Linux单机处理 egrep -o "\b[[:alpha:]]+\b" test ...

  7. 【原创】大数据基础之ElasticSearch(4)es数据导入过程

    1 准备analyzer 内置analyzer 参考:https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis- ...

  8. 【原创】大数据基础之Hive(5)性能调优Performance Tuning

    1 compress & mr hive默认的execution engine是mr hive> set hive.execution.engine;hive.execution.eng ...

  9. 【原创】大数据基础之ElasticSearch(5)重要配置及调优

    Index Settings 重要索引配置 Index level settings can be set per-index. Settings may be: 1 static 静态索引配置 Th ...

随机推荐

  1. Centos7 启动指定docker容器报错

    今天做docker实验时,把docker镜像pull下后,启动报如下错误: 错误信息:WARNING: IPv4 forwarding is disabled. Networking will not ...

  2. docker企业实战视频教程

    Docker是一个开源的引擎,可以轻松的为任何应用创建一个轻量级的.可移植的.自给自足的容器.开发者在笔记本上编译测试通过的容器可以批量地在生产环境中部署,包括VMs(虚拟机).bare metal. ...

  3. [BJOI2019]删数(线段树)

    [BJOI2019]删数(线段树) 题面 洛谷 题解 按照值域我们把每个数的出现次数画成一根根的柱子,然后把柱子向左推导,\([1,n]\)中未被覆盖的区间长度就是答案. 于是问题变成了单点修改值,即 ...

  4. Keepalived配置详解

    Keepalived 配置文件解释 Keepalived的所有配置都在一个配置文件里面,主要分为三类: 全局配置 VRRPD配置 LVS 配置 配置文件是以配置块的形式存在,每个配置块都在一个闭合的{ ...

  5. App自动化(1)--Appium-Android环境搭建

    本次笔记记录Appium-Android环境搭建,主要实现在windows上通过python编写脚本来实现模拟器上安装的app自动化测试. 主要步骤:安装node.js,配置JDK环境,配置Andro ...

  6. logback输出json格式日志(包括mdc)发送到kafka

    1,pom.xml <!-- kafka --> <dependency> <groupId>com.github.danielwegener</groupI ...

  7. python之路day12--装饰器的进阶

    装饰器# 开发原则:开发封闭原则# 装饰器的作用:在不改变原函数的调用函数下,在函数的前后添加功能.# 装饰器的本质:闭包函数 import time def timmer(f): #func #ti ...

  8. Linux性能优化实战:系统的swap变高(09)

    一.实验环境 1.操作系统 root@openstack:~# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu ...

  9. nginx 容器反向代理网址的设置

    先讲一下场景:  nginx 容器要和SpringBoot 容器部署在一台机器上, nginx 为 SpringBoot 提供反向代理, 需要在 nginx.conf 中写上 SpringBoot 的 ...

  10. 【C++笔记】析构函数(destructor)

    “析构函数”是构造函数的反向函数. 在销毁(释放)对象时将调用它们. 通过在类名前面放置一个波形符 (~) 将函数指定为类的析构函数.   声明析构函数   析构函数是具有与类相同的名称但前面是波形符 ...