Components of the Impala Server

The Impala server is a distributed, massively parallel processing (MPP) database engine. It consists of different daemon processes that run on specific hosts within your CDH cluster.

The Impala Daemon

The core Impala component is a daemon process that runs on each node of the cluster, physically represented by the impalad process. It reads and writes to data files; accepts queries transmitted from the impala-shell command, Hue, JDBC, or ODBC; parallelizes the queries and distributes work to other nodes in the Impala cluster; and transmits intermediate query results back to the central coordinator node.

You can submit a query to the Impala daemon running on any node, and that node serves as the coordinator node for that query. The other nodes transmit partial results back to the coordinator, which constructs the final result set for a query. When running experiments with functionality through the impala-shell command, you might always connect to the same Impala daemon for convenience. For clusters running production workloads, you might load-balance between the nodes by submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces.
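The round-robin approach for production workloads could be sketched roughly as follows. This is a minimal illustration only, assuming a hypothetical list of impalad hostnames; it just decides which daemon would act as coordinator for each query and does not open a real JDBC or ODBC connection.

```python
from itertools import cycle

# Hypothetical hostnames of nodes running impalad; replace with your own.
IMPALAD_HOSTS = [
    "impalad-1.example.com",
    "impalad-2.example.com",
    "impalad-3.example.com",
]


def coordinator_picker(hosts):
    """Return a function that yields the next coordinator host in round-robin order."""
    rotation = cycle(hosts)
    return lambda: next(rotation)


pick_coordinator = coordinator_picker(IMPALAD_HOSTS)

# Each query is handed to a different daemon, which then coordinates that query.
assignments = [(query, pick_coordinator()) for query in ("q1", "q2", "q3", "q4")]
```

In practice a proxy or load balancer in front of the daemons usually plays this role, so client code does not need to know the individual hostnames.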

The Impala daemons are in constant communication with the statestore, to confirm which nodes are healthy and can accept new work.

They also receive broadcast messages from the catalogd daemon (introduced in Impala 1.2) whenever any Impala node in the cluster creates, alters, or drops any type of object, or when an INSERT or LOAD DATA statement is processed through Impala. This background communication minimizes the need for the REFRESH or INVALIDATE METADATA statements that were needed to coordinate metadata across nodes prior to Impala 1.2.

Related information: Modifying Impala Startup Options, Starting Impala, Setting the Idle Query and Idle Session Timeouts for impalad, Ports Used by Impala, Using Impala through a Proxy for High Availability

The Impala Statestore

The Impala component known as the statestore checks on the health of Impala daemons on all the nodes in a cluster, and continuously relays its findings to each of those daemons. It is physically represented by a daemon process named statestored; you only need such a process on one node in the cluster. If an Impala node goes offline due to hardware failure, network error, software issue, or other reason, the statestore informs all the other nodes so that future queries can avoid making requests to the unreachable node.

Because the statestore's purpose is to help when things go wrong, it is not critical to the normal operation of an Impala cluster. If the statestore is not running or becomes unreachable, the other nodes continue running and distributing work among themselves as usual; the cluster just becomes less robust if other nodes fail while the statestore is offline. When the statestore comes back online, it re-establishes communication with the other nodes and resumes its monitoring function.
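The behavior described above can be pictured with a toy model. The sketch below is not Impala code; the class, method, and node names are hypothetical, and it only illustrates the idea that each daemon keeps a local view of healthy peers, which goes stale while the statestore is offline and is corrected once it returns.

```python
class ToyCluster:
    """Toy model, not Impala code: daemons keep a local view of healthy peers."""

    def __init__(self, daemons):
        self.alive = set(daemons)          # ground truth: daemons actually running
        self.statestore_up = True
        # What each daemon currently believes about cluster membership.
        self.views = {d: set(daemons) for d in daemons}

    def fail(self, daemon):
        """A daemon goes offline; the change is relayed only if the statestore runs."""
        self.alive.discard(daemon)
        self._relay()

    def restart_statestore(self):
        """The statestore comes back online and resumes monitoring."""
        self.statestore_up = True
        self._relay()

    def _relay(self):
        # Push the current membership to every surviving daemon.
        if self.statestore_up:
            for d in self.alive:
                self.views[d] = set(self.alive)


cluster = ToyCluster(["node1", "node2", "node3"])

cluster.fail("node3")
view_after_relay = sorted(cluster.views["node1"])   # node3 is no longer targeted

cluster.statestore_up = False                       # statestore becomes unreachable
cluster.fail("node2")
stale_view = sorted(cluster.views["node1"])         # node1 still believes node2 is up

cluster.restart_statestore()
fresh_view = sorted(cluster.views["node1"])         # view corrected after recovery
```

The stale view in the middle step corresponds to the "less robust" window: queries keep running, but a daemon may still try to send work to a node that has since failed.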

Related information: Scalability Considerations for the Impala Statestore, Modifying Impala Startup Options, Starting Impala, Increasing the Statestore Timeout, Ports Used by Impala

The Impala Catalog Service

The Impala component known as the catalog service relays the metadata changes from Impala SQL statements to all the nodes in a cluster. It is physically represented by a daemon process named catalogd; you only need such a process on one node in the cluster. Because the requests are passed through the statestore daemon, it makes sense to run the statestored and catalogd services on the same node.

This new component in Impala 1.2 reduces the need for the REFRESH and INVALIDATE METADATA statements. Formerly, if you issued CREATE DATABASE, DROP DATABASE, CREATE TABLE, ALTER TABLE, or DROP TABLE statements on one Impala node, you needed to issue INVALIDATE METADATA on any other node before running a query there, so that it would pick up the changes to schema objects. Likewise, if you issued INSERT statements on one node, you needed to issue REFRESH table_name on any other node before running a query there, so that it would recognize the newly added data files. The catalog service removes the need to issue REFRESH and INVALIDATE METADATA statements when the metadata changes are performed by statements issued through Impala. When you create a table, load data, and so on through Hive, you still need to issue REFRESH or INVALIDATE METADATA on an Impala node before executing a query there.

This feature, new in Impala 1.2, touches a number of aspects of Impala:

  • See Impala Installation, Upgrading Impala, and Starting Impala for usage information about the catalogd daemon.

  • The REFRESH and INVALIDATE METADATA statements are no longer needed when the CREATE TABLE, INSERT, or other table-changing or data-changing operation is performed through Impala. These statements are still needed if such operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the statements only need to be issued on one Impala node rather than on all nodes. See REFRESH Statement and INVALIDATE METADATA Statement for the latest usage information for those statements.
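The rules above can be summarized as a small decision helper. This is only a sketch: the function name and its argument values are hypothetical, and the mapping of schema changes to INVALIDATE METADATA and of new data files to REFRESH table_name is an assumption that follows the pre-1.2 pattern described earlier in this section.

```python
def follow_up_statement(changed_through, change_kind, table_name):
    """Return the statement to run on ONE Impala node, or None if none is needed.

    changed_through: "impala", "hive", or "hdfs" (data files manipulated directly)
    change_kind:     "schema" for new or altered objects, "data" for new data files
    """
    if changed_through == "impala":
        # catalogd relays the change to all nodes, so nothing else is required.
        return None
    if change_kind == "data":
        # Assumption: new data files in an existing table call for REFRESH.
        return f"REFRESH {table_name}"
    # Assumption: schema changes made outside Impala call for INVALIDATE METADATA.
    return "INVALIDATE METADATA"


after_impala_insert = follow_up_statement("impala", "data", "sales")
after_hive_load = follow_up_statement("hive", "data", "sales")
after_hive_create = follow_up_statement("hive", "schema", "sales")
```

Whatever the helper returns only needs to be issued on a single Impala node, since the catalog service propagates the result to the others.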

By default, the metadata loading and caching on startup happens asynchronously, so Impala can begin accepting requests promptly. To enable the original behavior, where Impala waited until all metadata was loaded before accepting any requests, set the catalogd configuration option --load_catalog_in_background=false.
