Components of the Impala Server
Components of the Impala Server
The Impala server is a distributed, massively parallel processing (MPP) database engine. It consists of different daemon processes that run on specific hosts within your CDH cluster.
Continue reading:
The Impala Daemon
The core Impala component is a daemon process that runs on each node of the cluster, physically represented by the impalad process. It reads and writes to data files; accepts queries transmitted from the impala-shell command, Hue, JDBC, or ODBC; parallelizes the queries and distributes work to other nodes in the Impala cluster; and transmits intermediate query results back to the central coordinator node.
You can submit a query to the Impala daemon running on any node, and that node serves as the coordinator node for that query. The other nodes transmit partial results back to the coordinator, which constructs the final result set for a query. When running experiments with functionality through the impala-shell command, you might always connect to the same Impala daemon for convenience. For clusters running production workloads, you might load-balance between the nodes by submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces.
The Impala daemons are in constant communication with the statestore, to confirm which nodes are healthy and can accept new work.
They also receive broadcast messages from the catalogd daemon (introduced in Impala 1.2) whenever any Impala node in the cluster creates, alters, or drops any type of object, or when an INSERT or LOAD DATA statement is processed through Impala. This background communication minimizes the need for REFRESH or INVALIDATE METADATAstatements that were needed to coordinate metadata across nodes prior to Impala 1.2.
Related information: Modifying Impala Startup Options, Starting Impala, Setting the Idle Query and Idle Session Timeouts for impalad, Ports Used by Impala,Using Impala through a Proxy for High Availability
The Impala Statestore
The Impala component known as the statestore checks on the health of Impala daemons on all the nodes in a cluster, and continuously relays its findings to each of those daemons. It is physically represented by a daemon process named statestored; you only need such a process on one node in the cluster. If an Impala node goes offline due to hardware failure, network error, software issue, or other reason, the statestore informs all the other nodes so that future queries can avoid making requests to the unreachable node.
Because the statestore's purpose is to help when things go wrong, it is not critical to the normal operation of an Impala cluster. If the statestore is not running or becomes unreachable, the other nodes continue running and distributing work among themselves as usual; the cluster just becomes less robust if other nodes fail while the statestore is offline. When the statestore comes back online, it re-establishes communication with the other nodes and resumes its monitoring function.
Related information:
Scalability Considerations for the Impala Statestore, Modifying Impala Startup Options, Starting Impala, Increasing the Statestore Timeout, Ports Used by Impala
The Impala Catalog Service
The Impala component known as the catalog service relays the metadata changes from Impala SQL statements to all the nodes in a cluster. It is physically represented by a daemon process named catalogd; you only need such a process on one node in the cluster. Because the requests are passed through the statestore daemon, it makes sense to run the statestored and catalogd services on the same node.
This new component in Impala 1.2 reduces the need for the REFRESH and INVALIDATE METADATA statements. Formerly, if you issued CREATE DATABASE, DROP DATABASE,CREATE TABLE, ALTER TABLE, or DROP TABLE statements on one Impala node, you needed to issue INVALIDATE METADATA on any other node before running a query there, so that it would pick up the changes to schema objects. Likewise, if you issued INSERT statements on one node, you needed to issue REFRESH table_name on any other node before running a query there, so that it would recognize the newly added data files. The catalog service removes the need to issue REFRESH and INVALIDATE METADATAstatements when the metadata changes are performed by statement issued through Impala; when you create a table, load data, and so on through Hive, you still need to issue REFRESH or INVALIDATE METADATA on an Impala node before executing a query there.
This feature, new in Impala 1.2, touches a number of aspects of Impala:
See Impala Installation, Upgrading Impala and Starting Impala, for usage information for the catalogd daemon.
The REFRESH and INVALIDATE METADATA statements are no longer needed when the CREATE TABLE, INSERT, or other table-changing or data-changing operation is performed through Impala. These statements are still needed if such operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the statements only need to be issued on one Impala node rather than on all nodes. See REFRESH Statement and INVALIDATE METADATA Statement for the latest usage information for those statements.
By default, the metadata loading and caching on startup happens asynchronously, so Impala can begin accepting requests promptly. To enable the original behavior, where Impala waited until all metadata was loaded before accepting any requests, set the catalogd configuration option --load_catalog_in_background=false.
Components of the Impala Server的更多相关文章
- Cloudera Impala Guide
Impala Concepts and Architecture The following sections provide background information to help you b ...
- 【原创】大数据基础之Impala(1)简介、安装、使用
impala2.12 官方:http://impala.apache.org/ 一 简介 Apache Impala is the open source, native analytic datab ...
- 什么是staging server
原文链接:http://blog.csdn.net/blade2001/article/details/7194895 软件应用开发的经典模型有这样几个环境:开发环境(development).集成环 ...
- Impala 源码分析-FE
By yhluo 2015年7月29日 Impala 3 Comments Impala 源代码目录结构 SQL 解析 Impala 的 SQL 解析与执行计划生成部分是由 impala-fronte ...
- 【原创】大数据基础之Ambari(4)通过Ambari部署Impala
ambari2.7.3(hdp3.1) 安装 impala2.12(自动安装最新) ambari的hdp中原生不支持impala安装,下面介绍如何通过mpack方式使ambari支持impala安装: ...
- How-to: Do Statistical Analysis with Impala and R
sklearn实战-乳腺癌细胞数据挖掘(博客主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&a ...
- WebSphere Application Server V8.5.5.0
Downloadable files Abstract IBM WebSphere Application Server Version 8.5.5 Refresh Pack for all plat ...
- Impala配置HA-Nginx
Impala的高可用配置,官方的例子用的是Haproxy,考虑到nginx配置简单,使用人群广泛,再加上nginx1.9以后支持TCP的负载均衡,所以选用nginx. nginx安装:yum inst ...
- HUE配置文件hue.ini 的impala模块详解(图文详解)(分HA集群)
不多说,直接上干货! 我的集群机器情况是 bigdatamaster(192.168.80.10).bigdataslave1(192.168.80.11)和bigdataslave2(192.168 ...
随机推荐
- Qt 添加外部库文件(四种方法)
Qt添加外部库文件, 一种就是直接加库文件的绝对路劲,这种方法简单,但是遇到多个库文件的时候,会很麻烦,而且,如果工程移动位置以后还需要重新配置 另一种就是相对路径了,不过Qt 编译的文件会在一个单独 ...
- Eclipse项目和MyEclipse项目
因为Eclipse的项目结构和MyEclipse项目的结构不同,所以两者的项目之间不能直接运行的. 我们在创建Eclipse项目的时候可以进行一些设置,这样在Eclipse中创建的项目可以直接在MyE ...
- Android getActionBar()报空指针异常
1. 加载完视图后,再去获取: 写在setContentView()后面. 2.sdk版本: Actionbar的主题在3.0以后才有,使用的时候要确保,最低的版本不能小于3.0. <uses- ...
- hadoop-0.23.9安装以及第一个mapreduce测试程序
hadoop是一个能够对大量数据进行分布式处理的软件框架.它实现了一个分布式文件系统(Hadoop Distributed File System),简称HDFS.HDFS有着高容错性的特点,并且设计 ...
- CFF前端沙龙总结
一. -OOCSS + Sass ——大漠 1. OOCSS 结构<=>皮肤 分离 容器<=>内容 分离 2. Sass 工具.处理器 SCSS(CSS风格)<=> ...
- Oracle的rownum原理和使用(整理几个达人的帖子)
整理和学习了一下网上高手关于rownum的帖子: 参考资料: http://tech.ddvip.com/2008-10/122490439383296.html 和 http://tenn.jav ...
- UVa 10817 (状压DP + 记忆化搜索) Headmaster's Headache
题意: 一共有s(s ≤ 8)门课程,有m个在职教师,n个求职教师. 每个教师有各自的工资要求,还有他能教授的课程,可以是一门或者多门. 要求在职教师不能辞退,问如何录用应聘者,才能使得每门课只少有两 ...
- 面向函数范式编程(Functional programming)
函数编程(简称FP)不只代指Haskell Scala等之类的语言,还表示一种编程思维,软件思考方式,也称面向函数编程. 编程的本质是组合,组合的本质是范畴Category,而范畴是函数的组合. 首先 ...
- Android、iPhone和Java三个平台一致的加密工具
先前一直在做安卓,最近要开发iPhone客户端,这其中遇到的最让人纠结的要属Java.Android和iPhone三个平台加解密不一致的问题. 因为手机端后台通常是用JAVA开发的Web Servic ...
- 苹果官方 Crash文件分析方法 (iOS系统Crash文件分析方法)
对于提交的苹果官方的app,在审核的时候会给我们一些crash文件,对于这些有用的文件,里面是关于我们的bug的一些信息,那么该如何去调试呢 第一步:在任意目录创建一个目录,用来调试crash,我这里 ...