What Is Hive

Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale out and fault tolerance capabilities for data storage and processing (using the map-reduce programming paradigm) on commodity hardware.

Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data. It provides a simple query language called Hive QL, which is based on SQL and which enables users familiar with SQL to do ad-hoc querying, summarization and data analysis easily. At the same time, Hive QL also allows traditional map/reduce programmers to be able to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.

What Hive Is NOT

Hadoop is a batch processing system and Hadoop jobs tend to have high latency and incur substantial overheads in job submission and scheduling. As a result - latency for Hive queries is generally very high (minutes) even when data sets involved are very small (say a few hundred megabytes). As a result it cannot be compared with systems such as Oracle where analyses are conducted on a significantly smaller amount of data but the analyses proceed much more iteratively with the response times between iterations being less than a few minutes. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries.

Hive is not designed for online transaction processing and does not offer real-time queries and row level updates. It is best used for batch jobs over large sets of immutable data (like web logs). 不支持在线事务处理,不支持实时查询,不支持行数据更新。最好的用途是:批量处理大数据集的不可改变的数据,比如网络日志。

In the following sections we provide a tutorial on the capabilities of the system. We start by describing the concepts of data types, tables and partitions (which are very similar to what you would find in a traditional relational DBMS) and then illustrate the capabilities of the QL language with the help of some examples.

[Hive-Tutorial] What Is Hive的更多相关文章

  1. Hive Tutorial(上)(Hive 入门指导)

    用户指导 Hive 指导 Hive指导 概念 Hive是什么 Hive不是什么 获得和开始 数据单元 类型系统 内置操作符和方法 语言性能 用法和例子(在<下>里面) 概念 Hive是什么 ...

  2. [Hive - Tutorial] Type System 数据类型

    数据类型Type System Hive supports primitive and complex data types, as described below. See Hive Data Ty ...

  3. Hive Tutorial 阅读记录

    Hive Tutorial 目录 Hive Tutorial 1.Concepts 1.1.What Is Hive 1.2.What Hive Is NOT 1.3.Getting Started ...

  4. Spark入门实战系列--5.Hive(上)--Hive介绍及部署

    [注]该系列文章以及使用到安装包/测试数据 可以在<倾情大奉送--Spark入门实战系列>获取 .Hive介绍 1.1 Hive介绍 月开源的一个数据仓库框架,提供了类似于SQL语法的HQ ...

  5. hive学习3(hive基本操作)

    hive基本操作 hive的数据类型 1)基本数据类型 TINYINT,SMALLINT,INT,BIGINT FLOAT/DOUBLE BOOLEAN STRING 2)复合类型 ARRAY:一组有 ...

  6. [Hive - LanguageManual] Statistics in Hive

    Statistics in Hive Statistics in Hive Motivation Scope Table and Partition Statistics Column Statist ...

  7. Hive over HBase和Hive over HDFS性能比较分析

    http://superlxw1234.iteye.com/blog/2008274 环境配置: hadoop-2.0.0-cdh4.3.0 (4 nodes, 24G mem/node) hbase ...

  8. Hive学习之一 《Hive的介绍和安装》

    一.什么是Hive Hive是建立在 Hadoop 上的数据仓库基础构架.它提供了一系列的工具,可以用来进行数据提取转化加载(ETL),这是一种可以存储.查询和分析存储在 Hadoop 中的大规模数据 ...

  9. Hive综合HBase——经Hive阅读/书写 HBase桌子

    社论: 本文将Hive与HBase整合在一起,使Hive能够读取HBase中的数据,让Hadoop生态系统中最为经常使用的两大框架互相结合.相得益彰. watermark/2/text/aHR0cDo ...

  10. hive优化之——控制hive任务中的map数和reduce数

    一.    控制hive任务中的map数: 1.    通常情况下,作业会通过input的目录产生一个或者多个map任务.主要的决定因素有: input的文件总个数,input的文件大小,集群设置的文 ...

随机推荐

  1. 使用List,Dictionary加载数据库中的数据

    情景描述:数据库中有一张设备表,字段DWDM存放的是各个厂编号,字段ZNBH存放的是设备编号.其中DWDM跟ZNBH是一对多的关系.需要将数据库中的值加载到List<Dictionary< ...

  2. Linux命令学习笔记(1)

    groupadd 1.作用 groupadd命令用于将新组加入系统.2.格式groupadd [-g gid] [-o]] [-r] [-f] groupname3.主要参数-g gid:指定组ID号 ...

  3. player/stage 学习---安装

    环境 ubuntu 14.04 一,工具安装 sudo apt-get install git cmake g++ fltk1.1-dev libjpeg8-dev libpng12-dev libg ...

  4. 车牌识别LPR(四)-- 车牌定位

    第四篇:车牌定位 车牌定位就是采用一系列图像处理或者数学的方法从一幅图像中将车牌准确地定位出来.车牌定位提取出的车牌是整个车牌识别系统的数据来源,它的效果的好坏直接影响到整个系统的表现,只有准确地定位 ...

  5. Codeforces 672

    题目链接:http://codeforces.com/contest/672/problem A. Summer Camp(打表) 题意:123456789...一串字符串,问第n个是什么数字. 塞一 ...

  6. tuxedo入门

    为文件增加用户执行权限: 官网下载tuxedo111120_64_Linux_01_x86.bin su //进入root操作,防止权限不够 创建文件夹,用来做tuxedo文件的安装路径 cd /op ...

  7. servlet应用具体实例

    web,xml应用文件 1.<filter>参数 <filter> <filter-name>encodingFilter</filter-name> ...

  8. fil_space_create

    /*******************************************************************//** Creates a space memory obje ...

  9. org.tigris.subversion.javahl.ClientException: Attempted to lock an already-locked dir异常解决方法

    myeclipse用svn提交的时候报错: Attempted to lock an already-locked dir svn: Working copy 'D:/Program Files/My ...

  10. UVa 10837 (欧拉函数 搜索) A Research Problem

    发现自己搜索真的很弱,也许做题太少了吧.代码大部分是参考别人的,=_=|| 题意: 给出一个phi(n),求最小的n 分析: 回顾一下欧拉函数的公式:,注意这里的Pi是互不相同的素数,所以后面搜索的时 ...