What Is Hive

Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale out and fault tolerance capabilities for data storage and processing (using the map-reduce programming paradigm) on commodity hardware.

Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data. It provides a simple query language called Hive QL, which is based on SQL and which enables users familiar with SQL to do ad-hoc querying, summarization and data analysis easily. At the same time, Hive QL also allows traditional map/reduce programmers to be able to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.

What Hive Is NOT

Hadoop is a batch processing system and Hadoop jobs tend to have high latency and incur substantial overheads in job submission and scheduling. As a result - latency for Hive queries is generally very high (minutes) even when data sets involved are very small (say a few hundred megabytes). As a result it cannot be compared with systems such as Oracle where analyses are conducted on a significantly smaller amount of data but the analyses proceed much more iteratively with the response times between iterations being less than a few minutes. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries.

Hive is not designed for online transaction processing and does not offer real-time queries and row level updates. It is best used for batch jobs over large sets of immutable data (like web logs). 不支持在线事务处理,不支持实时查询,不支持行数据更新。最好的用途是:批量处理大数据集的不可改变的数据,比如网络日志。

In the following sections we provide a tutorial on the capabilities of the system. We start by describing the concepts of data types, tables and partitions (which are very similar to what you would find in a traditional relational DBMS) and then illustrate the capabilities of the QL language with the help of some examples.

[Hive-Tutorial] What Is Hive的更多相关文章

  1. Hive Tutorial(上)(Hive 入门指导)

    用户指导 Hive 指导 Hive指导 概念 Hive是什么 Hive不是什么 获得和开始 数据单元 类型系统 内置操作符和方法 语言性能 用法和例子(在<下>里面) 概念 Hive是什么 ...

  2. [Hive - Tutorial] Type System 数据类型

    数据类型Type System Hive supports primitive and complex data types, as described below. See Hive Data Ty ...

  3. Hive Tutorial 阅读记录

    Hive Tutorial 目录 Hive Tutorial 1.Concepts 1.1.What Is Hive 1.2.What Hive Is NOT 1.3.Getting Started ...

  4. Spark入门实战系列--5.Hive(上)--Hive介绍及部署

    [注]该系列文章以及使用到安装包/测试数据 可以在<倾情大奉送--Spark入门实战系列>获取 .Hive介绍 1.1 Hive介绍 月开源的一个数据仓库框架,提供了类似于SQL语法的HQ ...

  5. hive学习3(hive基本操作)

    hive基本操作 hive的数据类型 1)基本数据类型 TINYINT,SMALLINT,INT,BIGINT FLOAT/DOUBLE BOOLEAN STRING 2)复合类型 ARRAY:一组有 ...

  6. [Hive - LanguageManual] Statistics in Hive

    Statistics in Hive Statistics in Hive Motivation Scope Table and Partition Statistics Column Statist ...

  7. Hive over HBase和Hive over HDFS性能比较分析

    http://superlxw1234.iteye.com/blog/2008274 环境配置: hadoop-2.0.0-cdh4.3.0 (4 nodes, 24G mem/node) hbase ...

  8. Hive学习之一 《Hive的介绍和安装》

    一.什么是Hive Hive是建立在 Hadoop 上的数据仓库基础构架.它提供了一系列的工具,可以用来进行数据提取转化加载(ETL),这是一种可以存储.查询和分析存储在 Hadoop 中的大规模数据 ...

  9. Hive综合HBase——经Hive阅读/书写 HBase桌子

    社论: 本文将Hive与HBase整合在一起,使Hive能够读取HBase中的数据,让Hadoop生态系统中最为经常使用的两大框架互相结合.相得益彰. watermark/2/text/aHR0cDo ...

  10. hive优化之——控制hive任务中的map数和reduce数

    一.    控制hive任务中的map数: 1.    通常情况下,作业会通过input的目录产生一个或者多个map任务.主要的决定因素有: input的文件总个数,input的文件大小,集群设置的文 ...

随机推荐

  1. 关于Linux的时间与时区

    转:http://linux.chinaunix.net/techdoc/beginner/2007/06/22/960790.shtml 首先要说明的是我的系统是fedora,其他系统可能不完全相同 ...

  2. Hibernate开发之二 映射主键-

    <class name="cn.itcast.e_hbm_id.User" table="user">            <!-- 映射主 ...

  3. 最受欢迎的5款PHP框架记录,我居然一个不知道。。。

    1. CodeIgniter Framework CodeIgniter 是目前使用最广泛的 PHP 框架.CodeIgniter 是一个简单快速的PHP MVC 框架.EllisLab 的工作人员发 ...

  4. 【设计模式】—— 单例模式Singleton

    前言:[模式总览]——————————by xingoo 模式意图 保证类仅有一个实例,并且可以供应用程序全局使用.为了保证这一点,就需要这个类自己创建自己的对象,并且对外有公开的调用方法. 模式结构 ...

  5. 用imagemagick和tesseract-ocr破解简单验证码

    用imagemagick和tesseract-ocr破解简单验证码 Tesseract-ocr据说辨识程度是世界排名第三,可谓神器啊. 准备工作: 1.安装tesseract-ocr sudo apt ...

  6. Python3 学习第四弹:编码问题(转载)

    关于python的编码问题一直以来不得解,终于在今天从这篇博文中明白了. 原文地址: http://nedbatchelder.com/text/unipain.html 译文地址:http://py ...

  7. 宏UT_LIST_ADD_FIRST

    /*******************************************************************//** Adds the node as the first ...

  8. Smack IQ包的扩展

    前几天一直很烦躁,怎么扩展smack的IQ包堵了我好久,今天静下心来看了下smack的源码,把这个问题解决了.下面给出步骤: 如果我们要扩展一个如下所示的IQ包: <iq id="00 ...

  9. UVa 1328 (KMP求字符串周期) Period

    当初学KMP的时候也做过这道题,现在看来还是刘汝佳的代码要精简一些,毕竟代码越短越好记,越不容易出错. 而且KMP的递推失配函数的代码风格和后面的Aho-Corasick自动机求失配函数的代码风格也是 ...

  10. Java [leetcode 2] Add Two Numbers

    问题描述: You are given two linked lists representing two non-negative numbers. The digits are stored in ...