HIVE Quick Start  Category: B4_HIVE  2015-06-06 11:27

I. Getting Started

1. Create a table

create table if not exists ljh_emp(
name string,
salary float,
gender string)
comment 'basic information of an employee'
row format delimited fields terminated by ',';

2. Prepare the data file

Create a directory named test that contains a single file with the following content:

ljh,25000,male

jediael,25000,male

llq,15000,female

3. Load the data into the table

load data local inpath '/home/ljhn1829/test' into table ljh_emp;

4. Query the table

select * from ljh_emp;

OK

ljh   25000.0   male

jediael   25000.0   male

llq   15000.0   female

Time taken: 0.159 seconds, Fetched: 3 row(s)

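II. About Delimiters

1. Default delimiters

In Hive the default row delimiter is \n and the default field delimiter is Ctrl+A (\001). Ctrl+B (\002) and Ctrl+C (\003) are also used, to separate the parts of array, struct, and map values; see "Programming Hive", p. 44.

Therefore, if row format delimited fields terminated by ',' is not specified when the table is created, Hive assumes the field delimiter is Ctrl+A.

There are two ways to make the data file match the table:

The first is to specify the delimiter when creating the table, as in the example above.

The second is to use Ctrl+A in the data file, as in the next example.

2. Using Ctrl+A as the delimiter in the data file

(1) Create the table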

create table ljh_test_emp(name string, salary float, gender string);

(2) Prepare the data file

Create a directory named test2 that contains a single file with the following content:

ljh^A25000^Amale

jediael^A25000^Amale

llq^A15000^Afemale

The ^A character is visible only in vi; cat does not display it.

To type ^A, enter insert mode in vi, press Ctrl+V, then press Ctrl+A.
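If typing the control character in vi is inconvenient, the file can also be generated by a script. The following is a minimal sketch, not part of the original walkthrough: the file name emp.txt and the temporary output directory are hypothetical stand-ins for a file inside test2.

```python
import os
import tempfile

CTRL_A = "\x01"  # Hive's default field delimiter (displayed as ^A in vi)

rows = [
    ("ljh", 25000, "male"),
    ("jediael", 25000, "male"),
    ("llq", 15000, "female"),
]


def write_ctrl_a_file(path, rows):
    """Write each row as one line, joining its fields with Ctrl+A."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            f.write(CTRL_A.join(str(field) for field in row) + "\n")


# Hypothetical location: a temporary directory standing in for test2.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "emp.txt")
write_ctrl_a_file(out_path, rows)

with open(out_path, encoding="utf-8") as f:
    first_line = f.readline()

# repr() makes the otherwise invisible delimiter visible as \x01.
print(repr(first_line))
```

Printing with repr() plays the same role as opening the file in vi: it shows the delimiter that cat would hide.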

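(3) Load the data into the table

load data local inpath '/home/ljhn1829/test2' into table ljh_test_emp;

(The path assumes that test2 was created next to test under /home/ljhn1829.)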

(4) Query the data

hive> select * from ljh_test_emp;

OK

ljh   25000.0   male

jediael   25000.0   male

llq   15000.0   female

Time taken: 0.2 seconds, Fetched: 3 row(s)

3. No delimiter specified and no Ctrl+A in the file: the result is wrong

(1) Create the table

create table if not exists ljh_emp_test(
name string,
salary float,
gender string)
comment 'basic information of an employee';

(2) Prepare the data

ljh,25000,male

jediael,25000,male

llq,15000,female

(3) Load the data into the table

load data local inpath '/home/ljhn1829/test' into table ljh_emp_test;

(4) View the data in the table

select * from ljh_emp_test;

OK

ljh,25000,male   NULL   NULL

jediael,25000,male   NULL   NULL

llq,15000,female   NULL   NULL

Time taken: 0.185 seconds, Fetched: 3 row(s)

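As the output shows, because the table's field delimiter is Ctrl+A, each line of the file is loaded whole as the first field, which leaves the other two fields NULL.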
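The effect can be reproduced outside Hive with a simplified model of delimited-text parsing. This is only an illustrative sketch, not Hive's actual SerDe code: the helper split_row is hypothetical, and it skips the type conversion Hive applies to each column.

```python
CTRL_A = "\x01"  # Hive's default field delimiter


def split_row(line, num_columns, delimiter=CTRL_A):
    """Split a line into fields, padding missing columns with None (NULL)."""
    fields = line.rstrip("\n").split(delimiter)
    fields = fields[:num_columns]  # ignore any extra trailing fields
    return fields + [None] * (num_columns - len(fields))


line = "ljh,25000,male"

# The table expects Ctrl+A, but the file uses commas: one field, two NULLs.
default_result = split_row(line, 3)

# Declaring ',' as the delimiter recovers all three fields.
comma_result = split_row(line, 3, delimiter=",")

print(default_result)
print(comma_result)
```

With the default delimiter the whole line ends up in the first column, matching the query result above. With ',' the three fields are separated as intended.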

III. A More Complex Table

1. Create the table

create table employees (

    name string,

    slalary float,

    suboddinates array<string>,

    deductions map<string,float>,

    address struct<stree:string, city:string, state:string, zip:int>

)

partitioned by(country string, state string);

2. Prepare the data

John Doe^A100001.1^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BStateTaxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600

Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601

Todd Jones^A70000.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700

Bill King^A60001.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100

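Note: ^A separates fields; ^B separates the elements of an array, struct, or map; ^C separates the key from the value within a map entry.

See "Programming Hive", p. 44.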
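To make the three delimiter levels concrete, the sketch below parses one of the sample rows in plain Python. It only illustrates the nesting and is not how Hive itself is implemented. The function parse_employee and the dictionary keys are hypothetical names, and the caret notation from the sample data is converted to real control characters first.

```python
CTRL_A, CTRL_B, CTRL_C = "\x01", "\x02", "\x03"


def parse_employee(line):
    """Parse one row: ^A splits fields, ^B collection items, ^C map key/value."""
    name, salary, subordinates, deductions, address = line.split(CTRL_A)
    street, city, state, zip_code = address.split(CTRL_B)
    return {
        "name": name,
        "salary": float(salary),
        # An empty field means an empty array, not an array holding "".
        "subordinates": subordinates.split(CTRL_B) if subordinates else [],
        "deductions": {
            key: float(value)
            for key, value in (item.split(CTRL_C) for item in deductions.split(CTRL_B))
        },
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


# The Todd Jones row from the sample data, written in caret notation.
raw = ("Todd Jones^A70000.0^A^AFederal Taxes^C.15^BState Taxes^C.03"
       "^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700")
line = raw.replace("^A", CTRL_A).replace("^B", CTRL_B).replace("^C", CTRL_C)

employee = parse_employee(line)
print(employee)
```

The empty third field shows why that row has an empty array in the query output: there is nothing between the two ^A characters.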

3. Load the data into the table

load data local inpath '/home/ljhn1829/phd' into table employees partition(country='us',state='ca');

4. View the table data

hive> select * from employees;

OK

John Doe   100001.1   ["Mary Smith","Todd Jones"]   {"Federal Taxes":0.2,"StateTaxes":0.05,"Insurance":0.1}   {"stree":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}   us   ca

Mary Smith   80000.0   ["Bill King"]   {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}   {"stree":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}   us   ca

Todd Jones   70000.0   []   {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}   {"stree":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}   us   ca

Bill King   60001.0   []   {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}   {"stree":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}   us   ca

Time taken: 0.312 seconds, Fetched: 4 row(s)

5. View the file in HDFS

hadoop fs -ls /data/gamein/g4_us/meta/employees/country=us/state=ca

Found 1 items

-rwxr-x---   3 ljhn1829 g4_us        428 2015-05-12 12:49 /data/gamein/g4_us/meta/employees/country=us/state=ca/progamming_hive_data.txt

The content of this file is identical to the original file.

IV. Inserting Data with a select Clause

1. Create the table

create table employees2 (

    name string,

    slalary float,

    suboddinates array<string>,

    deductions map<string,float>,

    address struct<stree:string, city:string, state:string, zip:int>

)

partitioned by(country string, state string);

2. Insert the data

hive> set hive.exec.dynamic.partition.mode=nonstrict;

Without this setting, the insert fails with the following exception:

FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

insert into table employees2

partition (country,state)

select name,slalary,suboddinates,deductions,address, e.country, e.state

from employees e;
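In this insert the last two columns of the select list (e.country, e.state) supply the partition values, in the same order as the partition clause. The following sketch imitates that routing in plain Python. It is a conceptual model only: route_to_partitions is a hypothetical helper, and the third row is made up to show a second partition.

```python
from collections import defaultdict


def route_to_partitions(rows, partition_columns):
    """Group rows by their trailing values, one bucket per partition directory."""
    n = len(partition_columns)
    partitions = defaultdict(list)
    for row in rows:
        data, values = row[:-n], row[-n:]
        # Partition directories are named column=value, nested in clause order.
        path = "/".join(f"{col}={val}" for col, val in zip(partition_columns, values))
        partitions[path].append(data)
    return dict(partitions)


rows = [
    ("John Doe", 100001.1, "us", "ca"),
    ("Mary Smith", 80000.0, "us", "ca"),
    ("Hypothetical Employee", 50000.0, "us", "ny"),  # made-up row, not in the source data
]

result = route_to_partitions(rows, ["country", "state"])
for path in sorted(result):
    print(path, len(result[path]))
```

Each distinct (country, state) pair gets its own directory, which is the same country=us/state=ca layout seen earlier in the HDFS listing.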
