hive数据导入方法

可以通过多种方式将数据导入hive表

1.通过外部表导入

用户在hive上建external表，建表的同时指定hdfs路径，在数据拷贝到指定hdfs路径的同时，也同时完成数据插入external表。

例如：

编辑文件test.txt

$ cat test.txt

1 hello

2 world

3 test

4 case

字段之间以'\t'分割

启动hive：

$ hive

建external表：

hive> CREATE EXTERNAL TABLE MYTEST(num INT, name STRING)

> COMMENT 'this is a test'

> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

> STORED AS TEXTFILE

> LOCATION '/data/test';

OK

Time taken: 0.714 seconds

hive> show tables;

OK

mytest

partition_test

partition_test_input

test

Time taken: 0.07 seconds

hive> desc mytest ;

OK

num int

name string

Time taken: 0.121 seconds|

数据拷贝到hdfs：

$ hadoop fs -put test.txt /data/test

查看hive表数据：

hive> select * from mytest;

OK

1 hello

2 world

3 test

4 case

Time taken: 0.375 seconds

hive> select num from mytest;

Total MapReduce jobs = 1

Launching Job 1 out of 1

......

Total MapReduce CPU Time Spent: 510 msec

OK

Time taken: 27.157 seconds

这种方式常常用于当hdfs上有一些历史数据，而我们需要在这些数据上做一些hive的操作时使用。这种方式避免了数据拷贝开销

2.从本地导入

数据不在hdfs上，直接从本地导入hive表

文件/home/work/test.txt内容同上

建表：

hive> CREATE TABLE MYTEST2(num INT, name STRING)

> COMMENT 'this is a test2'

> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

> STORED AS TEXTFILE;

OK

Time taken: 0.077 seconds

导数据入表：

hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE MYTEST2;

Copying data from file:/home/work/test.txt

Copying file: file:/home/work/test.txt

Loading data to table default.mytest2

OK

Time taken: 0.24 seconds

查看数据：

hive> select * from MYTEST2;

OK

1 hello

2 world

3 test

4 case

Time taken: 0.11 seconds

这种方式导入的本地数据可以是一个文件，一个文件夹或者通配符，需要注意的是，如果是文件夹，文件夹内不能包含子目录，同样，通配符只能通配文件。

3.从hdfs导入

上述test.txt文件已经导入/data/test

则可以使用下述命令直接将数据导入hive表：

hive> CREATE TABLE MYTEST3(num INT, name STRING)

> COMMENT "this is a test3"

> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

> STORED AS TEXTFILE;

OK

Time taken: 4.735 seconds

hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE MYTEST3;

Loading data to table default.mytest3

OK

Time taken: 0.337 seconds

hive> select * from MYTEST3 ;

OK

1 hello

2 world

3 test

4 case

Time taken: 0.227 seconds

4. 从其它表导入数据：

hive> CREATE EXTERNAL TABLE MYTEST4(num INT) ;

OK

Time taken: 0.091 seconds

hive> FROM MYTEST3 test3

> INSERT OVERWRITE TABLE MYTEST4

> select test3.num where name='world';

Total MapReduce jobs = 2

Launching Job 1 out of 2

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002

Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2012-07-23 18:59:02,365 Stage-1 map = 0%, reduce = 0%

2012-07-23 18:59:08,417 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

2012-07-23 18:59:09,435 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

2012-07-23 18:59:10,445 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

2012-07-23 18:59:11,455 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

2012-07-23 18:59:12,470 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

2012-07-23 18:59:13,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec

2012-07-23 18:59:14,508 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.62 sec

MapReduce Total cumulative CPU time: 620 msec

Ended Job = job_201207230024_0002

Ended Job = -174856900, job is filtered out (removed at runtime).

Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000

Loading data to table default.mytest4

Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4

Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]

1 Rows loaded to mytest4

MapReduce Jobs Launched:

Job 0: Map: 1 Accumulative CPU: 0.62 sec HDFS Read: 242 HDFS Write: 2 SUCESS

Total MapReduce CPU Time Spent: 620 msec

OK

Time taken: 30.663 seconds

hive> select * from mytest4;

OK

Time taken: 0.103 seconds