Using the hive shell (a lapse on my part: the script's first line was #!/bin/sh, yet I kept running it with hive -f and got errors)

1. How hive scripts are executed

There are roughly three ways to run a hive script:

from the hive console;
with hive -e "SQL";
with hive -f and a SQL file.

For reference, the hive usage:

usage: hive

 -d,--define <key=value>          Variable subsitution to apply to hive

                                  commands. e.g. -d A=B or --define A=B

    --database <databasename>     Specify the database to use

 -e <quoted-query-string>         SQL from command line

 -f <filename>                    SQL from files

 -H,--help                        Print help information

 -h <hostname>                    connecting to Hive Server on remote host

    --hiveconf <property=value>   Use value for given property

    --hivevar <key=value>         Variable subsitution to apply to hive

                                  commands. e.g. --hivevar A=B

 -i <filename>                    Initialization SQL file

 -p <port>                        connecting to Hive Server on port number

 -S,--silent                      Silent mode in interactive shell

 -v,--verbose                     Verbose mode (echo executed SQL to the

                                  console)

1.1. Running from the hive console

As the name suggests, you enter the hive console and run the SQL statements there, for example:

hive> set mapred.job.queue.name=pms;

hive> select page_name, tpa_name from pms.pms_exps_prepro limit 2;

Total MapReduce jobs = 1

Launching Job 1 out of 1

...

Job running in-process (local Hadoop)

2015-10-23 10:06:47,756 null map = 100%,  reduce = 0%

2015-10-23 10:06:48,863 null map = 23%,  reduce = 0%

2015-10-23 10:06:49,946 null map = 38%,  reduce = 0%

2015-10-23 10:06:51,051 null map = 72%,  reduce = 0%

2015-10-23 10:06:52,129 null map = 100%,  reduce = 0%

Ended Job = job_local1109193547_0001

Execution completed successfully

Mapred Local Task Succeeded . Convert the Join into MapJoin

OK

APP首页   APP首页_价格比京东低

APP首页   APP首页_价格比京东低

Time taken: 14.279 seconds

hive>

1.2. Running with hive -e "SQL"

hive -e "SQL" runs the quoted SQL straight from the command line, without opening an interactive console, for example:

hive -e "

set mapred.job.queue.name=pms;

set mapred.job.name=[HQL]exps_prepro_query;

select page_name, tpa_name

from pms.pms_exps_prepro

limit 2;"
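A minimal sketch of calling hive -e from a shell script and capturing the result, assuming hive is on the PATH. The -S flag comes from the usage listing above; the HIVE_BIN variable and the hive_query function are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# HIVE_BIN is a hypothetical override so the wrapper can be tried without Hive.
HIVE_BIN=${HIVE_BIN:-hive}

# Run one SQL string non-interactively; -S requests silent mode (see the usage listing).
hive_query() {
    "$HIVE_BIN" -S -e "$1"
}

# Only query when the hive binary is actually installed.
if command -v "$HIVE_BIN" >/dev/null 2>&1; then
    rows=$(hive_query "select page_name, tpa_name from pms.pms_exps_prepro limit 2;")
    printf '%s\n' "$rows"
fi
```

Keeping the SQL in a single argument means the quoting rules of the shell apply to the whole statement, which matters for the escaping discussion in section 2.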

1.3. Running with hive -f and a SQL file

This runs the SQL statements stored in a file, for example:

The contents of pms_exps_prepro.sql are:

set mapred.job.queue.name=pms;

set hive.exec.reducers.max=48;

set mapred.reduce.tasks=48;

set mapred.job.name=[HQL]pms_exps_prepro;

drop table if exists pms.pms_exps_prepro;

create table pms.pms_exps_prepro as

select

  a.provinceid,

  a.cityid,

  a.ieversion,

  a.platform,

  '${date}' as ds

from track_exps a;

The SQL in this file takes a date parameter, referenced as ${date}. Pass the value with --hivevar when running it:

date=2015-10-22

hive -f pms_exps_prepro.sql --hivevar date=$date
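The same call can be wrapped so that a malformed date never reaches hive. This is only a sketch under assumptions: HIVE_BIN, is_day and run_for_day are hypothetical names, and the check covers the shape of the string, not whether the day exists in the calendar.

```shell
#!/bin/sh
# HIVE_BIN and run_for_day are hypothetical names used only for this sketch.
HIVE_BIN=${HIVE_BIN:-hive}

# Accept only strings shaped like YYYY-MM-DD (a shape check, not a calendar check).
is_day() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the SQL file with ${date} bound to the given day.
run_for_day() {
    if ! is_day "$1"; then
        echo "usage: run_for_day YYYY-MM-DD" >&2
        return 2
    fi
    "$HIVE_BIN" -f pms_exps_prepro.sql --hivevar "date=$1"
}
```

After sourcing the file, run_for_day 2015-10-22 issues the command shown above, while run_for_day 20151022 is rejected before hive starts.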

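2. Escape characters in hive

The following business scenario illustrates the escape-character problem in hive.

track_exps records exposure data, and A wants the valid exposure records for 2015-10-20.
A valid exposure record is one where:

the relatedinfo field has the format number.number.number.number.number,
for example 4.4.5.1080100.1

the extfield1 field has the format request-string,section-number,
for example request-b470805b620900ac492bb892ad7e955e,section-4

For this, A wrote the following SQL: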

select

    *

from track_exps

where ds = '2015-10-20'

  and relatedinfo rlike '^4.\d+.\d+.\d+.\d+$'

  and extfield1 rlike '^request.+section-\d+$';

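However, because the regular expression is embedded in the SQL, the special characters inside it need to be escaped.

2.1. Running with hive -e "SQL"

The changes are as follows: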

 hive -e "

 set mapred.job.queue.name=pms;

 explain select

     cityid

 from track_exps

 where ds = '2015-10-20'

   and relatedinfo rlike '\\^4\\.\\\d\\+\\.\\\d\\+\\.\\\d\\+\\.\\\d\\+\\$'

   and extfield1 rlike '\\^request\\.\\+section\\-\\\d\\+\\$';"

The execution plan confirms that the regular expression was parsed correctly:

...

predicate:

  expr: ((relatedinfo rlike '^4.\d+.\d+.\d+.\d+$') and (extfield1 rlike '^request.+section-\d+$'))

  type: boolean

...

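Analysis:

With hive -e "SQL", the regular expression is wrapped in single quotes and the whole statement is wrapped in double quotes ("'regex'"). Inside the double quotes, the first \ of \\^ only escapes the second \, and it is the second \ that does the actual escaping in the SQL.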
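The shell half of this can be checked without hive. A minimal sketch, assuming a POSIX shell: printf '%s\n' prints exactly what the shell hands over after double-quote processing, which is the text hive -e would receive. The shortened pattern is only an illustration.

```shell
#!/bin/sh
# Inside double quotes the shell turns \\ into \ and leaves \d untouched,
# so \\^ arrives as \^ and \\\d arrives as \\d.
received=$(printf '%s\n' "rlike '\\^4\\.\\\d\\+\\$'")

# printf with %s does not reinterpret backslashes, unlike some echo builtins.
printf '%s\n' "$received"
```

This prints rlike '\^4\.\\d\+\$', that is, one backslash fewer per escape than was typed inside the double quotes.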

2.2. Running with hive -f and a SQL file

The changes are as follows:

The contents of pms_exps_prepro.sql are:

select

    *

from track_exps

where ds = '2015-10-20'

  and relatedinfo rlike '\^4\.\\d\+\.\\d\+\.\\d\+\.\\d\+\$'

  and extfield1 rlike '\^request\.\+section\-\\d\+\$';

Analysis:

Unlike hive -e "SQL", the regular expression in a SQL file is wrapped only in single quotes, so a single \ is enough to do the escaping.
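One way to keep this single-backslash form even when the SQL is generated from a shell script is a here-document with a quoted delimiter, which the shell copies verbatim into a file. A minimal sketch, assuming a shell with mktemp available; the temporary file and the shortened query are hypothetical.

```shell
#!/bin/sh
# Quoting the delimiter ('EOF') disables shell expansion and backslash
# processing, so the file receives exactly the text typed here.
sql_file=$(mktemp)
trap 'rm -f "$sql_file"' EXIT

cat > "$sql_file" <<'EOF'
select cityid
from track_exps
where ds = '2015-10-20'
  and extfield1 rlike '\^request\.\+section\-\\d\+\$';
EOF

# Run it only where hive is installed; otherwise just show the file.
if command -v hive >/dev/null 2>&1; then
    hive -f "$sql_file"
else
    cat "$sql_file"
fi
```

Because the shell never touches the backslashes, the pattern can be written exactly as it would appear in a standalone SQL file.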

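Note: I had a lapse today. The script's first line was #!/bin/sh, yet I kept running it with hive -f and got errors. The silly mistake is that such a file is a shell script, so it has to be run the Linux way, with sh filename, not with hive -f sqlfile.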
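A small guard against this mix-up, offered only as a sketch: the hypothetical helper runner_for reads the first line of a file and suggests sh when it starts with #!, and hive -f otherwise. It assumes that any file without a shebang is plain HiveQL.

```shell
#!/bin/sh
# runner_for is a hypothetical helper: it inspects the first line of a file
# and prints the command that should be used to run it.
runner_for() {
    first_line=
    # An empty file makes read fail; treat that as "no shebang".
    IFS= read -r first_line < "$1" || true
    case "$first_line" in
        '#!'*) echo "sh $1" ;;        # shell script, possibly wrapping hive calls
        *)     echo "hive -f $1" ;;   # otherwise assume plain HiveQL
    esac
}
```

For a file beginning with #!/bin/sh, runner_for prints the sh form, which is the invocation the note above calls for.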