Using SparkSQL: the Spark SQL CLI

About the Spark SQL CLI

The Spark SQL CLI makes it convenient to query Hive tables directly from Spark SQL through the Hive metastore. Note that in the current version, the Spark SQL CLI cannot interact with the ThriftServer.

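Before using the Spark SQL CLI, take care of the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory;

2. Add the JDBC driver jar for the metastore database (here the MySQL connector) to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh: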

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.-bin.jar
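The two preparation steps can be scripted so they are easy to repeat on another machine. This is a minimal sketch, assuming bash or a POSIX shell with `local`; the function name setup_spark_sql_cli is hypothetical, and you supply the actual locations of your hive-site.xml and JDBC jar. It appends the SPARK_CLASSPATH line only if that exact line is not already present, so re-running it does not duplicate the entry.

```shell
# Hypothetical helper: prepare $SPARK_HOME/conf for the Spark SQL CLI.
#   $1 = Spark home directory
#   $2 = path to hive-site.xml (e.g. from your Hive conf directory; location is an assumption)
#   $3 = path to the JDBC driver jar for the metastore database
setup_spark_sql_cli() {
  local spark_home="$1" hive_site="$2" jdbc_jar="$3"
  local conf_dir="$spark_home/conf"
  local env_file="$conf_dir/spark-env.sh"

  # Step 1: copy hive-site.xml so Spark SQL can locate the Hive metastore.
  cp "$hive_site" "$conf_dir/" || return 1

  # Step 2: add the jar to SPARK_CLASSPATH; \$SPARK_CLASSPATH stays literal in the file.
  local line="export SPARK_CLASSPATH=\$SPARK_CLASSPATH:$jdbc_jar"
  touch "$env_file" || return 1
  # Append only if this exact line is not already there (idempotent).
  grep -qxF -- "$line" "$env_file" || printf '%s\n' "$line" >> "$env_file"
}
```

For example: `setup_spark_sql_cli "$SPARK_HOME" /path/to/hive-site.xml /path/to/mysql-connector.jar`, with the paths replaced by your own.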

Spark SQL CLI command-line options:

cd $SPARK_HOME/bin

./spark-sql --help

Usage: ./bin/spark-sql [options] [cli option]
Spark assembly has been built with Hive, including Datanucleus jars on classpath

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: ).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: ).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: ).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)

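If no master is specified when starting spark-sql, it runs in local mode. The master can be set either to a standalone cluster address or to yarn.

When the master is set to yarn (spark-sql --master yarn), you can monitor the execution of the whole job at http://hadoop000:8088.

Note: if spark.master spark://hadoop000:7077 is configured in $SPARK_HOME/conf/spark-defaults.conf, spark-sql also runs on the standalone cluster even when no master is given at startup.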
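The master-selection rules above amount to a simple precedence: an explicit --master value wins, otherwise spark.master from spark-defaults.conf, otherwise local mode. The sketch below models only that simplified precedence and is not Spark's own logic; the function name effective_master is hypothetical, it assumes whitespace-separated "key value" lines like the spark.master example above, and it reports local[*] (spark-submit's usual local default) when nothing is configured.

```shell
# Hypothetical helper (not part of Spark): predict which master spark-sql will use.
# Simplified precedence: explicit --master value > spark.master in spark-defaults.conf > local.
effective_master() {
  local explicit="$1"        # value you would pass to --master (may be empty)
  local defaults_file="$2"   # e.g. $SPARK_HOME/conf/spark-defaults.conf
  local from_conf=""

  if [ -n "$explicit" ]; then
    echo "$explicit"
    return 0
  fi

  if [ -f "$defaults_file" ]; then
    # Assumes "key value" lines separated by whitespace; the last matching line wins.
    from_conf=$(awk '$1 == "spark.master" { m = $2 } END { print m }' "$defaults_file")
  fi

  # Fall back to local mode when no master is configured anywhere.
  echo "${from_conf:-local[*]}"
}
```

With the configuration described in the note, `effective_master "" "$SPARK_HOME/conf/spark-defaults.conf"` prints spark://hadoop000:7077, while `effective_master yarn "$SPARK_HOME/conf/spark-defaults.conf"` prints yarn.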

Using spark-sql

Start spark-sql. Since I have already configured spark.master spark://hadoop000:7077 in spark-defaults.conf, I did not specify a master when launching spark-sql:

cd $SPARK_HOME/bin

./spark-sql

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = - limit ;

SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit ;
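Besides the interactive shell, the -S and -e options from the help output let you run a query non-interactively, which is handy in scripts. The function below is a hypothetical wrapper around the second query; it assumes spark-sql is on PATH, and it takes the row limit as an argument (validated as a non-negative integer before being spliced into the SQL) because the limit value in the query above is not legible.

```shell
# Hypothetical wrapper: run the "sessions by page-view count" query without
# entering the interactive shell. Assumes spark-sql is on PATH.
top_sessions() {
  local limit="$1"
  # Reject anything that is not a non-negative integer before building the SQL.
  case "$limit" in
    ''|*[!0-9]*) echo "usage: top_sessions <non-negative-integer>" >&2; return 2 ;;
  esac
  # -S: silent mode; -e: execute the quoted SQL from the command line and exit.
  spark-sql -S -e "SELECT session_id, count(*) c FROM page_views GROUP BY session_id ORDER BY c DESC LIMIT $limit"
}
```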

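The page_views table used by the two queries above already exists in my Hive setup. If it does not exist in yours, create it manually; the table-creation and data-load scripts are as follows: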

create table page_views(
  track_time string,
  url string,
  session_id string,
  referer string,
  ip string,
  end_user_id string,
  city_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data local inpath '/home/spark/software/data/page_views.dat' overwrite into table page_views;
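LOAD DATA copies the file into the table's storage location without parsing individual rows, so a line with the wrong number of tab-separated fields is not rejected; for a delimited text table it typically shows up later as NULL (or silently dropped) columns. A quick awk check before loading can catch this. The function name check_page_views_file is hypothetical, and it assumes the data file uses the same '\t' delimiter as the table definition above.

```shell
# Hypothetical pre-load check for a tab-separated data file.
# Prints each line whose field count differs from the expected column count
# (7 for page_views by default) and returns non-zero if any such line exists.
check_page_views_file() {
  local file="$1" expected="${2:-7}"
  awk -F '\t' -v n="$expected" '
    NF != n { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END     { exit (bad ? 1 : 0) }
  ' "$file"
}
```

For example, `check_page_views_file /home/spark/software/data/page_views.dat && echo ok` before running the load statement.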
