Spark SQL CLI Overview

The Spark SQL CLI makes it much more convenient to query Hive directly from Spark SQL through the Hive metastore. Note that in the current version the Spark SQL CLI cannot talk to the ThriftServer (Thrift JDBC server).

Before using the Spark SQL CLI, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory;

2. Add the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.-bin.jar
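
As a concrete sketch of these two steps (the connector version number and the source path of hive-site.xml below are illustrative assumptions; substitute the ones from your own installation):

# step 1: copy the Hive configuration into Spark's conf directory (source path assumed)
cp /home/hadoop/software/hive/conf/hive-site.xml $SPARK_HOME/conf/

# step 2: append the MySQL JDBC driver to SPARK_CLASSPATH (version 5.1.27 is an assumption)
echo 'export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.27-bin.jar' >> $SPARK_HOME/conf/spark-env.sh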

Spark SQL CLI command-line options:

cd $SPARK_HOME/bin
spark-sql --help
Usage: ./bin/spark-sql [options] [cli option]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>          Variable substitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
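
A few examples of how these options combine in practice (the database, script path, and property values below are placeholders, not from the original post):

# run a single query non-interactively
spark-sql --master yarn -e "show tables"

# run a SQL script file against a specific database
spark-sql --database default -f /home/hadoop/scripts/query.sql

# print column headers for this session via a Hive property
spark-sql --hiveconf hive.cli.print.header=true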

When spark-sql is started without a master specified, it runs in local mode; the master can be set either to a standalone cluster address or to YARN.

When the master is set to yarn (spark-sql --master yarn), the execution of the whole job can be monitored through the page at http://hadoop000:8088.

Note: if spark.master spark://hadoop000:7077 is configured in $SPARK_HOME/conf/spark-defaults.conf, then spark-sql will run on the standalone cluster even when started without specifying a master.
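
For reference, that setting is a single line in $SPARK_HOME/conf/spark-defaults.conf (host and port taken from the example cluster above):

# $SPARK_HOME/conf/spark-defaults.conf
spark.master    spark://hadoop000:7077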

Using spark-sql

Start spark-sql. Since spark.master spark://hadoop000:7077 is already configured in spark-defaults.conf, no master is specified at startup:

cd $SPARK_HOME/bin
spark-sql
SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = - limit ;
SELECT session_id, count(*) c FROM page_views group by session_id order by c desc limit ;

The two SQL statements above assume the page_views table already exists in Hive; if it does not, create it and load the data with the following scripts:

create table page_views(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data local inpath '/home/spark/software/data/page_views.dat' overwrite into table page_views;   
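
After loading, a quick sanity check from the shell could look like this (a sketch only; the row count naturally depends on your page_views.dat file):

# verify the load non-interactively
spark-sql -e "select count(*) from page_views"
spark-sql -e "select track_time, url, city_id from page_views limit 10"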
