Embedded ThriftServer in Spark

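In Spark, the ThriftServer can be started with start-thriftserver.sh, after which you can connect through beeline or JDBC and run Spark SQL. In a typical Spark application, though, you usually do not want to launch a separate service process, so the natural question is: can an embedded ThriftServer be started inside the Spark driver program?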

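The answer is yes. Starting the ThriftServer requires a HiveContext, and Hive must already be configured in Spark. With a HiveContext in place, a DataFrame can be saved as a Hive table through saveAsTable (called on the DataFrame's writer), which persists the data. The code example follows: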

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

// Start the Thrift server on the existing sqlContext, cast to HiveContext.
// The cast only succeeds if Spark was built with Hive support.
HiveThriftServer2.startWithContext(sqlContext.asInstanceOf[HiveContext])

// wisdom_lu_country has two columns: id and desc
case class lu_country(id: Short, desc: String)

// Load the file as an RDD, split each line on '^' into id and desc,
// and convert it to a DataFrame.
// toDF() relies on sqlContext.implicits._, which spark-shell imports automatically.
val countryDF = sc.textFile("/FB_100/wisdom_lu_country.csv")
  .map(_.split('^'))
  .map(p => lu_country(p(0).toShort, p(1)))
  .toDF()

// Save as a Hive table
countryDF.write.saveAsTable("wisdom_lu_country")

The code above ran successfully in spark-shell.
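To check that the embedded server is reachable, a client can query the saved table over JDBC. The following is only a minimal sketch under several assumptions that are not from the original post: the server listens on localhost at port 10000 (the usual HiveServer2 default), authentication is disabled so that any user name is accepted, the Hive JDBC driver is on the client classpath, and the object name EmbeddedThriftClient, its helper methods, and the user name "spark" are all hypothetical.

```scala
import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

// Hypothetical client for the embedded ThriftServer started in the driver.
object EmbeddedThriftClient {

  // Builds a HiveServer2-style JDBC URL; host, port and database are assumptions.
  def jdbcUrl(host: String, port: Int, database: String): String =
    s"jdbc:hive2://$host:$port/$database"

  // Reads every row of wisdom_lu_country as an (id, desc) pair.
  def fetchCountries(url: String, user: String): List[(Short, String)] = {
    // Register the Hive JDBC driver (requires the hive-jdbc jar on the classpath).
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(url, user, "")
    try {
      val stmt = conn.createStatement()
      try {
        val rs = stmt.executeQuery("SELECT * FROM wisdom_lu_country")
        val rows = ListBuffer.empty[(Short, String)]
        // Column 1 is id, column 2 is desc, matching the case class order.
        while (rs.next()) rows += ((rs.getShort(1), rs.getString(2)))
        rows.toList
      } finally stmt.close()
    } finally conn.close()
  }

  def main(args: Array[String]): Unit =
    fetchCountries(jdbcUrl("localhost", 10000, "default"), "spark").foreach(println)
}
```

The same URL should also work from beeline. Keep in mind that the embedded server only lives as long as the driver program (here, the spark-shell session) that started it.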
