Spark Notes - A Simple Example of Reading Hive Data with Local Spark
Note: copy the MySQL driver jar into spark/lib and hive-site.xml into the project's resources directory; when debugging against a remote cluster, use IP addresses rather than hostnames.
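The metastore connection normally comes from hive-site.xml. A minimal sketch of the relevant entry, assuming the same metastore host as the commented-out setting in the code below (adjust host and port for your cluster):

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.66.66:9083</value>
  </property>
</configuration>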
import org.apache.spark._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import java.io.FileNotFoundException
import java.io.IOException

object HiveSelect {
  def main(args: Array[String]): Unit = {
    System.setProperty("hadoop.home.dir", "D:\\hadoop") // point Spark at the local Hadoop installation (needed on Windows for winutils)
    val conf = new SparkConf().setAppName("HiveApp")
      .setMaster("spark://192.168.66.66:7077")
      .set("spark.executor.memory", "1g")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .setJars(Seq("D:\\workspace\\scala\\out\\scala.jar")) // ship the application jar to the remote Spark cluster
      // .set("hive.metastore.uris", "thrift://192.168.66.66:9083") // remote Hive metastore address
      // .set("spark.driver.extraClassPath", "D:\\json\\mysql-connector-java-5.1.39.jar") // MySQL driver for the metastore
    val sparkContext = new SparkContext(conf)
    try {
      val hiveContext = new HiveContext(sparkContext)
      hiveContext.sql("use siat") // switch to the target database
      hiveContext.sql("DROP TABLE IF EXISTS src") // drop the table if it already exists
      hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) " +
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'") // create the table
      hiveContext.sql("LOAD DATA LOCAL INPATH 'D:\\workspace\\scala\\src.txt' INTO TABLE src") // load the local data file
      hiveContext.sql("SELECT * FROM src").collect().foreach(println) // query the data
    }
    catch {
      // Order matters: a subclass must be matched before its superclass,
      // otherwise the later cases are unreachable.
      case e: FileNotFoundException => println("Missing file exception")
      case e: IOException => println("IO exception")
      case e: NumberFormatException => println(e)
      case e: IllegalArgumentException => println("Illegal argument exception")
      case e: IllegalStateException => println("Illegal state exception")
      case e: ArithmeticException => println(e)
      case e: Exception => println(e)
      case e: Throwable => println("Found an unknown exception: " + e)
    }
    finally {
      sparkContext.stop()
    }
  }
}
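For reference, the table is declared as (key INT, value STRING) with tab-delimited fields, so src.txt is expected to contain tab-separated lines such as:

1	one
2	two
3	three

Note that HiveContext is deprecated as of Spark 2.0. On Spark 2.x the same program can be written against SparkSession with Hive support enabled; a minimal sketch under the same assumptions (same master URL, database and table, hive-site.xml on the classpath):

import org.apache.spark.sql.SparkSession

object HiveSelectSessionApi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveApp")
      .master("spark://192.168.66.66:7077")
      .enableHiveSupport() // reads hive-site.xml from the classpath
      .getOrCreate()
    try {
      spark.sql("use siat")
      spark.sql("SELECT * FROM src").show() // show() prints a tabular sample instead of collecting everything to the driver
    } finally {
      spark.stop()
    }
  }
}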
Appendix 1: Spark Scala API - http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package