Spark记录-本地Spark读取Hive数据简单例子
注意:将mysql的驱动包拷贝到spark/lib下,将hive-site.xml拷贝到项目resources下,远程调试不要使用主机名 import org.apache.spark._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import java.io.FileNotFoundException
import java.io.IOException object HiveSelect {
def main(args: Array[String]) {
System.setProperty("hadoop.home.dir", "D:\\hadoop") //加载hadoop组件
val conf = new SparkConf().setAppName("HiveApp").setMaster("spark://192.168.66.66:7077")
.set("spark.executor.memory", "1g")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.setJars(Seq("D:\\workspace\\scala\\out\\scala.jar"))//加载远程spark
//.set("hive.metastore.uris", "thrift://192.168.66.66:9083")//远程hive的meterstore地址
// .set("spark.driver.extraClassPath","D:\\json\\mysql-connector-java-5.1.39.jar")
val sparkcontext = new SparkContext(conf);
try {
val hiveContext = new HiveContext(sparkcontext);
hiveContext.sql("use siat"); //使用数据库
hiveContext.sql("DROP TABLE IF EXISTS src") //删除表
hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) " +
"ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ");//创建表
hiveContext.sql("LOAD DATA LOCAL INPATH 'D:\\workspace\\scala\\src.txt' INTO TABLE src "); //导入数据
hiveContext.sql(" SELECT * FROM src").collect().foreach(println);//查询数据
}
catch {
case e: FileNotFoundException => println("Missing file exception")
case ex: IOException => println("IO Exception")
case ee: ArithmeticException => println(ee)
case eee: Throwable => println("found a unknown exception" + eee)
case ef: NumberFormatException => println(ef)
case ec: Exception => println(ec)
case e: IllegalArgumentException => println("illegal arg. exception");
case e: IllegalStateException => println("illegal state exception");
}
finally {
sparkcontext.stop()
}
}
}
附录1:scala-spark api-http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package
org.apache.spark org.apache.spark.api.java org.apache.spark.api.java.function org.apache.spark.broadcast org.apache.spark.graphx org.apache.spark.graphx.impl org.apache.spark.graphx.lib org.apache.spark.graphx.util org.apache.spark.input org.apache.spark.internal org.apache.spark.internal.io org.apache.spark.io org.apache.spark.launcher org.apache.spark.mapred org.apache.spark.metrics.source org.apache.spark.ml org.apache.spark.ml.attribute org.apache.spark.ml.classification org.apache.spark.ml.clustering org.apache.spark.ml.evaluation org.apache.spark.ml.feature org.apache.spark.ml.fpm org.apache.spark.ml.linalg org.apache.spark.ml.param org.apache.spark.ml.recommendation org.apache.spark.ml.regression org.apache.spark.ml.source.libsvm org.apache.spark.ml.stat org.apache.spark.ml.stat.distribution org.apache.spark.ml.tree org.apache.spark.ml.tuning org.apache.spark.ml.util org.apache.spark.mllib org.apache.spark.mllib.classification org.apache.spark.mllib.clustering org.apache.spark.mllib.evaluation org.apache.spark.mllib.feature org.apache.spark.mllib.fpm org.apache.spark.mllib.linalg org.apache.spark.mllib.linalg.distributed org.apache.spark.mllib.optimization org.apache.spark.mllib.pmml org.apache.spark.mllib.random org.apache.spark.mllib.rdd org.apache.spark.mllib.recommendation org.apache.spark.mllib.regression org.apache.spark.mllib.stat org.apache.spark.mllib.stat.distribution org.apache.spark.mllib.stat.test org.apache.spark.mllib.tree org.apache.spark.mllib.tree.configuration org.apache.spark.mllib.tree.impurity org.apache.spark.mllib.tree.loss org.apache.spark.mllib.tree.model org.apache.spark.mllib.util org.apache.spark.partial org.apache.spark.rdd org.apache.spark.scheduler org.apache.spark.scheduler.cluster org.apache.spark.security org.apache.spark.serializer org.apache.spark.sql org.apache.spark.sql.api.java org.apache.spark.sql.catalog org.apache.spark.sql.expressions org.apache.spark.sql.expressions.javalang org.apache.spark.sql.expressions.scalalang org.apache.spark.sql.hive org.apache.spark.sql.hive.execution org.apache.spark.sql.hive.orc org.apache.spark.sql.jdbc org.apache.spark.sql.sources org.apache.spark.sql.streaming org.apache.spark.sql.types org.apache.spark.sql.util org.apache.spark.status.api.v1 org.apache.spark.status.api.v1.streaming org.apache.spark.storage org.apache.spark.streaming org.apache.spark.streaming.api.java org.apache.spark.streaming.dstream org.apache.spark.streaming.flume org.apache.spark.streaming.kafka org.apache.spark.streaming.kinesis org.apache.spark.streaming.receiver org.apache.spark.streaming.scheduler org.apache.spark.streaming.scheduler.rate org.apache.spark.streaming.util org.apache.spark.ui.env org.apache.spark.ui.exec org.apache.spark.ui.jobs org.apache.spark.ui.storage org.apache.spark.util org.apache.spark.util.random org.apache.spark.util.sketch
Spark记录-本地Spark读取Hive数据简单例子的更多相关文章
- R语言读取Hive数据表
R通过RJDBC包连接Hive 目前Hive集群是可以通过跳板机来访问 HiveServer, 将Hive 中的批量数据读入R环境,并进行后续的模型和算法运算. 1. 登录跳板机后需要首先在Linux ...
- javascript读取xml文件读取节点数据的例子
分享下用javascript读取xml文件读取节点数据方法. 读取的节点数据,还有一种情况是读取节点属性数据. <head> <title></title> < ...
- Spark记录-Spark-Shell客户端操作读取Hive数据
1.拷贝hive-site.xml到spark/conf下,拷贝mysql-connector-java-xxx-bin.jar到hive/lib下 2.开启hive元数据服务:hive --ser ...
- Spark SQL读取hive数据时报找不到mysql驱动
Exception: Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BoneC ...
- Spark从HDFS上读取JSON数据
代码如下: import org.apache.spark.sql.Row; import org.apache.spark.SparkConf; import org.apache.spark.ap ...
- Spark记录-阿里巴巴开源工具DataX数据同步工具使用
1.官网下载 下载地址:https://github.com/alibaba/DataX DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL.Oracle.SqlSe ...
- python 读取hive数据
话不多说,直接上代码 from pyhive import hivedef pyhive(hql): conn = hive.Connection(host='HiveServer2 host', p ...
- ListBox和ComboBox绑定数据简单例子
1. 将集合数据绑定到ListBox和ComboBox控件,界面上显示某个属性的内容 //自定义了Person类(有Name,Age,Heigth等属性) List<Person> per ...
- Spark读取elasticsearch数据指南
最近要在 Spark job 中通过 Spark SQL 的方式读取 Elasticsearch 数据,踩了一些坑,总结于此. 环境说明 Spark job 的编写语言为 Scala,scala-li ...
随机推荐
- HTML 表格实例
1.表格这个例子演示如何在 HTML 文档中创建表格. <p>每个表格由 table 标签开始.</p><p>每个表格行由 tr 标签开始.</p>&l ...
- WebService技术,服务端发布到Tomcat(使用Servlet发布),客户端使用axis2实现(二)
还是在WebService技术,服务端and客户端JDK-wsimport工具(一)的基础上实现.新建一个包:com.aixs2client.目录结构如下: 一.服务端: 1.还是使用com.webs ...
- Kaggle: Google Analytics Customer Revenue Prediction EDA
前言 内容提要 本文为Kaggle竞赛 Google Analytics Customer Revenue Prediction 的探索性分析 题目要求根据历史顾客访问GStore的数据,预测其中部分 ...
- 英特尔和 Valve* 将英特尔® Embree 光线追踪技术添加至全新 Steam* Audio 插件
本文从英特尔® Embree 光线追踪技术着手,深入探讨英特尔与 Valve 合作带来的优势:一方面,开发人员使用英特尔高度优化的库创建场景,可以显著加快编译速度:另一方面,逼真的声效可以增强游戏性, ...
- Linux_01
要安装centos系统,就必须得有centos系统软件安装程序,可以通过浏览器访问centos官网http://www.centos.org,然后找到Downloads - > mirror ...
- js中判断是否包含某个字符串
1,字符串中是否包含 str.indexOf("3")indexOf() 方法可返回某个指定的字符串值在字符串中首次出现的位置.如果要检索的字符串值没有出现,则该方法返回 -1. ...
- NS2安装过程中环境变量设置的问题(ns-2.35)
nam: Can't find a usable tk.tcl in the following directories: */ns-allinone-2.35/tcl8.5.10/library/t ...
- # linux读书笔记(3章)
linux读书笔记(3章) 标签(空格分隔): 20135328陈都 第三章 进程管理 3.1 进程 进程就是处于执行期的程序(目标码存放在某种存储介质上).但进程并不仅仅局限于一段可执行程序代码( ...
- YQCB冲刺第二周绩效评价
标准队员 工作质量 20% 工作态度 20% 工作量 30% 工作难易程度 20% 团队意识 10% 总分 陈美琪 17 18 24 17 9 85 张晨阳 19 19 27 19 9 93 刘昭为 ...
- ElasticSearch 2 (25) - 语言处理系列之同义词
ElasticSearch 2 (25) - 语言处理系列之同义词 摘要 词干提取有助于通过简化屈折词到它们词根的形式来扩展搜索的范围,而同义词是通过关联概念和想法来扩展搜索范围的.或许没有文档能与查 ...