Running Spark Locally from IDEA
1. Create a sparkTest project and add a Scala object named Test.

2. The code for the Test object is as follows:
package sparkTest

/**
 * Created by jiahong on 15-8-2.
 */
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("Test").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("/home/jiahong/sparkWorkSpace/input")

    // Count each word's occurrences, then sort by count from high to low
    val result = rdd.flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .map(x => (x._2, x._1))
      .sortByKey(false)
      .map(x => (x._2, x._1))

    result.saveAsTextFile("/home/jiahong/sparkWorkSpace/output")
    print(result)
  }
}
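To make the pipeline concrete, here is a tiny made-up example (the test.txt file name matches the input split in the log further down, but these contents are hypothetical). Given an input file like:

hello spark
hello world
hello

the job writes (word, count) tuples sorted by count descending:

(hello,3)
(spark,1)
(world,1)

Note that print(result) only prints the RDD's toString (the "MapPartitionsRDD[...] at map at Test.scala" lines visible in the log below), not its elements; to dump the data on the driver, use result.collect().foreach(println) instead.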
To test Hive locally instead, the code is as follows:
package sparkTest.sparkSql

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkContext, SparkConf}

object HiveSqlTest {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("HiveLink")
      .setMaster("spark://JIAs-Mac.local:7077")
      .setJars(Array("/Users/JIA/Desktop/jar/hiveTest/sparkTest.jar"))
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    val sql = "select * from Test limit 100"
    sqlContext.sql(sql)
      .map(s => s(0) + "," + s(1) + "," + s(2) + "," + s(3) + "," + s(4))
      .collect()
      .foreach(println)
  }
}
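One thing to note about this example: s(0) through s(4) index the returned Row positionally, so the query assumes the Test table has at least five columns; adjust the indices to match your table's schema.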
Note: hive-site.xml must be placed in the project, in a newly created Resources folder marked as Resources Root.
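For reference, a minimal hive-site.xml sketch pointing Spark at an already-running Hive metastore might look like the following; the thrift://localhost:9083 URI is an assumption (9083 is the conventional metastore port), so substitute your own metastore host and port:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Assumed metastore location; replace with your own host:port -->
    <value>thrift://localhost:9083</value>
  </property>
</configuration>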

3. To set up local execution, open Edit Configurations from the drop-down in the top-right corner of IDEA.

4. Still configuring local execution: enter -Dspark.master=local under VM options, and local under Program arguments.
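Why the VM option works: a SparkConf built with its default constructor also loads any spark.* JVM system properties, so instead of hardcoding setMaster("local") as in the code above, you can leave the master out and let -Dspark.master decide. A minimal sketch under that assumption:

// Sketch: rely on -Dspark.master from VM options instead of hardcoding the master
val conf = new SparkConf().setAppName("Test") // picks up spark.* system properties by default
val sc = new SparkContext(conf)

This lets the same code run locally or against a cluster just by editing the run configuration.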

5. Click Run to execute the program; make sure Spark has been started on the local machine first.
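Strictly speaking, local mode needs no running cluster, but the standalone master used by the Hive example above does. Assuming a standard layout under $SPARK_HOME (an assumption; your install path may differ), the standalone master and workers can be started with:

$SPARK_HOME/sbin/start-all.sh

With that in place, clicking Run produces console output like this: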
/usr/lib/jdk/jdk1.7.0_79/bin/java -Dspark.master=local -Didea.launcher.port= -Didea.launcher.bin.path=/home/jiahong/idea-IC-141.1532./bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jdk/jdk1.7.0_79/jre/lib/resources.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/jfxrt.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/charsets.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/jsse.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/rt.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/plugin.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/deploy.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/jfr.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/javaws.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/management-agent.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/jce.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/ext/zipfs.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/ext/dnsns.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/ext/sunec.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/ext/sunjce_provider.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/ext/sunpkcs11.jar:/usr/lib/jdk/jdk1.7.0_79/jre/lib/ext/localedata.jar:/home/jiahong/IdeaProjects/sparkTest/out/production/sparkTest:/home/jiahong/apache/spark-1.3.-bin-hadoop2./lib/spark-assembly-1.3.-hadoop2.6.0.jar:/home/jiahong/apache/scala-2.10.4/lib/scala-actors-migration.jar:/home/jiahong/apache/scala-2.10.4/lib/scala-reflect.jar:/home/jiahong/apache/scala-2.10.4/lib/scala-actors.jar:/home/jiahong/apache/scala-2.10.4/lib/scala-swing.jar:/home/jiahong/apache/scala-2.10.4/lib/scala-library.jar:/home/jiahong/idea-IC-141.1532./lib/idea_rt.jar com.intellij.rt.execution.application.AppMain sparkTest.Test local
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
INFO SparkContext: Running Spark version 1.3.
WARN Utils: Your hostname, jiahong-OptiPlex-7010 resolves to a loopback address: 127.0.1.1; using 192.168.199.187 instead (on interface eth0)
WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO SecurityManager: Changing view acls to: jiahong
INFO SecurityManager: Changing modify acls to: jiahong
INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jiahong); users with modify permissions: Set(jiahong)
INFO Slf4jLogger: Slf4jLogger started
INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@jiahong-OptiPlex-7010.lan:37917]
INFO Utils: Successfully started service 'sparkDriver' on port 37917.
INFO SparkEnv: Registering MapOutputTracker
INFO SparkEnv: Registering BlockManagerMaster
INFO DiskBlockManager: Created local directory at /tmp/spark-a2cbde0d--4a95-80df-a99a14127efc/blockmgr-3cbdae80-810a-4ecf-b012-0979b3d714d0
INFO MemoryStore: MemoryStore started with capacity 469.5 MB
INFO HttpFileServer: HTTP File server directory is /tmp/spark--df98-4e7e-afa1-4dd36b655012/httpd-28cb8de9-caa4---347cea890b07
INFO HttpServer: Starting HTTP Server
INFO Server: jetty-8.y.z-SNAPSHOT
INFO AbstractConnector: Started SocketConnector@0.0.0.0:
INFO Utils: Successfully started service 'HTTP file server' on port .
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Server: jetty-8.y.z-SNAPSHOT
INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
INFO Utils: Successfully started service 'SparkUI' on port 4040.
INFO SparkUI: Started SparkUI at http://jiahong-OptiPlex-7010.lan:4040
INFO Executor: Starting executor ID <driver> on host localhost
INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@jiahong-OptiPlex-7010.lan:37917/user/HeartbeatReceiver
INFO NettyBlockTransferService: Server created on
INFO BlockManagerMaster: Trying to register BlockManager
INFO BlockManagerMasterActor: Registering block manager localhost: with 469.5 MB RAM, BlockManagerId(<driver>, localhost, )
INFO BlockManagerMaster: Registered BlockManager
INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 178.6 KB, free 469.4 MB)
INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.8 KB, free 469.3 MB)
INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost: (size: 24.8 KB, free: 469.5 MB)
INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
INFO SparkContext: Created broadcast 0 from textFile at Test.scala:
INFO FileInputFormat: Total input paths to process : 1
MapPartitionsRDD[] at map at Test.scala:
INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
INFO SparkContext: Starting job: saveAsTextFile at Test.scala:
INFO DAGScheduler: Registering RDD (map at Test.scala:)
INFO DAGScheduler: Registering RDD (map at Test.scala:)
INFO DAGScheduler: Got job 0 (saveAsTextFile at Test.scala:) with 1 output partitions (allowLocal=false)
INFO DAGScheduler: Final stage: Stage 2 (saveAsTextFile at Test.scala:)
INFO DAGScheduler: Parents of final stage: List(Stage 1)
INFO DAGScheduler: Missing parents: List(Stage 1)
INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[] at map at Test.scala:), which has no missing parents
INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.6 KB, free 469.3 MB)
INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 469.3 MB)
INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost: (size: 2.6 KB, free: 469.5 MB)
INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:
INFO DAGScheduler: Submitting missing tasks from Stage 0 (MapPartitionsRDD[] at map at Test.scala:)
INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, bytes)
INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
INFO HadoopRDD: Input split: file:/home/jiahong/sparkWorkSpace/input/test.txt:+
INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). bytes result sent to driver
INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in ms on localhost (1/1)
INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
INFO DAGScheduler: Stage 0 (map at Test.scala:) finished in 0.092 s
INFO DAGScheduler: looking for newly runnable stages
INFO DAGScheduler: running: Set()
INFO DAGScheduler: waiting: Set(Stage 1, Stage 2)
INFO DAGScheduler: failed: Set()
INFO DAGScheduler: Missing parents for Stage 1: List()
INFO DAGScheduler: Missing parents for Stage 2: List(Stage 1)
INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[] at map at Test.scala:), which is now runnable
INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.0 KB, free 469.3 MB)
INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 469.3 MB)
INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost: (size: 2.1 KB, free: 469.5 MB)
INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:
INFO DAGScheduler: Submitting missing tasks from Stage 1 (MapPartitionsRDD[] at map at Test.scala:)
INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, bytes)
INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
INFO ShuffleBlockFetcherIterator: Started remote fetches in ms
INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). bytes result sent to driver
INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in ms on localhost (1/1)
INFO DAGScheduler: Stage 1 (map at Test.scala:) finished in 0.077 s
INFO DAGScheduler: looking for newly runnable stages
INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
INFO DAGScheduler: running: Set()
INFO DAGScheduler: waiting: Set(Stage 2)
INFO DAGScheduler: failed: Set()
INFO DAGScheduler: Missing parents for Stage 2: List()
INFO DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[] at saveAsTextFile at Test.scala:), which is now runnable
INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 124.7 KB, free 469.2 MB)
INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 74.9 KB, free 469.1 MB)
INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost: (size: 74.9 KB, free: 469.4 MB)
INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:
INFO DAGScheduler: Submitting missing tasks from Stage 2 (MapPartitionsRDD[] at saveAsTextFile at Test.scala:)
INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, PROCESS_LOCAL, bytes)
INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
INFO ShuffleBlockFetcherIterator: Started remote fetches in ms
INFO FileOutputCommitter: Saved output of task 'attempt_201508021058_0002_m_000000_2' to file:/home/jiahong/sparkWorkSpace/output/_temporary//task_201508021058_0002_m_000000
INFO SparkHadoopMapRedUtil: attempt_201508021058_0002_m_000000_2: Committed
INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). bytes result sent to driver
INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in ms on localhost (1/1)
INFO DAGScheduler: Stage 2 (saveAsTextFile at Test.scala:) finished in 0.138 s
INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
INFO DAGScheduler: Job 0 finished: saveAsTextFile at Test.scala:, took 0.483353 s
MapPartitionsRDD[] at map at Test.scala:
Process finished with exit code 0
6. The results are as follows:
The input directory contains a test.txt file; its contents are shown in the screenshot below.
[screenshot of test.txt contents]
After the run, the output directory contains the files shown below.
[screenshot of the output directory]
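For reference, saveAsTextFile writes a directory rather than a single file: with one partition, the output directory typically holds a part-00000 file containing the (word,count) tuples plus an empty _SUCCESS marker written by the Hadoop output committer on successful completion.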
Note:
When you first run the project, you may hit a problem like this:
Exception in thread "main" java.lang.NoSuchMethodError:
The fix is to check which Scala version your Spark distribution was built against, install that version locally, and then update the Scala SDK in IDEA to match. I originally had Scala 2.11.7 installed, which caused the error; Spark's Scala version turned out to be 2.10.4, so I reinstalled 2.10.4 and pointed IDEA at it, after which the program ran correctly.
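A quick way to check the Scala version of a Spark build is the spark-shell startup banner, which prints a line like "Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)" before the prompt appears (the exact Java details shown will depend on your JVM).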