1. Create the sparkTest project and a Scala object named Test.

2. The code of the Test object is as follows:

package sparkTest

/**
 * Created by jiahong on 15-8-2.
 */
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("Test").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("/home/jiahong/sparkWorkSpace/input")
    // Count word occurrences, then sort by count in descending order
    val result = rdd.flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .map(x => (x._2, x._1))
      .sortByKey(false)
      .map(x => (x._2, x._1))
    result.saveAsTextFile("/home/jiahong/sparkWorkSpace/output")
    print(result)
  }
}
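The job above is a count-then-sort: split lines into words, map each word to a count of 1, sum counts per word, swap key and value so the count can be sorted descending, then swap back. The same logic can be sanity-checked on a plain Scala collection without Spark (a sketch with made-up sample data):

```scala
// Count-then-sort on a local collection, mirroring the RDD pipeline above.
val lines = Seq("a b a", "b a")
val counts = lines
  .flatMap(_.split(" "))     // all words
  .groupBy(identity)         // word -> occurrences
  .mapValues(_.size)         // word -> count
  .toSeq
  .sortBy(-_._2)             // descending by count
println(counts)              // (a,3) comes before (b,2)
```

The swap/sortByKey/swap dance in the RDD version exists because Spark 1.3 sorts by key, not by value; on a local collection `sortBy` can sort on the count directly.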

To test Hive locally, the code is as follows:

package sparkTest.sparkSql

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkContext, SparkConf}

object HiveSqlTest {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("HiveLink")
      .setMaster("spark://JIAs-Mac.local:7077")
      .setJars(Array("/Users/JIA/Desktop/jar/hiveTest/sparkTest.jar"))
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    val sql = "select * from Test limit 100"
    sqlContext.sql(sql)
      .map(s => s(0) + "," + s(1) + "," + s(2) + "," + s(3) + "," + s(4))
      .collect()
      .foreach(println)
  }
}

Note: you need to put hive-site.xml under the project directory. Create a Resources folder and mark it as Resources Root in IDEA so it ends up on the classpath.
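For reference, a minimal hive-site.xml usually only needs to point at the Hive metastore. This is a sketch; the host and port are placeholders for your own metastore, not values from this setup:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Thrift URI of the Hive metastore service (placeholder host/port) -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://your-metastore-host:9083</value>
  </property>
</configuration>
```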

3. To set up a local run, open Edit Configurations from the drop-down in the top-right corner of IDEA.

4. For the local run configuration, enter -Dspark.master=local under VM options, and local under Program arguments.
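The -Dspark.master=local VM option works because SparkConf, by default, loads any JVM system property whose name starts with "spark.". That means the master URL does not have to be hard-coded with setMaster; a sketch of the pattern:

```scala
import org.apache.spark.SparkConf

// new SparkConf() loads spark.* system properties, so -Dspark.master=local
// supplies the master URL from the run configuration.
val conf = new SparkConf().setAppName("Test")
if (!conf.contains("spark.master")) // fall back when the VM option is absent
  conf.setMaster("local")
```

This keeps the same jar usable both in the IDE (with the VM option) and via spark-submit (where --master sets the same property).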

5. Click Run. Start the local Spark instance before running. The console output looks like this:

/usr/lib/jdk/jdk1..0_79/bin/java -Dspark.master=local -Didea.launcher.port= -Didea.launcher.bin.path=/home/jiahong/idea-IC-141.1532./bin -Dfile.encoding=UTF- -classpath /usr/lib/jdk/jdk1..0_79/jre/lib/resources.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jfxrt.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/charsets.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jsse.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/rt.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/plugin.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/deploy.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jfr.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/javaws.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/management-agent.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jce.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/zipfs.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/dnsns.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunec.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunjce_provider.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunpkcs11.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/localedata.jar:/home/jiahong/IdeaProjects/sparkTest/out/production/sparkTest:/home/jiahong/apache/spark-1.3.-bin-hadoop2./lib/spark-assembly-1.3.-hadoop2.6.0.jar:/home/jiahong/apache/scala-2.10./lib/scala-actors-migration.jar:/home/jiahong/apache/scala-2.10./lib/scala-reflect.jar:/home/jiahong/apache/scala-2.10./lib/scala-actors.jar:/home/jiahong/apache/scala-2.10./lib/scala-swing.jar:/home/jiahong/apache/scala-2.10./lib/scala-library.jar:/home/jiahong/idea-IC-141.1532./lib/idea_rt.jar com.intellij.rt.execution.application.AppMain sparkTest.Test local
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
// :: INFO SparkContext: Running Spark version 1.3.
// :: WARN Utils: Your hostname, jiahong-OptiPlex- resolves to a loopback address: 127.0.1.1; using 192.168.199.187 instead (on interface eth0)
// :: WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
// :: WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO SecurityManager: Changing view acls to: jiahong
// :: INFO SecurityManager: Changing modify acls to: jiahong
// :: INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jiahong); users with modify permissions: Set(jiahong)
// :: INFO Slf4jLogger: Slf4jLogger started
// :: INFO Remoting: Starting remoting
// :: INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@jiahong-OptiPlex-7010.lan:37917]
// :: INFO Utils: Successfully started service 'sparkDriver' on port .
// :: INFO SparkEnv: Registering MapOutputTracker
// :: INFO SparkEnv: Registering BlockManagerMaster
// :: INFO DiskBlockManager: Created local directory at /tmp/spark-a2cbde0d--4a95-80df-a99a14127efc/blockmgr-3cbdae80-810a-4ecf-b012-0979b3d714d0
// :: INFO MemoryStore: MemoryStore started with capacity 469.5 MB
// :: INFO HttpFileServer: HTTP File server directory is /tmp/spark--df98-4e7e-afa1-4dd36b655012/httpd-28cb8de9-caa4---347cea890b07
// :: INFO HttpServer: Starting HTTP Server
// :: INFO Server: jetty-.y.z-SNAPSHOT
// :: INFO AbstractConnector: Started SocketConnector@0.0.0.0:
// :: INFO Utils: Successfully started service 'HTTP file server' on port .
// :: INFO SparkEnv: Registering OutputCommitCoordinator
// :: INFO Server: jetty-.y.z-SNAPSHOT
// :: INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:
// :: INFO Utils: Successfully started service 'SparkUI' on port .
// :: INFO SparkUI: Started SparkUI at http://jiahong-OptiPlex-7010.lan:4040
// :: INFO Executor: Starting executor ID <driver> on host localhost
// :: INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@jiahong-OptiPlex-7010.lan:37917/user/HeartbeatReceiver
// :: INFO NettyBlockTransferService: Server created on
// :: INFO BlockManagerMaster: Trying to register BlockManager
// :: INFO BlockManagerMasterActor: Registering block manager localhost: with 469.5 MB RAM, BlockManagerId(<driver>, localhost, )
// :: INFO BlockManagerMaster: Registered BlockManager
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 178.6 KB, free 469.4 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.8 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost: (size: 24.8 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
// :: INFO SparkContext: Created broadcast from textFile at Test.scala:
// :: INFO FileInputFormat: Total input paths to process :
MapPartitionsRDD[] at map at Test.scala:
// :: INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
// :: INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
// :: INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
// :: INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
// :: INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
// :: INFO SparkContext: Starting job: saveAsTextFile at Test.scala:
// :: INFO DAGScheduler: Registering RDD (map at Test.scala:)
// :: INFO DAGScheduler: Registering RDD (map at Test.scala:)
// :: INFO DAGScheduler: Got job (saveAsTextFile at Test.scala:) with output partitions (allowLocal=false)
// :: INFO DAGScheduler: Final stage: Stage (saveAsTextFile at Test.scala:)
// :: INFO DAGScheduler: Parents of final stage: List(Stage )
// :: INFO DAGScheduler: Missing parents: List(Stage )
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at map at Test.scala:), which has no missing parents
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.6 KB, free 469.3 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost: (size: 2.6 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at map at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 0.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 0.0 (TID )
// :: INFO HadoopRDD: Input split: file:/home/jiahong/sparkWorkSpace/input/test.txt:+
// :: INFO Executor: Finished task 0.0 in stage 0.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID ) in ms on localhost (/)
// :: INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: Stage (map at Test.scala:) finished in 0.092 s
// :: INFO DAGScheduler: looking for newly runnable stages
// :: INFO DAGScheduler: running: Set()
// :: INFO DAGScheduler: waiting: Set(Stage , Stage )
// :: INFO DAGScheduler: failed: Set()
// :: INFO DAGScheduler: Missing parents for Stage : List()
// :: INFO DAGScheduler: Missing parents for Stage : List(Stage )
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at map at Test.scala:), which is now runnable
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.0 KB, free 469.3 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost: (size: 2.1 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at map at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 1.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 1.0 (TID )
// :: INFO ShuffleBlockFetcherIterator: Getting non-empty blocks out of blocks
// :: INFO ShuffleBlockFetcherIterator: Started remote fetches in ms
// :: INFO Executor: Finished task 0.0 in stage 1.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID ) in ms on localhost (/)
// :: INFO DAGScheduler: Stage (map at Test.scala:) finished in 0.077 s
// :: INFO DAGScheduler: looking for newly runnable stages
// :: INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: running: Set()
// :: INFO DAGScheduler: waiting: Set(Stage )
// :: INFO DAGScheduler: failed: Set()
// :: INFO DAGScheduler: Missing parents for Stage : List()
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at saveAsTextFile at Test.scala:), which is now runnable
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 124.7 KB, free 469.2 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 74.9 KB, free 469.1 MB)
// :: INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost: (size: 74.9 KB, free: 469.4 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at saveAsTextFile at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 2.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 2.0 (TID )
// :: INFO ShuffleBlockFetcherIterator: Getting non-empty blocks out of blocks
// :: INFO ShuffleBlockFetcherIterator: Started remote fetches in ms
// :: INFO FileOutputCommitter: Saved output of task 'attempt_201508021058_0002_m_000000_2' to file:/home/jiahong/sparkWorkSpace/output/_temporary//task_201508021058_0002_m_000000
// :: INFO SparkHadoopMapRedUtil: attempt_201508021058_0002_m_000000_2: Committed
// :: INFO Executor: Finished task 0.0 in stage 2.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID ) in ms on localhost (/)
// :: INFO DAGScheduler: Stage (saveAsTextFile at Test.scala:) finished in 0.138 s
// :: INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: Job finished: saveAsTextFile at Test.scala:, took 0.483353 s
MapPartitionsRDD[] at map at Test.scala:
Process finished with exit code

6. The results are as follows:

The input directory contains a test.txt file with the following content:

After running, the output directory contains the following files:


Note:

On the first run, you may hit the following problem:

Exception in thread "main" java.lang.NoSuchMethodError: 

The fix: when you start Spark, check which Scala version it reports, install that version on your machine, and then change the project's Scala SDK in IDEA to match. I originally had Scala 2.11.7 installed, which caused this error; Spark's Scala version turned out to be 2.10.4, so I reinstalled that version, updated it in IDEA, and the program then ran correctly.
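To compare the two versions, you can print the Scala version the IDEA project actually compiles against and check it against the version shown in the spark-shell startup banner (2.10.4 in this setup):

```scala
// Prints the Scala library version on the project classpath,
// e.g. "version 2.10.4"; it must match Spark's bundled Scala.
object VersionCheck {
  def main(args: Array[String]): Unit =
    println(scala.util.Properties.versionString)
}
```

If the two differ in the minor version (2.10 vs 2.11), binary incompatibility between the Scala standard libraries is what produces the NoSuchMethodError.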
