1. Create a sparkTest project and create a Scala object named Test

2. The code of the Test object is as follows:

package sparkTest

/**
 * Created by jiahong on 15-8-2.
 */
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("Test").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("/home/jiahong/sparkWorkSpace/input")

    // Count word occurrences, then sort by count from high to low
    val result = rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
      .map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))

    result.saveAsTextFile("/home/jiahong/sparkWorkSpace/output")
    print(result)
  }
}
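As a side note, the object checks args.length but then hard-codes its paths. A minimal variant (a sketch, not the original author's code) could instead read the input and output paths from the program arguments; sortBy gives the same descending order as the map/sortByKey/map chain above:

// Hypothetical variant: take paths from the program arguments rather than hard-coding them
val inputPath = args(0)    // e.g. /home/jiahong/sparkWorkSpace/input
val outputPath = args(1)   // e.g. /home/jiahong/sparkWorkSpace/output
val counts = sc.textFile(inputPath)
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .sortBy(_._2, ascending = false)   // sort by count, high to low
counts.saveAsTextFile(outputPath)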

To test Hive locally, the code is as follows:

package sparkTest.sparkSql

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkContext, SparkConf}

object HiveSqlTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("HiveLink")
      .setMaster("spark://JIAs-Mac.local:7077")
      .setJars(Array("/Users/JIA/Desktop/jar/hiveTest/sparkTest.jar"))
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    val sql = "select * from Test limit 100"
    sqlContext.sql(sql).map(s => s(0) + "," + s(1) + "," + s(2) + "," + s(3) + "," + s(4)).collect().foreach(println)
  }
}

Note: hive-site.xml needs to be placed in the project directory; create a new Resources folder and mark it as Resources Root.
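If placing hive-site.xml on the classpath is inconvenient, one possible alternative (a sketch; the host and port below are placeholders, not values from this setup) is to point the HiveContext at the metastore directly:

// Hypothetical alternative to shipping hive-site.xml: set the Hive metastore address
// on the HiveContext explicitly; replace host/port with your own metastore address
sqlContext.setConf("hive.metastore.uris", "thrift://your-metastore-host:9083")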

3. Set up local running: in the upper-right corner of IDEA, open Edit Configurations.

4. Set up local running: in VM options, enter -Dspark.master=local; in Program arguments, enter local.
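For reference, SparkConf loads spark.* Java system properties by default, so the -Dspark.master=local flag alone can select the master when the code does not hard-code it; values set explicitly on the conf (such as setMaster("local") in Test above) are applied after those defaults and take precedence. A minimal sketch:

// Sketch: rely on -Dspark.master=local from the VM options instead of hard-coding the master
val conf = new SparkConf().setAppName("Test")   // no setMaster(); spark.master comes from the -D flag
val sc = new SparkContext(conf)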

5. Click Run to execute; before running, start Spark on the local machine first. The console output of the run is shown below:

/usr/lib/jdk/jdk1..0_79/bin/java -Dspark.master=local -Didea.launcher.port= -Didea.launcher.bin.path=/home/jiahong/idea-IC-141.1532./bin -Dfile.encoding=UTF- -classpath /usr/lib/jdk/jdk1..0_79/jre/lib/resources.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jfxrt.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/charsets.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jsse.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/rt.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/plugin.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/deploy.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jfr.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/javaws.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/management-agent.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/jce.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/zipfs.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/dnsns.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunec.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunjce_provider.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/sunpkcs11.jar:/usr/lib/jdk/jdk1..0_79/jre/lib/ext/localedata.jar:/home/jiahong/IdeaProjects/sparkTest/out/production/sparkTest:/home/jiahong/apache/spark-1.3.-bin-hadoop2./lib/spark-assembly-1.3.-hadoop2.6.0.jar:/home/jiahong/apache/scala-2.10./lib/scala-actors-migration.jar:/home/jiahong/apache/scala-2.10./lib/scala-reflect.jar:/home/jiahong/apache/scala-2.10./lib/scala-actors.jar:/home/jiahong/apache/scala-2.10./lib/scala-swing.jar:/home/jiahong/apache/scala-2.10./lib/scala-library.jar:/home/jiahong/idea-IC-141.1532./lib/idea_rt.jar com.intellij.rt.execution.application.AppMain sparkTest.Test local
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
// :: INFO SparkContext: Running Spark version 1.3.
// :: WARN Utils: Your hostname, jiahong-OptiPlex- resolves to a loopback address: 127.0.1.1; using 192.168.199.187 instead (on interface eth0)
// :: WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
// :: WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
// :: INFO SecurityManager: Changing view acls to: jiahong
// :: INFO SecurityManager: Changing modify acls to: jiahong
// :: INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jiahong); users with modify permissions: Set(jiahong)
// :: INFO Slf4jLogger: Slf4jLogger started
// :: INFO Remoting: Starting remoting
// :: INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@jiahong-OptiPlex-7010.lan:37917]
// :: INFO Utils: Successfully started service 'sparkDriver' on port .
// :: INFO SparkEnv: Registering MapOutputTracker
// :: INFO SparkEnv: Registering BlockManagerMaster
// :: INFO DiskBlockManager: Created local directory at /tmp/spark-a2cbde0d--4a95-80df-a99a14127efc/blockmgr-3cbdae80-810a-4ecf-b012-0979b3d714d0
// :: INFO MemoryStore: MemoryStore started with capacity 469.5 MB
// :: INFO HttpFileServer: HTTP File server directory is /tmp/spark--df98-4e7e-afa1-4dd36b655012/httpd-28cb8de9-caa4---347cea890b07
// :: INFO HttpServer: Starting HTTP Server
// :: INFO Server: jetty-.y.z-SNAPSHOT
// :: INFO AbstractConnector: Started SocketConnector@0.0.0.0:
// :: INFO Utils: Successfully started service 'HTTP file server' on port .
// :: INFO SparkEnv: Registering OutputCommitCoordinator
// :: INFO Server: jetty-.y.z-SNAPSHOT
// :: INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:
// :: INFO Utils: Successfully started service 'SparkUI' on port .
// :: INFO SparkUI: Started SparkUI at http://jiahong-OptiPlex-7010.lan:4040
// :: INFO Executor: Starting executor ID <driver> on host localhost
// :: INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@jiahong-OptiPlex-7010.lan:37917/user/HeartbeatReceiver
// :: INFO NettyBlockTransferService: Server created on
// :: INFO BlockManagerMaster: Trying to register BlockManager
// :: INFO BlockManagerMasterActor: Registering block manager localhost: with 469.5 MB RAM, BlockManagerId(<driver>, localhost, )
// :: INFO BlockManagerMaster: Registered BlockManager
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 178.6 KB, free 469.4 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.8 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost: (size: 24.8 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
// :: INFO SparkContext: Created broadcast from textFile at Test.scala:
// :: INFO FileInputFormat: Total input paths to process :
MapPartitionsRDD[] at map at Test.scala:
// :: INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
// :: INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
// :: INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
// :: INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
// :: INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
// :: INFO SparkContext: Starting job: saveAsTextFile at Test.scala:
// :: INFO DAGScheduler: Registering RDD (map at Test.scala:)
// :: INFO DAGScheduler: Registering RDD (map at Test.scala:)
// :: INFO DAGScheduler: Got job (saveAsTextFile at Test.scala:) with output partitions (allowLocal=false)
// :: INFO DAGScheduler: Final stage: Stage (saveAsTextFile at Test.scala:)
// :: INFO DAGScheduler: Parents of final stage: List(Stage )
// :: INFO DAGScheduler: Missing parents: List(Stage )
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at map at Test.scala:), which has no missing parents
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.6 KB, free 469.3 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.6 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost: (size: 2.6 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at map at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 0.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 0.0 (TID )
// :: INFO HadoopRDD: Input split: file:/home/jiahong/sparkWorkSpace/input/test.txt:+
// :: INFO Executor: Finished task 0.0 in stage 0.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID ) in ms on localhost (/)
// :: INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: Stage (map at Test.scala:) finished in 0.092 s
// :: INFO DAGScheduler: looking for newly runnable stages
// :: INFO DAGScheduler: running: Set()
// :: INFO DAGScheduler: waiting: Set(Stage , Stage )
// :: INFO DAGScheduler: failed: Set()
// :: INFO DAGScheduler: Missing parents for Stage : List()
// :: INFO DAGScheduler: Missing parents for Stage : List(Stage )
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at map at Test.scala:), which is now runnable
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.0 KB, free 469.3 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.1 KB, free 469.3 MB)
// :: INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost: (size: 2.1 KB, free: 469.5 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at map at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 1.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 1.0 (TID )
// :: INFO ShuffleBlockFetcherIterator: Getting non-empty blocks out of blocks
// :: INFO ShuffleBlockFetcherIterator: Started remote fetches in ms
// :: INFO Executor: Finished task 0.0 in stage 1.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID ) in ms on localhost (/)
// :: INFO DAGScheduler: Stage (map at Test.scala:) finished in 0.077 s
// :: INFO DAGScheduler: looking for newly runnable stages
// :: INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: running: Set()
// :: INFO DAGScheduler: waiting: Set(Stage )
// :: INFO DAGScheduler: failed: Set()
// :: INFO DAGScheduler: Missing parents for Stage : List()
// :: INFO DAGScheduler: Submitting Stage (MapPartitionsRDD[] at saveAsTextFile at Test.scala:), which is now runnable
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 124.7 KB, free 469.2 MB)
// :: INFO MemoryStore: ensureFreeSpace() called with curMem=, maxMem=
// :: INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 74.9 KB, free 469.1 MB)
// :: INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost: (size: 74.9 KB, free: 469.4 MB)
// :: INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
// :: INFO SparkContext: Created broadcast from broadcast at DAGScheduler.scala:
// :: INFO DAGScheduler: Submitting missing tasks from Stage (MapPartitionsRDD[] at saveAsTextFile at Test.scala:)
// :: INFO TaskSchedulerImpl: Adding task set 2.0 with tasks
// :: INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID , localhost, PROCESS_LOCAL, bytes)
// :: INFO Executor: Running task 0.0 in stage 2.0 (TID )
// :: INFO ShuffleBlockFetcherIterator: Getting non-empty blocks out of blocks
// :: INFO ShuffleBlockFetcherIterator: Started remote fetches in ms
// :: INFO FileOutputCommitter: Saved output of task 'attempt_201508021058_0002_m_000000_2' to file:/home/jiahong/sparkWorkSpace/output/_temporary//task_201508021058_0002_m_000000
// :: INFO SparkHadoopMapRedUtil: attempt_201508021058_0002_m_000000_2: Committed
// :: INFO Executor: Finished task 0.0 in stage 2.0 (TID ). bytes result sent to driver
// :: INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID ) in ms on localhost (/)
// :: INFO DAGScheduler: Stage (saveAsTextFile at Test.scala:) finished in 0.138 s
// :: INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
// :: INFO DAGScheduler: Job finished: saveAsTextFile at Test.scala:, took 0.483353 s
MapPartitionsRDD[] at map at Test.scala:
Process finished with exit code

6. The results are as follows:

There is a test.txt file under the input directory with the following content:

After running, the files under the output directory are as follows:
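As an illustration of the output format (hypothetical data, not the actual test.txt contents), an input line such as "hello spark hello world" would produce part files whose lines look like:

(hello,2)
(spark,1)
(world,1)

Each tuple is written by saveAsTextFile as one line of a part-00000 file, sorted by count from high to low.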


Note:

When running for the first time, you may run into the following problem:

Exception in thread "main" java.lang.NoSuchMethodError: 

The solution: when you start your Spark, check which Scala version it uses, install that version locally, and then update the setting in IDEA. I had originally installed Scala 2.11.7, which caused the error; after checking that Spark's Scala version was 2.10.4, I reinstalled the matching version and changed it in IDEA, and then it ran correctly.
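If the project is managed with sbt rather than IDEA's project settings alone (an assumption; the setup above uses IDEA directly), pinning the Scala version to match the Spark build would look roughly like this in build.sbt:

// Sketch of a build.sbt that pins Scala to the version the Spark build was compiled against
// (2.10.4 and Spark 1.3.0 match the versions mentioned above; adjust to your installation)
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"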
