spark 1.3.0下的问题
1.在spark SQL的一个test中
无论是registerAsTable还是registerTempTable 都会有问题,经过查找各种资料,采用如下的方式:
val sqlCon=new org.apache.spark.sql.SQLContext(sc)
import sqlContext.
val data=sc.textFile("hdfs://spark-master.dragon.org:8020/user/a.csv")
case class Person(cname:String,age:Int )
val people=data.map(.split(",")).map(p=>Person(p(0),p(1).toInt))
people.toDF.registerTempTable("people")
sql("select * from people").collect
2.有一个问题待解决下边的程序中,如果我在ide里边打包,然后用spark-submit运行,会运行的很正确,如果放在ide里边直接运行,会报错,但是改成local模式怎没有问题报错内容在最下边展示着,这个问题还一直没有解决,待续
package com.san.spark.basic
import org.apache.spark.{SparkContext, SparkConf}
/**
- Created by hadoop on 3/23/16.
*/
object aaa {
def main(args: Array[String]) {
val logFile = "hdfs://spark-master:8020/user/sparkS/a.csv" // Should be some file on your system
val conf = new SparkConf().setAppName("aa") //
.setMaster("spark://spark-master:7077")
//.setJars(List("/opt/data01/myTes/aaa.jar"))
// .setMaster("local")
val sc = new SparkContext(conf)
//sc.addJar("/opt/data01/myTes/aaa.jar")
val logData = sc.textFile(logFile, 2).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
sc.stop()
}
}
报错提示:
/opt/data02/modules/jdk1.7.0_25/bin/java -Didea.launcher.port=7532 -Didea.launcher.bin.path=/opt/data01/idea1411/bin -Dfile.encoding=UTF-8 -classpath /opt/data02/modules/jdk1.7.0_25/jre/lib/jfr.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/rt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/plugin.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/resources.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jfxrt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/deploy.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/charsets.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jce.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/javaws.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jsse.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/management-agent.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunjce_provider.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/zipfs.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/localedata.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunec.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/dnsns.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunpkcs11.jar:/opt/data01/myTest/out/production/myTest:/opt/data02/modules/scala-2.10.4/lib/scala-swing.jar:/opt/data02/modules/scala-2.10.4/lib/scala-reflect.jar:/opt/data02/modules/scala-2.10.4/lib/scala-library.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors-migration.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-api-jdo-3.2.6.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-core-3.2.10.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-rdbms-3.2.9.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-1.3.0-yarn-shuffle.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-assembly-1.3.0-hadoop2.6.0-cdh5.4.0.jar:/opt/data01/idea1411/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.san.spark.basic.aaa
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/23 20:34:23 INFO SparkContext: Running Spark version 1.3.0
16/03/23 20:34:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/23 20:34:34 INFO SecurityManager: Changing view acls to: hadoop
16/03/23 20:34:34 INFO SecurityManager: Changing modify acls to: hadoop
16/03/23 20:34:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/03/23 20:34:42 INFO Slf4jLogger: Slf4jLogger started
16/03/23 20:34:43 INFO Remoting: Starting remoting
16/03/23 20:34:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@spark-master.dragon.org:42809]
16/03/23 20:34:45 INFO Utils: Successfully started service 'sparkDriver' on port 42809.
16/03/23 20:34:45 INFO SparkEnv: Registering MapOutputTracker
16/03/23 20:34:45 INFO SparkEnv: Registering BlockManagerMaster
16/03/23 20:34:46 INFO DiskBlockManager: Created local directory at /tmp/spark-a5f85f13-60ff-4c6b-b304-6c4875c2050c/blockmgr-f4419025-ba24-4954-ad41-bcde60f2e30f
16/03/23 20:34:46 INFO MemoryStore: MemoryStore started with capacity 243.3 MB
16/03/23 20:34:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3fa0b690-a8b2-4325-87db-15c1194d8edf/httpd-97b84871-cd63-4a64-ba36-b7ffdc6aecf7
16/03/23 20:34:47 INFO HttpServer: Starting HTTP Server
16/03/23 20:34:48 INFO Server: jetty-8.y.z-SNAPSHOT
16/03/23 20:34:48 INFO AbstractConnector: Started SocketConnector@0.0.0.0:46460
16/03/23 20:34:48 INFO Utils: Successfully started service 'HTTP file server' on port 46460.
16/03/23 20:34:48 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/23 20:34:49 INFO Server: jetty-8.y.z-SNAPSHOT
16/03/23 20:34:49 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/03/23 20:34:49 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/23 20:34:49 INFO SparkUI: Started SparkUI at http://spark-master.dragon.org:4040
16/03/23 20:34:50 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master.dragon.org:7077/user/Master...
16/03/23 20:34:53 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160323203453-0006
16/03/23 20:34:53 INFO AppClient$ClientActor: Executor added: app-20160323203453-0006/0 on worker-20160323192033-spark-master.dragon.org-7078 (spark-master.dragon.org:7078) with 1 cores
16/03/23 20:34:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160323203453-0006/0 on hostPort spark-master.dragon.org:7078 with 1 cores, 512.0 MB RAM
16/03/23 20:34:53 INFO AppClient$ClientActor: Executor updated: app-20160323203453-0006/0 is now RUNNING
16/03/23 20:34:54 INFO AppClient$ClientActor: Executor updated: app-20160323203453-0006/0 is now LOADING
16/03/23 20:34:56 INFO NettyBlockTransferService: Server created on 42339
16/03/23 20:34:56 INFO BlockManagerMaster: Trying to register BlockManager
16/03/23 20:34:56 INFO BlockManagerMasterActor: Registering block manager spark-master.dragon.org:42339 with 243.3 MB RAM, BlockManagerId(, spark-master.dragon.org, 42339)
16/03/23 20:34:56 INFO BlockManagerMaster: Registered BlockManager
16/03/23 20:35:00 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/03/23 20:35:04 INFO MemoryStore: ensureFreeSpace(188959) called with curMem=0, maxMem=255087083
16/03/23 20:35:04 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 184.5 KB, free 243.1 MB)
16/03/23 20:35:05 INFO MemoryStore: ensureFreeSpace(26111) called with curMem=188959, maxMem=255087083
16/03/23 20:35:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.5 KB, free 243.1 MB)
16/03/23 20:35:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-master.dragon.org:42339 (size: 25.5 KB, free: 243.2 MB)
16/03/23 20:35:05 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
16/03/23 20:35:05 INFO SparkContext: Created broadcast 0 from textFile at aaa.scala:15
16/03/23 20:35:23 INFO FileInputFormat: Total input paths to process : 1
16/03/23 20:35:25 INFO SparkContext: Starting job: count at aaa.scala:16
16/03/23 20:35:26 INFO DAGScheduler: Got job 0 (count at aaa.scala:16) with 2 output partitions (allowLocal=false)
16/03/23 20:35:26 INFO DAGScheduler: Final stage: Stage 0(count at aaa.scala:16)
16/03/23 20:35:26 INFO DAGScheduler: Parents of final stage: List()
16/03/23 20:35:27 INFO DAGScheduler: Missing parents: List()
16/03/23 20:35:27 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at filter at aaa.scala:16), which has no missing parents
16/03/23 20:35:28 INFO MemoryStore: ensureFreeSpace(2856) called with curMem=215070, maxMem=255087083
16/03/23 20:35:28 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 243.1 MB)
16/03/23 20:35:29 INFO MemoryStore: ensureFreeSpace(2068) called with curMem=217926, maxMem=255087083
16/03/23 20:35:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 243.1 MB)
16/03/23 20:35:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark-master.dragon.org:42339 (size: 2.0 KB, free: 243.2 MB)
16/03/23 20:35:29 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
16/03/23 20:35:29 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
16/03/23 20:35:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at filter at aaa.scala:16)
16/03/23 20:35:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/03/23 20:35:44 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/03/23 20:35:53 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@spark-master.dragon.org:52843/user/Executor#-985631278] with ID 0
16/03/23 20:35:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:35:57 INFO BlockManagerMasterActor: Registering block manager spark-master.dragon.org:40898 with 267.3 MB RAM, BlockManagerId(0, spark-master.dragon.org, 40898)
16/03/23 20:36:06 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark-master.dragon.org:40898 (size: 2.0 KB, free: 267.3 MB)
16/03/23 20:36:09 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:09 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark-master.dragon.org): java.lang.ClassNotFoundException: com.san.spark.basic.aaa$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
16/03/23 20:36:09 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 1]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 2]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 3]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 4]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 6, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 5]
16/03/23 20:36:11 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 7, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:11 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 6]
16/03/23 20:36:11 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
16/03/23 20:36:11 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 7) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 7]
16/03/23 20:36:11 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/23 20:36:11 INFO TaskSchedulerImpl: Cancelling stage 0
16/03/23 20:36:11 INFO DAGScheduler: Job 0 failed: count at aaa.scala:16, took 45.782447 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, spark-master.dragon.org): java.lang.ClassNotFoundException: com.san.spark.basic.aaa$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Process finished with exit code 1
问题3:在IDEA中运行一个scala程序的时候出现了如下错误:
/opt/data02/modules/jdk1.7.0_25/bin/java -Didea.launcher.port=7535 -Didea.launcher.bin.path=/opt/data01/idea1411/bin -Dfile.encoding=UTF-8 -classpath /opt/data02/modules/jdk1.7.0_25/jre/lib/jfr.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/rt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/plugin.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/resources.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jfxrt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/deploy.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/charsets.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jce.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/javaws.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jsse.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/management-agent.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunjce_provider.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/zipfs.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/localedata.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunec.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/dnsns.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunpkcs11.jar:/opt/data01/myTest/out/production/myTest:/opt/data02/modules/scala-2.10.4/lib/scala-swing.jar:/opt/data02/modules/scala-2.10.4/lib/scala-reflect.jar:/opt/data02/modules/scala-2.10.4/lib/scala-library.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors-migration.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-api-jdo-3.2.6.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-core-3.2.10.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-rdbms-3.2.9.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-1.3.0-yarn-shuffle.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-assembly-1.3.0-hadoop2.6.0-cdh5.4.0.jar:/opt/data01/idea1411/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.github.GroupByTest
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/27 02:18:09 INFO SparkContext: Running Spark version 1.3.0
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.(SparkContext.scala:206)
at com.github.GroupByTest$.main(GroupByTest.scala:18)
at com.github.GroupByTest.main(GroupByTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Process finished with exit code 1
我运行的程序很简单:
package com.github
import java.util.Random
import org.apache.spark.{SparkConf, SparkContext}
/**
Created by hadoop on 3/27/16.
*/
object GroupByTest {
def main(args: Array[String]) {
val sparkConf = new SparkConf().setAppName("GroupBy Test")
var numMappers = if (args.length > 0) args(0).toInt else 2
var numKVPairs = if (args.length > 1) args(1).toInt else 1000
var valSize = if (args.length > 2) args(2).toInt else 1000
var numReducers = if (args.length > 3) args(3).toInt else numMappersval sc = new SparkContext(sparkConf)
val pairs1 = sc.parallelize(0 until numMappers, numMappers).flatMap { p =>
val ranGen = new Random
var arr1 = new Array(Int, Array[Byte])
for (i <- 0 until numKVPairs) {
val byteArr = new ArrayByte
ranGen.nextBytes(byteArr)
arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
}
arr1
}.cache()
// Enforce that everything has been calculated and in cache
pairs1.count()println(pairs1.groupByKey(numReducers).count())
sc.stop()
}
}
原因很简单,我没有设置VM Options ,因此我只需点击 run---run configurations---配置参数即可,截图如下:
即可
4.在终端下运行我的第一个spark streaming的时候,提示:

原因是我的9999端口没有打开,
查看端口:netstat -an | grep 9999
打开端口:nc -lp 9999 &
spark 1.3.0下的问题的更多相关文章
- Spark快速入门 - Spark 1.6.0
Spark快速入门 - Spark 1.6.0 转载请注明出处:http://www.cnblogs.com/BYRans/ 快速入门(Quick Start) 本文简单介绍了Spark的使用方式.首 ...
- Apache Spark 2.2.0 中文文档 - 概述 | ApacheCN
Spark 概述 Apache Spark 是一个快速的, 多用途的集群计算系统. 它提供了 Java, Scala, Python 和 R 的高级 API,以及一个支持通用的执行图计算的优化过的引擎 ...
- Apache Spark 2.2.0 中文文档 - Spark 编程指南 | ApacheCN
Spark 编程指南 概述 Spark 依赖 初始化 Spark 使用 Shell 弹性分布式数据集 (RDDs) 并行集合 外部 Datasets(数据集) RDD 操作 基础 传递 Functio ...
- Apache Spark 2.2.0 中文文档 - Spark Streaming 编程指南 | ApacheCN
Spark Streaming 编程指南 概述 一个入门示例 基础概念 依赖 初始化 StreamingContext Discretized Streams (DStreams)(离散化流) Inp ...
- Apache Spark 2.2.0 中文文档 - Spark SQL, DataFrames and Datasets Guide | ApacheCN
Spark SQL, DataFrames and Datasets Guide Overview SQL Datasets and DataFrames 开始入门 起始点: SparkSession ...
- Apache Spark 2.2.0 中文文档 - SparkR (R on Spark) | ApacheCN
SparkR (R on Spark) 概述 SparkDataFrame 启动: SparkSession 从 RStudio 来启动 创建 SparkDataFrames 从本地的 data fr ...
- Apache Spark 2.2.0 中文文档 - Submitting Applications | ApacheCN
Submitting Applications 在 script in Spark的 bin 目录中的spark-submit 脚本用与在集群上启动应用程序.它可以通过一个统一的接口使用所有 Spar ...
- [Spark性能调优] 第三章 : Spark 2.1.0 中 Sort-Based Shuffle 产生的内幕
本課主題 Sorted-Based Shuffle 的诞生和介绍 Shuffle 中六大令人费解的问题 Sorted-Based Shuffle 的排序和源码鉴赏 Shuffle 在运行时的内存管理 ...
- Spark RDD的默认分区数:(spark 2.1.0)
本文基于Spark 2.1.0版本 新手首先要明白几个配置: spark.default.parallelism:(默认的并发数) 如果配置文件spark-default.conf中没有显示的配置,则 ...
随机推荐
- CCScale9Sprite的使用
#include "cocos-ext.h" USING_NS_CC_EXT; //框中需要显示的label CCLabelTTF *label = CCLableTTF::cre ...
- android开发之路11(用SharedPreferences存储数据)
Android平台给我们提供了一个SharedPreferences类,实际上SharedPreferences处理的就是一个key-value(键值对),它是 一个轻量级的存储类,特别适合用于保存软 ...
- [转]"Windows Phone 7程序设计”完全版电子书可以免费下载了
本文转自:http://www.cnblogs.com/salam/archive/2010/10/29/1864246.html 现在学习Windows Phone 7开发资料十分有限,除了MSDN ...
- 【二分查找最优解】FZU 2056 最大正方形
题意:现在有一个n*m的矩阵A,在A中找一个H*H的正方形,使得其面积最大且该正方形元素的和不大于 limit. 分析:开始以为是DP或者二维RMQ,其实用二分就可以做出来: 在输入时构造元素和矩阵d ...
- IP地址计算和划分
一. B类地址 范围从128-191(第一串8位二进制10000000~10111111),如172.168.1.1,第一和第二段号码为网络号码,剩下的2段号码为本地计算机的号码.转换为2进 ...
- css 盒模型相关样式
话不多说,一切还是从最基础的说起. 盒的类型 1.盒的基本类型 在css中,用display定义盒的类型,一般分为block类型与inline类型. 例如div属于block类型,span属于in ...
- Ajax具体使用
Ajax在注册页面经常使用,当注册用户是用户名必须唯一时,要使用ajax获取表单中你输入的内容, 然后将这个数据利用XMLHttpRequest对象请求一个php文件,这个php文件接收到信息,查询数 ...
- Jersey(1.19.1) - Security
Security information is available by obtaining the SecurityContext using @Context, which is essentia ...
- Quartz Scheduler(2.2.1) - Working with JobStores
About Job Stores JobStores are responsible for keeping track of all the work data you give to the sc ...
- Redis - 环境的安装配置
Redis 下载安装步骤: 下载 redis 源码: [huey@huey-K42JE ~]$ wget http://download.redis.io/releases/redis-x.y.z.t ...