Problems under Spark 1.3.0
1. In a Spark SQL test, both registerAsTable and registerTempTable gave me trouble. (In Spark 1.3, registerAsTable was renamed to registerTempTable, and an RDD of case classes must first be converted to a DataFrame with toDF, which in turn requires importing the SQLContext's implicits.) After digging through various materials, I settled on the following approach:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val data = sc.textFile("hdfs://spark-master.dragon.org:8020/user/a.csv")
case class Person(cname: String, age: Int)
val people = data.map(_.split(",")).map(p => Person(p(0), p(1).toInt))
people.toDF.registerTempTable("people")
sqlContext.sql("select * from people").collect
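Note the underscore in _.split(","), which the blog engine had swallowed. The same two-step mapping from text lines to case-class instances can be sketched without Spark on a plain Scala collection (the two CSV rows below are made up for illustration):

```scala
case class Person(cname: String, age: Int)

// Two made-up rows standing in for the contents of a.csv
val data = Seq("zhangsan,20", "lisi,31")

// Split each line on commas, then build a Person from the fields
val people = data.map(_.split(",")).map(p => Person(p(0), p(1).toInt))

println(people) // List(Person(zhangsan,20), Person(lisi,31))
```

On an RDD the calls look identical; only the receiver changes from a local Seq to the distributed dataset.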
2. An unsolved problem. With the program below, if I build a jar in the IDE and run it with spark-submit, it runs correctly; if I run it directly from the IDE it fails, although switching to local mode makes the error go away. The error output is shown at the bottom. This problem remains unsolved; to be continued.
package com.san.spark.basic

import org.apache.spark.{SparkContext, SparkConf}

/**
 * Created by hadoop on 3/23/16.
 */
object aaa {
  def main(args: Array[String]) {
    val logFile = "hdfs://spark-master:8020/user/sparkS/a.csv" // Should be some file on your system
    val conf = new SparkConf().setAppName("aa")
      .setMaster("spark://spark-master:7077")
      //.setJars(List("/opt/data01/myTes/aaa.jar"))
      //.setMaster("local")
    val sc = new SparkContext(conf)
    //sc.addJar("/opt/data01/myTes/aaa.jar")
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}
Error output:
/opt/data02/modules/jdk1.7.0_25/bin/java -Didea.launcher.port=7532 -Didea.launcher.bin.path=/opt/data01/idea1411/bin -Dfile.encoding=UTF-8 -classpath /opt/data02/modules/jdk1.7.0_25/jre/lib/jfr.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/rt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/plugin.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/resources.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jfxrt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/deploy.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/charsets.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jce.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/javaws.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jsse.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/management-agent.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunjce_provider.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/zipfs.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/localedata.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunec.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/dnsns.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunpkcs11.jar:/opt/data01/myTest/out/production/myTest:/opt/data02/modules/scala-2.10.4/lib/scala-swing.jar:/opt/data02/modules/scala-2.10.4/lib/scala-reflect.jar:/opt/data02/modules/scala-2.10.4/lib/scala-library.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors-migration.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-api-jdo-3.2.6.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-core-3.2.10.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-rdbms-3.2.9.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-1.3.0-yarn-shuffle.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-assembly-1.3.0-hadoop2.6.0-cdh5.4.0.jar:/opt/data01/idea1411/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.san.spark.basic.aaa
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/23 20:34:23 INFO SparkContext: Running Spark version 1.3.0
16/03/23 20:34:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/23 20:34:34 INFO SecurityManager: Changing view acls to: hadoop
16/03/23 20:34:34 INFO SecurityManager: Changing modify acls to: hadoop
16/03/23 20:34:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/03/23 20:34:42 INFO Slf4jLogger: Slf4jLogger started
16/03/23 20:34:43 INFO Remoting: Starting remoting
16/03/23 20:34:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@spark-master.dragon.org:42809]
16/03/23 20:34:45 INFO Utils: Successfully started service 'sparkDriver' on port 42809.
16/03/23 20:34:45 INFO SparkEnv: Registering MapOutputTracker
16/03/23 20:34:45 INFO SparkEnv: Registering BlockManagerMaster
16/03/23 20:34:46 INFO DiskBlockManager: Created local directory at /tmp/spark-a5f85f13-60ff-4c6b-b304-6c4875c2050c/blockmgr-f4419025-ba24-4954-ad41-bcde60f2e30f
16/03/23 20:34:46 INFO MemoryStore: MemoryStore started with capacity 243.3 MB
16/03/23 20:34:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3fa0b690-a8b2-4325-87db-15c1194d8edf/httpd-97b84871-cd63-4a64-ba36-b7ffdc6aecf7
16/03/23 20:34:47 INFO HttpServer: Starting HTTP Server
16/03/23 20:34:48 INFO Server: jetty-8.y.z-SNAPSHOT
16/03/23 20:34:48 INFO AbstractConnector: Started SocketConnector@0.0.0.0:46460
16/03/23 20:34:48 INFO Utils: Successfully started service 'HTTP file server' on port 46460.
16/03/23 20:34:48 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/23 20:34:49 INFO Server: jetty-8.y.z-SNAPSHOT
16/03/23 20:34:49 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/03/23 20:34:49 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/23 20:34:49 INFO SparkUI: Started SparkUI at http://spark-master.dragon.org:4040
16/03/23 20:34:50 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master.dragon.org:7077/user/Master...
16/03/23 20:34:53 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160323203453-0006
16/03/23 20:34:53 INFO AppClient$ClientActor: Executor added: app-20160323203453-0006/0 on worker-20160323192033-spark-master.dragon.org-7078 (spark-master.dragon.org:7078) with 1 cores
16/03/23 20:34:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160323203453-0006/0 on hostPort spark-master.dragon.org:7078 with 1 cores, 512.0 MB RAM
16/03/23 20:34:53 INFO AppClient$ClientActor: Executor updated: app-20160323203453-0006/0 is now RUNNING
16/03/23 20:34:54 INFO AppClient$ClientActor: Executor updated: app-20160323203453-0006/0 is now LOADING
16/03/23 20:34:56 INFO NettyBlockTransferService: Server created on 42339
16/03/23 20:34:56 INFO BlockManagerMaster: Trying to register BlockManager
16/03/23 20:34:56 INFO BlockManagerMasterActor: Registering block manager spark-master.dragon.org:42339 with 243.3 MB RAM, BlockManagerId(, spark-master.dragon.org, 42339)
16/03/23 20:34:56 INFO BlockManagerMaster: Registered BlockManager
16/03/23 20:35:00 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/03/23 20:35:04 INFO MemoryStore: ensureFreeSpace(188959) called with curMem=0, maxMem=255087083
16/03/23 20:35:04 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 184.5 KB, free 243.1 MB)
16/03/23 20:35:05 INFO MemoryStore: ensureFreeSpace(26111) called with curMem=188959, maxMem=255087083
16/03/23 20:35:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.5 KB, free 243.1 MB)
16/03/23 20:35:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-master.dragon.org:42339 (size: 25.5 KB, free: 243.2 MB)
16/03/23 20:35:05 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
16/03/23 20:35:05 INFO SparkContext: Created broadcast 0 from textFile at aaa.scala:15
16/03/23 20:35:23 INFO FileInputFormat: Total input paths to process : 1
16/03/23 20:35:25 INFO SparkContext: Starting job: count at aaa.scala:16
16/03/23 20:35:26 INFO DAGScheduler: Got job 0 (count at aaa.scala:16) with 2 output partitions (allowLocal=false)
16/03/23 20:35:26 INFO DAGScheduler: Final stage: Stage 0(count at aaa.scala:16)
16/03/23 20:35:26 INFO DAGScheduler: Parents of final stage: List()
16/03/23 20:35:27 INFO DAGScheduler: Missing parents: List()
16/03/23 20:35:27 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at filter at aaa.scala:16), which has no missing parents
16/03/23 20:35:28 INFO MemoryStore: ensureFreeSpace(2856) called with curMem=215070, maxMem=255087083
16/03/23 20:35:28 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 243.1 MB)
16/03/23 20:35:29 INFO MemoryStore: ensureFreeSpace(2068) called with curMem=217926, maxMem=255087083
16/03/23 20:35:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 243.1 MB)
16/03/23 20:35:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark-master.dragon.org:42339 (size: 2.0 KB, free: 243.2 MB)
16/03/23 20:35:29 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
16/03/23 20:35:29 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
16/03/23 20:35:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at filter at aaa.scala:16)
16/03/23 20:35:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/03/23 20:35:44 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/03/23 20:35:53 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@spark-master.dragon.org:52843/user/Executor#-985631278] with ID 0
16/03/23 20:35:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:35:57 INFO BlockManagerMasterActor: Registering block manager spark-master.dragon.org:40898 with 267.3 MB RAM, BlockManagerId(0, spark-master.dragon.org, 40898)
16/03/23 20:36:06 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark-master.dragon.org:40898 (size: 2.0 KB, free: 267.3 MB)
16/03/23 20:36:09 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:09 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark-master.dragon.org): java.lang.ClassNotFoundException: com.san.spark.basic.aaa$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
16/03/23 20:36:09 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 1]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 2]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 3]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 4]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 6, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 5]
16/03/23 20:36:11 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 7, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:11 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 6]
16/03/23 20:36:11 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
16/03/23 20:36:11 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 7) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 7]
16/03/23 20:36:11 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/23 20:36:11 INFO TaskSchedulerImpl: Cancelling stage 0
16/03/23 20:36:11 INFO DAGScheduler: Job 0 failed: count at aaa.scala:16, took 45.782447 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, spark-master.dragon.org): java.lang.ClassNotFoundException: com.san.spark.basic.aaa$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Process finished with exit code 1
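For the record, a ClassNotFoundException for an anonymous-function class (aaa$$anonfun$1) on the executors is the classic symptom of the application's compiled classes not being shipped to the workers: spark-submit distributes the application jar automatically, while a bare IDE run only has the classes on the driver side. A sketch of the usual remedy, building on the commented-out lines already in the code above (the jar path is this machine's; build the jar first and point at your own):

```
val conf = new SparkConf()
  .setAppName("aa")
  .setMaster("spark://spark-master:7077")
  // Ship the application jar to the executors, as spark-submit would
  .setJars(List("/opt/data01/myTes/aaa.jar"))
```

Equivalently, sc.addJar can be called after the SparkContext is created. In local mode the driver and executor share one JVM, which is why the error disappears there.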
Problem 3: when running a Scala program in IDEA, the following error appeared:
/opt/data02/modules/jdk1.7.0_25/bin/java -Didea.launcher.port=7535 -Didea.launcher.bin.path=/opt/data01/idea1411/bin -Dfile.encoding=UTF-8 -classpath /opt/data02/modules/jdk1.7.0_25/jre/lib/jfr.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/rt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/plugin.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/resources.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jfxrt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/deploy.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/charsets.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jce.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/javaws.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jsse.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/management-agent.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunjce_provider.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/zipfs.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/localedata.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunec.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/dnsns.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunpkcs11.jar:/opt/data01/myTest/out/production/myTest:/opt/data02/modules/scala-2.10.4/lib/scala-swing.jar:/opt/data02/modules/scala-2.10.4/lib/scala-reflect.jar:/opt/data02/modules/scala-2.10.4/lib/scala-library.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors-migration.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-api-jdo-3.2.6.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-core-3.2.10.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-rdbms-3.2.9.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-1.3.0-yarn-shuffle.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-assembly-1.3.0-hadoop2.6.0-cdh5.4.0.jar:/opt/data01/idea1411/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.github.GroupByTest
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/27 02:18:09 INFO SparkContext: Running Spark version 1.3.0
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.(SparkContext.scala:206)
at com.github.GroupByTest$.main(GroupByTest.scala:18)
at com.github.GroupByTest.main(GroupByTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Process finished with exit code 1
The program I ran is very simple:
package com.github

import java.util.Random

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by hadoop on 3/27/16.
 */
object GroupByTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("GroupBy Test")
    var numMappers = if (args.length > 0) args(0).toInt else 2
    var numKVPairs = if (args.length > 1) args(1).toInt else 1000
    var valSize = if (args.length > 2) args(2).toInt else 1000
    var numReducers = if (args.length > 3) args(3).toInt else numMappers

    val sc = new SparkContext(sparkConf)

    val pairs1 = sc.parallelize(0 until numMappers, numMappers).flatMap { p =>
      val ranGen = new Random
      var arr1 = new Array[(Int, Array[Byte])](numKVPairs)
      for (i <- 0 until numKVPairs) {
        val byteArr = new Array[Byte](valSize)
        ranGen.nextBytes(byteArr)
        arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
      }
      arr1
    }.cache()
    // Enforce that everything has been calculated and in cache
    pairs1.count()

    println(pairs1.groupByKey(numReducers).count())

    sc.stop()
  }
}
The cause is simple: I had not set the VM options, so all I needed was to open Run → Run Configurations and fill in the parameter there, as in the screenshot below:
[screenshot: IDEA Run Configurations dialog]
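A minimal sketch of that setting: spark.master is the standard Spark property that new SparkConf() picks up from JVM system properties, so the VM options field only needs one entry (use the standalone master's URL from problem 2 instead if you want to run against the cluster):

```
-Dspark.master=local[2]
```

Alternatively, calling setMaster("local[2]") on the SparkConf in code achieves the same thing.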
4. When I ran my first Spark Streaming program from the terminal, it failed with an error (screenshot lost).
The cause was that my port 9999 was not open.
Check the port: netstat -an | grep 9999
Open the port: nc -lp 9999 &
(The Spark Streaming guide's network example uses nc -lk 9999; the -k flag keeps the listener alive across client connections.)
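The same check can be scripted. Below is a small sketch in plain Scala (a hypothetical helper, not from the post) that probes whether anything is accepting connections on a port, standing in for the netstat check:

```scala
import java.net.{ServerSocket, Socket}

// True if something is accepting connections on host:port
def portOpen(host: String, port: Int): Boolean =
  try { new Socket(host, port).close(); true }
  catch { case _: java.io.IOException => false }

// Open a listener the way `nc -lk 9999` would, on any free port
val server = new ServerSocket(0)
val port = server.getLocalPort
println(portOpen("localhost", port))  // true while the listener is up

server.close()
println(portOpen("localhost", port))  // false once it is gone
```

If this prints false while your netcat listener is supposedly running, the streaming job's socket receiver will fail in exactly the same way.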