Problems under Spark 1.3.0
1. In a Spark SQL test, both registerAsTable and registerTempTable gave me problems. After digging through various resources, I ended up with the following approach (note that in Spark 1.3, calling toDF() on an RDD requires importing the SQLContext implicits):
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val data = sc.textFile("hdfs://spark-master.dragon.org:8020/user/a.csv")
case class Person(cname: String, age: Int)
val people = data.map(_.split(",")).map(p => Person(p(0), p(1).toInt))
people.toDF().registerTempTable("people")
sqlContext.sql("select * from people").collect()
2. An unresolved problem. In the program below, if I package it from the IDE and run it with spark-submit, it runs correctly; if I run it directly inside the IDE, it throws an error, yet switching to local mode makes the error go away. The error output is shown at the bottom. This one is still unsolved; to be continued.
package com.san.spark.basic
import org.apache.spark.{SparkContext, SparkConf}
/**
 * Created by hadoop on 3/23/16.
 */
object aaa {
def main(args: Array[String]) {
val logFile = "hdfs://spark-master:8020/user/sparkS/a.csv" // Should be some file on your system
val conf = new SparkConf().setAppName("aa") //
.setMaster("spark://spark-master:7077")
//.setJars(List("/opt/data01/myTes/aaa.jar"))
// .setMaster("local")
val sc = new SparkContext(conf)
//sc.addJar("/opt/data01/myTes/aaa.jar")
val logData = sc.textFile(logFile, 2).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
sc.stop()
}
}
Error output:
/opt/data02/modules/jdk1.7.0_25/bin/java -Didea.launcher.port=7532 -Didea.launcher.bin.path=/opt/data01/idea1411/bin -Dfile.encoding=UTF-8 -classpath /opt/data02/modules/jdk1.7.0_25/jre/lib/jfr.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/rt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/plugin.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/resources.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jfxrt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/deploy.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/charsets.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jce.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/javaws.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jsse.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/management-agent.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunjce_provider.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/zipfs.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/localedata.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunec.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/dnsns.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunpkcs11.jar:/opt/data01/myTest/out/production/myTest:/opt/data02/modules/scala-2.10.4/lib/scala-swing.jar:/opt/data02/modules/scala-2.10.4/lib/scala-reflect.jar:/opt/data02/modules/scala-2.10.4/lib/scala-library.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors-migration.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-api-jdo-3.2.6.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-core-3.2.10.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-rdbms-3.2.9.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-1.3.0-yarn-shuffle.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-assembly-1.3.0-hadoop2.6.0-cdh5.4.0.jar:/opt/data01/idea1411/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.san.spark.basic.aaa
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/23 20:34:23 INFO SparkContext: Running Spark version 1.3.0
16/03/23 20:34:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/23 20:34:34 INFO SecurityManager: Changing view acls to: hadoop
16/03/23 20:34:34 INFO SecurityManager: Changing modify acls to: hadoop
16/03/23 20:34:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
16/03/23 20:34:42 INFO Slf4jLogger: Slf4jLogger started
16/03/23 20:34:43 INFO Remoting: Starting remoting
16/03/23 20:34:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@spark-master.dragon.org:42809]
16/03/23 20:34:45 INFO Utils: Successfully started service 'sparkDriver' on port 42809.
16/03/23 20:34:45 INFO SparkEnv: Registering MapOutputTracker
16/03/23 20:34:45 INFO SparkEnv: Registering BlockManagerMaster
16/03/23 20:34:46 INFO DiskBlockManager: Created local directory at /tmp/spark-a5f85f13-60ff-4c6b-b304-6c4875c2050c/blockmgr-f4419025-ba24-4954-ad41-bcde60f2e30f
16/03/23 20:34:46 INFO MemoryStore: MemoryStore started with capacity 243.3 MB
16/03/23 20:34:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3fa0b690-a8b2-4325-87db-15c1194d8edf/httpd-97b84871-cd63-4a64-ba36-b7ffdc6aecf7
16/03/23 20:34:47 INFO HttpServer: Starting HTTP Server
16/03/23 20:34:48 INFO Server: jetty-8.y.z-SNAPSHOT
16/03/23 20:34:48 INFO AbstractConnector: Started SocketConnector@0.0.0.0:46460
16/03/23 20:34:48 INFO Utils: Successfully started service 'HTTP file server' on port 46460.
16/03/23 20:34:48 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/23 20:34:49 INFO Server: jetty-8.y.z-SNAPSHOT
16/03/23 20:34:49 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/03/23 20:34:49 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/23 20:34:49 INFO SparkUI: Started SparkUI at http://spark-master.dragon.org:4040
16/03/23 20:34:50 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@spark-master.dragon.org:7077/user/Master...
16/03/23 20:34:53 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160323203453-0006
16/03/23 20:34:53 INFO AppClient$ClientActor: Executor added: app-20160323203453-0006/0 on worker-20160323192033-spark-master.dragon.org-7078 (spark-master.dragon.org:7078) with 1 cores
16/03/23 20:34:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160323203453-0006/0 on hostPort spark-master.dragon.org:7078 with 1 cores, 512.0 MB RAM
16/03/23 20:34:53 INFO AppClient$ClientActor: Executor updated: app-20160323203453-0006/0 is now RUNNING
16/03/23 20:34:54 INFO AppClient$ClientActor: Executor updated: app-20160323203453-0006/0 is now LOADING
16/03/23 20:34:56 INFO NettyBlockTransferService: Server created on 42339
16/03/23 20:34:56 INFO BlockManagerMaster: Trying to register BlockManager
16/03/23 20:34:56 INFO BlockManagerMasterActor: Registering block manager spark-master.dragon.org:42339 with 243.3 MB RAM, BlockManagerId(<driver>, spark-master.dragon.org, 42339)
16/03/23 20:34:56 INFO BlockManagerMaster: Registered BlockManager
16/03/23 20:35:00 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/03/23 20:35:04 INFO MemoryStore: ensureFreeSpace(188959) called with curMem=0, maxMem=255087083
16/03/23 20:35:04 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 184.5 KB, free 243.1 MB)
16/03/23 20:35:05 INFO MemoryStore: ensureFreeSpace(26111) called with curMem=188959, maxMem=255087083
16/03/23 20:35:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.5 KB, free 243.1 MB)
16/03/23 20:35:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-master.dragon.org:42339 (size: 25.5 KB, free: 243.2 MB)
16/03/23 20:35:05 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
16/03/23 20:35:05 INFO SparkContext: Created broadcast 0 from textFile at aaa.scala:15
16/03/23 20:35:23 INFO FileInputFormat: Total input paths to process : 1
16/03/23 20:35:25 INFO SparkContext: Starting job: count at aaa.scala:16
16/03/23 20:35:26 INFO DAGScheduler: Got job 0 (count at aaa.scala:16) with 2 output partitions (allowLocal=false)
16/03/23 20:35:26 INFO DAGScheduler: Final stage: Stage 0(count at aaa.scala:16)
16/03/23 20:35:26 INFO DAGScheduler: Parents of final stage: List()
16/03/23 20:35:27 INFO DAGScheduler: Missing parents: List()
16/03/23 20:35:27 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at filter at aaa.scala:16), which has no missing parents
16/03/23 20:35:28 INFO MemoryStore: ensureFreeSpace(2856) called with curMem=215070, maxMem=255087083
16/03/23 20:35:28 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 243.1 MB)
16/03/23 20:35:29 INFO MemoryStore: ensureFreeSpace(2068) called with curMem=217926, maxMem=255087083
16/03/23 20:35:29 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 243.1 MB)
16/03/23 20:35:29 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark-master.dragon.org:42339 (size: 2.0 KB, free: 243.2 MB)
16/03/23 20:35:29 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
16/03/23 20:35:29 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
16/03/23 20:35:29 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at filter at aaa.scala:16)
16/03/23 20:35:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/03/23 20:35:44 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/03/23 20:35:53 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@spark-master.dragon.org:52843/user/Executor#-985631278] with ID 0
16/03/23 20:35:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:35:57 INFO BlockManagerMasterActor: Registering block manager spark-master.dragon.org:40898 with 267.3 MB RAM, BlockManagerId(0, spark-master.dragon.org, 40898)
16/03/23 20:36:06 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on spark-master.dragon.org:40898 (size: 2.0 KB, free: 267.3 MB)
16/03/23 20:36:09 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:09 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, spark-master.dragon.org): java.lang.ClassNotFoundException: com.san.spark.basic.aaa$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
16/03/23 20:36:09 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 1]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 2]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 3]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 4]
16/03/23 20:36:10 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 6, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:10 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 5]
16/03/23 20:36:11 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 7, spark-master.dragon.org, NODE_LOCAL, 1317 bytes)
16/03/23 20:36:11 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 6]
16/03/23 20:36:11 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
16/03/23 20:36:11 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 7) on executor spark-master.dragon.org: java.lang.ClassNotFoundException (com.san.spark.basic.aaa$$anonfun$1) [duplicate 7]
16/03/23 20:36:11 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/23 20:36:11 INFO TaskSchedulerImpl: Cancelling stage 0
16/03/23 20:36:11 INFO DAGScheduler: Job 0 failed: count at aaa.scala:16, took 45.782447 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, spark-master.dragon.org): java.lang.ClassNotFoundException: com.san.spark.basic.aaa$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Process finished with exit code 1
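The executor log points at the real issue: java.lang.ClassNotFoundException: com.san.spark.basic.aaa$$anonfun$1 means the anonymous classes the Scala compiler generates for the filter closures never reached the workers. spark-submit ships the application jar automatically, and local mode never leaves the driver's classpath, which matches the symptoms; a bare IDE run against spark://spark-master:7077 ships nothing. A likely fix, only a sketch built from the commented-out lines in the program itself (the jar path is the one that appears there), is to hand SparkConf the compiled jar:

val conf = new SparkConf().setAppName("aa")
  .setMaster("spark://spark-master:7077")
  .setJars(List("/opt/data01/myTes/aaa.jar")) // jar built by the IDE; gets shipped to executors
val sc = new SparkContext(conf)
// alternatively, after the context exists:
// sc.addJar("/opt/data01/myTes/aaa.jar")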
3. When running a Scala program in IDEA, the following error appeared:
/opt/data02/modules/jdk1.7.0_25/bin/java -Didea.launcher.port=7535 -Didea.launcher.bin.path=/opt/data01/idea1411/bin -Dfile.encoding=UTF-8 -classpath /opt/data02/modules/jdk1.7.0_25/jre/lib/jfr.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/rt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/plugin.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/resources.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jfxrt.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/deploy.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/charsets.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jce.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/javaws.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/jsse.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/management-agent.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunjce_provider.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/zipfs.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/localedata.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunec.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/dnsns.jar:/opt/data02/modules/jdk1.7.0_25/jre/lib/ext/sunpkcs11.jar:/opt/data01/myTest/out/production/myTest:/opt/data02/modules/scala-2.10.4/lib/scala-swing.jar:/opt/data02/modules/scala-2.10.4/lib/scala-reflect.jar:/opt/data02/modules/scala-2.10.4/lib/scala-library.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors-migration.jar:/opt/data02/modules/scala-2.10.4/lib/scala-actors.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-api-jdo-3.2.6.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-core-3.2.10.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/datanucleus-rdbms-3.2.9.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-1.3.0-yarn-shuffle.jar:/opt/data02/modules/spark-1.3.0-bin-2.6.0-cdh5.4.0/lib/spark-assembly-1.3.0-hadoop2.6.0-cdh5.4.0.jar:/opt/data01/idea1411/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain com.github.GroupByTest
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/27 02:18:09 INFO SparkContext: Running Spark version 1.3.0
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:206)
at com.github.GroupByTest$.main(GroupByTest.scala:18)
at com.github.GroupByTest.main(GroupByTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Process finished with exit code 1
The program I was running is very simple:
package com.github
import java.util.Random
import org.apache.spark.{SparkConf, SparkContext}
/**
 * Created by hadoop on 3/27/16.
 */
object GroupByTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("GroupBy Test")
    var numMappers = if (args.length > 0) args(0).toInt else 2
    var numKVPairs = if (args.length > 1) args(1).toInt else 1000
    var valSize = if (args.length > 2) args(2).toInt else 1000
    var numReducers = if (args.length > 3) args(3).toInt else numMappers
    val sc = new SparkContext(sparkConf)
    val pairs1 = sc.parallelize(0 until numMappers, numMappers).flatMap { p =>
      val ranGen = new Random
      var arr1 = new Array[(Int, Array[Byte])](numKVPairs)
      for (i <- 0 until numKVPairs) {
        val byteArr = new Array[Byte](valSize)
        ranGen.nextBytes(byteArr)
        arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
      }
      arr1
    }.cache()
    // Enforce that everything has been calculated and in cache
    pairs1.count()
    println(pairs1.groupByKey(numReducers).count())
    sc.stop()
  }
}
The cause was simple: I had not set the VM Options, so all I needed was to go to Run -> Run Configurations and set the parameter there.
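Concretely, the missing piece is a spark.master setting. It can be supplied as a VM option, or guarded in code so that cluster runs via spark-submit stay unaffected. A sketch (local[2] is just an assumed value; any valid master URL works):

// In IDEA: Run -> Edit Configurations -> VM Options:
//   -Dspark.master=local[2]
// Or fall back in code when no master was configured:
val sparkConf = new SparkConf().setAppName("GroupBy Test")
if (!sparkConf.contains("spark.master")) {
  sparkConf.setMaster("local[2]") // fallback for IDE runs only
}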
4. When I ran my first Spark Streaming example from the terminal, it reported an error.
The cause was that my port 9999 was not open.
Check the port: netstat -an | grep 9999
Open the port: nc -lp 9999 &
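For reference, the receiving side looks roughly like this, a minimal sketch of the standard socket word count (not my exact program), which is what produced the error when nothing was listening on port 9999:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    // Connects to the socket opened by nc above; the receiver errors out if 9999 is closed.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}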