Intro

  • 这篇是对一个Spark (Streaming)作业的log进行分析。用来加深对Spark application运行过程,优化空间的各种理解。

Here to Start

  • 从我这个初学者写得一个Spark Streaming程序开始...
  • package com.wttttt.spark
    
    /**
    * Created with IntelliJ IDEA.
    * Description:
    * Author: wttttt
    * Github: https://github.com/wttttt-wang/hadoop_inaction
    * Date: 2017-05-19
    * Time: 09:56
    */
    import java.util.regex.Pattern import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Milliseconds, StreamingContext}
    import org.slf4j.LoggerFactory import scala.collection.mutable object LocalTest {
    val logger = LoggerFactory.getLogger("LocalTest")
    def main(args: Array[String]) { val batchInterval = Milliseconds(10000)
    val slideInterval = Milliseconds(5000) val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("LocalTest")
    // WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data,
    // otherwise Spark jobs will not get resources to process the received data.
    val sc = new StreamingContext(conf, Milliseconds(5000))
    sc.checkpoint("flumeCheckpoint/") val stream = sc.socketTextStream("localhost", 9998) val counts = stream.mapPartitions{ events =>
    val pattern = Pattern.compile("\\?Input=[^\\s]*\\s")
    val map = new mutable.HashMap[String, Int]()
    logger.info("Handling events, events is empty: " + events.isEmpty)
    while (events.hasNext){ // par is an Iterator!!!
    val line = events.next()
    val m = pattern.matcher(line)
    if (m.find()) {
    val words = line.substring(m.start(), m.end()).split("=")(1).toLowerCase()
    logger.info(s"Processing words $words")
    map.put(words, map.getOrElse(words, 0) + 1)
    }
    }
    map.iterator
    } val window = counts.reduceByKeyAndWindow(_+_, _-_, batchInterval, slideInterval)
    // window.print() // transform和它的变体trnasformWith运行在DStream上任意的RDD-to-RDD函数;
    // 可以用来使用那些不包含在DStrema API中RDD操作
    val sorted = window.transform(rdd =>{
    val sortRdd = rdd.map(t => (t._2, t._1)).sortByKey(false).map(t => (t._2, t._1))
    val more = sortRdd.take(2)
    more.foreach(println)
    sortRdd
    }) sorted.print() sc.start()
    sc.awaitTermination()
    }
    }
  • 看运行log (摘取的一部分)
  • 17/05/20 10:20:13 INFO Utils: Successfully started service 'sparkDriver' on port 52300.
    
    17/05/20 10:20:17 INFO ReducedWindowedDStream: Checkpoint interval automatically set to 10000 ms
    
    17/05/20 10:20:17 INFO SocketInputDStream: Slide time = 5000 ms
    
    17/05/20 10:20:17 INFO ReceiverTracker: Receiver 0 started
    
    17/05/20 10:20:17 INFO DAGScheduler: Got job 0 (start at LocalTest.scala:66) with 1 output partitions
    
    17/05/20 10:20:17 INFO JobScheduler: Started JobScheduler
    
    17/05/20 10:20:17 INFO StreamingContext: StreamingContext started
    
    17/05/20 10:20:20 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 2 (mapPartitions at LocalTest.scala:36)
    
    17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 4 (reduceByKeyAndWindow at LocalTest.scala:52)
    
    17/05/20 10:20:20 INFO DAGScheduler: Got job 1 (sortByKey at LocalTest.scala:58) with 2 output partitions
    
    17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 3 (sortByKey at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1, ShuffleMapStage 2)
    
    17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2)
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting ShuffleMapStage 2 (ParallelCollectionRDD[4] at reduceByKeyAndWindow at LocalTest.scala:52), which has no missing parents
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 2 (ParallelCollectionRDD[4] at reduceByKeyAndWindow at LocalTest.scala:52)
    
    17/05/20 10:20:20 INFO DAGScheduler: Job 1 finished: sortByKey at LocalTest.scala:58, took 0.314005 s
    
    17/05/20 10:20:20 INFO SparkContext: Starting job: take at LocalTest.scala:59
    
    17/05/20 10:20:20 INFO DAGScheduler: Registering RDD 7 (map at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO DAGScheduler: Got job 2 (take at LocalTest.scala:59) with 1 output partitions
    
    17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 7 (take at LocalTest.scala:59)
    
    17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
    
    17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 6)
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[7] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[7] at map at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
    
    17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID 5, localhost, executor driver, partition 0, PROCESS_LOCAL, 6109 bytes)
    
    17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 6.0 (TID 5)
    
    17/05/20 10:20:20 INFO BlockManager: Found block rdd60 locally
    
    17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 6.0 (TID 5). 1062 bytes result sent to driver
    
    17/05/20 10:20:20 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID 6, localhost, executor driver, partition 1, PROCESS_LOCAL, 6109 bytes)
    
    17/05/20 10:20:20 INFO Executor: Running task 1.0 in stage 6.0 (TID 6)
    
    17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID 5) in 14 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:20 INFO BlockManager: Found block rdd61 locally
    
    17/05/20 10:20:20 INFO Executor: Finished task 1.0 in stage 6.0 (TID 6). 1062 bytes result sent to driver
    
    17/05/20 10:20:20 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID 6) in 9 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:20 INFO DAGScheduler: ShuffleMapStage 6 (map at LocalTest.scala:58) finished in 0.024 s
    
    17/05/20 10:20:20 INFO DAGScheduler: looking for newly runnable stages
    
    17/05/20 10:20:20 INFO DAGScheduler: running: Set(ResultStage 0)
    
    17/05/20 10:20:20 INFO DAGScheduler: waiting: Set(ResultStage 7)
    
    17/05/20 10:20:20 INFO DAGScheduler: failed: Set()
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[11] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:20 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 4.0 KB, free 912.2 MB)
    
    17/05/20 10:20:20 INFO MemoryStore: Block broadcast4piece0 stored as bytes in memory (estimated size 2.4 KB, free 912.2 MB)
    
    17/05/20 10:20:20 INFO BlockManagerInfo: Added broadcast4piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.3 MB)
    
    17/05/20 10:20:20 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 7 (MapPartitionsRDD[11] at map at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
    
    17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 7, localhost, executor driver, partition 0, PROCESS_LOCAL, 5903 bytes)
    
    17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 7.0 (TID 7)
    
    17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 7.0 (TID 7). 1718 bytes result sent to driver
    
    17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 7) in 146 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:20 INFO DAGScheduler: Job 2 finished: take at LocalTest.scala:59, took 0.198460 s
    
    17/05/20 10:20:20 INFO SparkContext: Starting job: print at LocalTest.scala:64
    
    17/05/20 10:20:20 INFO DAGScheduler: Got job 3 (print at LocalTest.scala:64) with 1 output partitions
    
    17/05/20 10:20:20 INFO DAGScheduler: Final stage: ResultStage 11 (print at LocalTest.scala:64)
    
    17/05/20 10:20:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 10)
    
    17/05/20 10:20:20 INFO CheckpointWriter: Submitted checkpoint of time 1495246820000 ms to writer queue
    
    17/05/20 10:20:20 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[11] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:20 INFO CheckpointWriter: Saving checkpoint for time 1495246820000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000'
    
    17/05/20 10:20:20 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 4.0 KB, free 912.2 MB)
    
    17/05/20 10:20:20 INFO MemoryStore: Block broadcast5piece0 stored as bytes in memory (estimated size 2.4 KB, free 912.2 MB)
    
    17/05/20 10:20:20 INFO BlockManagerInfo: Added broadcast5piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.3 MB)
    
    17/05/20 10:20:20 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[11] at map at LocalTest.scala:58)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Adding task set 11.0 with 1 tasks
    
    17/05/20 10:20:20 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 8, localhost, executor driver, partition 0, PROCESS_LOCAL, 6672 bytes)
    
    17/05/20 10:20:20 INFO Executor: Running task 0.0 in stage 11.0 (TID 8)
    
    17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:20 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:20 INFO Executor: Finished task 0.0 in stage 11.0 (TID 8). 1718 bytes result sent to driver
    
    17/05/20 10:20:20 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID 8) in 39 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:20 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:20 INFO DAGScheduler: ResultStage 11 (print at LocalTest.scala:64) finished in 0.041 s
    
    17/05/20 10:20:20 INFO DAGScheduler: Job 3 finished: print at LocalTest.scala:64, took 0.053225 s
    
    Time: 1495246820000 ms
    
    17/05/20 10:20:20 INFO JobScheduler: Finished job streaming job 1495246820000 ms.0 from job set of time 1495246820000 ms
    
    17/05/20 10:20:20 INFO JobScheduler: Total delay: 0.954 s for time 1495246820000 ms (execution: 0.064 s)
    
    17/05/20 10:20:21 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246110000.bk
    
    17/05/20 10:20:21 INFO JobGenerator: Checkpointing graph for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO DStreamGraph: Updating checkpoint data for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO CheckpointWriter: Checkpoint for time 1495246820000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000', took 3942 bytes and 125 ms
    
    17/05/20 10:20:21 INFO DStreamGraph: Updated checkpoint data for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO CheckpointWriter: Submitted checkpoint of time 1495246820000 ms to writer queue
    
    17/05/20 10:20:21 INFO CheckpointWriter: Saving checkpoint for time 1495246820000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000'
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast4piece0 on 192.168.6.90:52302 in memory (size: 2.4 KB, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast1piece0 on 192.168.6.90:52302 in memory (size: 1266.0 B, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast2piece0 on 192.168.6.90:52302 in memory (size: 2.7 KB, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast3piece0 on 192.168.6.90:52302 in memory (size: 2.9 KB, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Removed broadcast5piece0 on 192.168.6.90:52302 in memory (size: 2.4 KB, free: 912.3 MB)
    
    17/05/20 10:20:21 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246110000
    
    17/05/20 10:20:21 INFO CheckpointWriter: Checkpoint for time 1495246820000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246820000', took 3938 bytes and 48 ms
    
    17/05/20 10:20:21 INFO DStreamGraph: Clearing checkpoint data for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO DStreamGraph: Cleared checkpoint data for time 1495246820000 ms
    
    17/05/20 10:20:21 INFO ReceivedBlockTracker: Deleting batches: 
    
    17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Attempting to clear 2 old log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000: file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata/log-1495246055009-1495246115009
    
    file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata/log-1495246115167-1495246175167
    
    17/05/20 10:20:21 INFO InputInfoTracker: remove old batch metadata: 
    
    17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Cleared log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000
    
    17/05/20 10:20:21 INFO FileBasedWriteAheadLog_ReceivedBlockTracker: Cleared log files in file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/receivedBlockMetadata older than 1495246790000
    
    17/05/20 10:20:21 INFO MemoryStore: Block input-0-1495246821600 stored as bytes in memory (estimated size 284.0 B, free 912.2 MB)
    
    17/05/20 10:20:21 INFO BlockManagerInfo: Added input-0-1495246821600 in memory on 192.168.6.90:52302 (size: 284.0 B, free: 912.3 MB)
    
    17/05/20 10:20:21 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
    
    17/05/20 10:20:21 WARN BlockManager: Block input-0-1495246821600 replicated to only 0 peer(s) instead of 1 peers
    
    17/05/20 10:20:21 INFO BlockGenerator: Pushed block input-0-1495246821600
    
    17/05/20 10:20:24 INFO MemoryStore: Block input-0-1495246824600 stored as bytes in memory (estimated size 284.0 B, free 912.2 MB)
    
    17/05/20 10:20:24 INFO BlockManagerInfo: Added input-0-1495246824600 in memory on 192.168.6.90:52302 (size: 284.0 B, free: 912.3 MB)
    
    17/05/20 10:20:24 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
    
    17/05/20 10:20:24 WARN BlockManager: Block input-0-1495246824600 replicated to only 0 peer(s) instead of 1 peers
    
    17/05/20 10:20:24 INFO BlockGenerator: Pushed block input-0-1495246824600
    
    17/05/20 10:20:25 INFO ShuffledDStream: Slicing from 1495246815000 ms to 1495246815000 ms (aligned to 1495246815000 ms and 1495246815000 ms)
    
    17/05/20 10:20:25 INFO ShuffledDStream: Time 1495246815000 ms is invalid as zeroTime is 1495246815000 ms , slideDuration is 5000 ms and difference is 0 ms
    
    17/05/20 10:20:25 INFO ShuffledDStream: Slicing from 1495246825000 ms to 1495246825000 ms (aligned to 1495246825000 ms and 1495246825000 ms)
    
    17/05/20 10:20:25 INFO ReducedWindowedDStream: Marking RDD 16 for time 1495246825000 ms for checkpointing
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:25 INFO DAGScheduler: Registering RDD 13 (mapPartitions at LocalTest.scala:36)
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 4 (sortByKey at LocalTest.scala:58) with 2 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 15 (sortByKey at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 12, ShuffleMapStage 13, ShuffleMapStage 14)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 13)
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ShuffleMapStage 13 (MapPartitionsRDD[13] at mapPartitions at LocalTest.scala:36), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 2.6 KB, free 912.2 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast6piece0 stored as bytes in memory (estimated size 1667.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast6piece0 in memory on 192.168.6.90:52302 (size: 1667.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 13 (MapPartitionsRDD[13] at mapPartitions at LocalTest.scala:36)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 13.0 with 2 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID 9, localhost, executor driver, partition 0, ANY, 6020 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 13.0 (TID 9)
    
    17/05/20 10:20:25 INFO BlockManager: Found block input-0-1495246821600 locally
    
    17/05/20 10:20:25 INFO LocalTest: Handling events, events is empty: false
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test1 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test2 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test2 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 13.0 (TID 9). 1498 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 13.0 (TID 10, localhost, executor driver, partition 1, ANY, 6020 bytes)
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 13.0 (TID 9) in 28 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 13.0 (TID 10)
    
    17/05/20 10:20:25 INFO BlockManager: Found block input-0-1495246824600 locally
    
    17/05/20 10:20:25 INFO LocalTest: Handling events, events is empty: false
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test1 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test2 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test2 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO LocalTest: Processing words test3 
    
    17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 13.0 (TID 10). 1498 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 13.0 (TID 10) in 22 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ShuffleMapStage 13 (mapPartitions at LocalTest.scala:36) finished in 0.048 s
    
    17/05/20 10:20:25 INFO DAGScheduler: looking for newly runnable stages
    
    17/05/20 10:20:25 INFO DAGScheduler: running: Set(ResultStage 0)
    
    17/05/20 10:20:25 INFO DAGScheduler: waiting: Set(ResultStage 15)
    
    17/05/20 10:20:25 INFO DAGScheduler: failed: Set()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 15 (MapPartitionsRDD[19] at sortByKey at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 6.4 KB, free 912.2 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast7piece0 stored as bytes in memory (estimated size 3.6 KB, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast7piece0 in memory on 192.168.6.90:52302 (size: 3.6 KB, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 15 (MapPartitionsRDD[19] at sortByKey at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 15.0 with 2 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 15.0 (TID 11, localhost, executor driver, partition 0, PROCESS_LOCAL, 6185 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 15.0 (TID 11)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd60 locally
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO MemoryStore: Block rdd140 stored as bytes in memory (estimated size 155.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd140 in memory on 192.168.6.90:52302 (size: 155.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block rdd160 stored as bytes in memory (estimated size 155.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd160 in memory on 192.168.6.90:52302 (size: 155.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 15.0 (TID 11). 2799 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 15.0 (TID 12, localhost, executor driver, partition 1, PROCESS_LOCAL, 6185 bytes)
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 15.0 (TID 11) in 31 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 15.0 (TID 12)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd61 locally
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO MemoryStore: Block rdd141 stored as bytes in memory (estimated size 180.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd141 in memory on 192.168.6.90:52302 (size: 180.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block rdd161 stored as bytes in memory (estimated size 180.0 B, free 912.2 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added rdd161 in memory on 192.168.6.90:52302 (size: 180.0 B, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 15.0 (TID 12). 2803 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 15.0 (TID 12) in 39 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 15 (sortByKey at LocalTest.scala:58) finished in 0.069 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 4 finished: sortByKey at LocalTest.scala:58, took 0.156155 s
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 127.1 KB, free 912.1 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast8piece0 stored as bytes in memory (estimated size 14.3 KB, free 912.1 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast8piece0 in memory on 192.168.6.90:52302 (size: 14.3 KB, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 8 from sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 83 bytes
    
    17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 3 is 157 bytes
    
    17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 156 bytes
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 5 (sortByKey at LocalTest.scala:58) with 2 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 19 (sortByKey at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 16, ShuffleMapStage 17, ShuffleMapStage 18)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 19 (MapPartitionsRDD[16] at reduceByKeyAndWindow at LocalTest.scala:52), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 6.5 KB, free 912.1 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast9piece0 stored as bytes in memory (estimated size 3.6 KB, free 912.1 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast9piece0 in memory on 192.168.6.90:52302 (size: 3.6 KB, free: 912.3 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 19 (MapPartitionsRDD[16] at reduceByKeyAndWindow at LocalTest.scala:52)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 19.0 with 2 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 19.0 (TID 13, localhost, executor driver, partition 0, PROCESS_LOCAL, 6122 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 19.0 (TID 13)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd160 locally
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 19.0 (TID 13). 1085 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 19.0 (TID 14, localhost, executor driver, partition 1, PROCESS_LOCAL, 6122 bytes)
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 19.0 (TID 13) in 30 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 19.0 (TID 14)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd161 locally
    
    17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 19.0 (TID 14). 1172 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 19.0 (TID 14) in 85 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 19.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 19 (sortByKey at LocalTest.scala:58) finished in 0.115 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 5 finished: sortByKey at LocalTest.scala:58, took 0.129602 s
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 127.1 KB, free 912.0 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast10piece0 stored as bytes in memory (estimated size 14.3 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast10piece0 in memory on 192.168.6.90:52302 (size: 14.3 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 10 from sortByKey at LocalTest.scala:58
    
    17/05/20 10:20:25 INFO ReliableRDDCheckpointData: Done checkpointing RDD 16 to file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/874437fe-0c21-4a9d-8793-8ae914f3f38b/rdd-16, new parent is RDD 20
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: take at LocalTest.scala:59
    
    17/05/20 10:20:25 INFO DAGScheduler: Registering RDD 17 (map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 6 (take at LocalTest.scala:59) with 1 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 21 (take at LocalTest.scala:59)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 20)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 20)
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ShuffleMapStage 20 (MapPartitionsRDD[17] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 6.1 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast11piece0 stored as bytes in memory (estimated size 3.5 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast11piece0 in memory on 192.168.6.90:52302 (size: 3.5 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 20 (MapPartitionsRDD[17] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 20.0 with 2 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 20.0 (TID 15, localhost, executor driver, partition 0, PROCESS_LOCAL, 6169 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 20.0 (TID 15)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd160 locally
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 20.0 (TID 15). 1413 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 1.0 in stage 20.0 (TID 16, localhost, executor driver, partition 1, PROCESS_LOCAL, 6169 bytes)
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 20.0 (TID 15) in 44 ms on localhost (executor driver) (1/2)
    
    17/05/20 10:20:25 INFO Executor: Running task 1.0 in stage 20.0 (TID 16)
    
    17/05/20 10:20:25 INFO BlockManager: Found block rdd161 locally
    
    17/05/20 10:20:25 INFO Executor: Finished task 1.0 in stage 20.0 (TID 16). 1413 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 1.0 in stage 20.0 (TID 16) in 58 ms on localhost (executor driver) (2/2)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 20.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ShuffleMapStage 20 (map at LocalTest.scala:58) finished in 0.101 s
    
    17/05/20 10:20:25 INFO DAGScheduler: looking for newly runnable stages
    
    17/05/20 10:20:25 INFO DAGScheduler: running: Set(ResultStage 0)
    
    17/05/20 10:20:25 INFO DAGScheduler: waiting: Set(ResultStage 21)
    
    17/05/20 10:20:25 INFO DAGScheduler: failed: Set()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 21 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 4.0 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast12piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast12piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 21 (MapPartitionsRDD[22] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 21.0 with 1 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 21.0 (TID 17, localhost, executor driver, partition 0, ANY, 5903 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 21.0 (TID 17)
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 21.0 (TID 17). 1869 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 21.0 (TID 17) in 28 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 21.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 21 (take at LocalTest.scala:59) finished in 0.029 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 6 finished: take at LocalTest.scala:59, took 0.150449 s
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: take at LocalTest.scala:59
    
    17/05/20 10:20:25 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 4 is 160 bytes
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 7 (take at LocalTest.scala:59) with 1 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 23 (take at LocalTest.scala:59)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 22)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 23 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 4.0 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast13piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast13piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 23 (MapPartitionsRDD[22] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 23.0 with 1 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 23.0 (TID 18, localhost, executor driver, partition 1, ANY, 5903 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 23.0 (TID 18)
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 23.0 (TID 18). 1869 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 23.0 (TID 18) in 75 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 23.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 23 (take at LocalTest.scala:59) finished in 0.077 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 7 finished: take at LocalTest.scala:59, took 0.087657 s
    
    17/05/20 10:20:25 INFO JobScheduler: Added jobs for time 1495246825000 ms
    
    17/05/20 10:20:25 INFO JobGenerator: Checkpointing graph for time 1495246825000 ms
    
    17/05/20 10:20:25 INFO DStreamGraph: Updating checkpoint data for time 1495246825000 ms
    
    17/05/20 10:20:25 INFO JobScheduler: Starting job streaming job 1495246825000 ms.0 from job set of time 1495246825000 ms
    
    (test3 ,8)
    
    (test2 ,4)
    
    17/05/20 10:20:25 INFO DStreamGraph: Updated checkpoint data for time 1495246825000 ms
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: print at LocalTest.scala:64
    
    17/05/20 10:20:25 INFO CheckpointWriter: Saving checkpoint for time 1495246825000 ms to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246825000'
    
    17/05/20 10:20:25 INFO CheckpointWriter: Submitted checkpoint of time 1495246825000 ms to writer queue
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 8 (print at LocalTest.scala:64) with 1 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 25 (print at LocalTest.scala:64)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 24)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 25 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 4.0 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast14piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast14piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 25 (MapPartitionsRDD[22] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 25.0 with 1 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 25.0 (TID 19, localhost, executor driver, partition 0, ANY, 6672 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 25.0 (TID 19)
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    17/05/20 10:20:25 INFO CheckpointWriter: Deleting file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246115000.bk
    
    17/05/20 10:20:25 INFO CheckpointWriter: Checkpoint for time 1495246825000 ms saved to file 'file:/Users/wttttt/programming/flume/RealTimeLog/flumeCheckpoint/checkpoint-1495246825000', took 4010 bytes and 37 ms
    
    17/05/20 10:20:25 INFO Executor: Finished task 0.0 in stage 25.0 (TID 19). 1869 bytes result sent to driver
    
    17/05/20 10:20:25 INFO TaskSetManager: Finished task 0.0 in stage 25.0 (TID 19) in 37 ms on localhost (executor driver) (1/1)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Removed TaskSet 25.0, whose tasks have all completed, from pool 
    
    17/05/20 10:20:25 INFO DAGScheduler: ResultStage 25 (print at LocalTest.scala:64) finished in 0.039 s
    
    17/05/20 10:20:25 INFO DAGScheduler: Job 8 finished: print at LocalTest.scala:64, took 0.046366 s
    
    17/05/20 10:20:25 INFO SparkContext: Starting job: print at LocalTest.scala:64
    
    17/05/20 10:20:25 INFO DAGScheduler: Got job 9 (print at LocalTest.scala:64) with 1 output partitions
    
    17/05/20 10:20:25 INFO DAGScheduler: Final stage: ResultStage 27 (print at LocalTest.scala:64)
    
    17/05/20 10:20:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 26)
    
    17/05/20 10:20:25 INFO DAGScheduler: Missing parents: List()
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting ResultStage 27 (MapPartitionsRDD[22] at map at LocalTest.scala:58), which has no missing parents
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 4.0 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO MemoryStore: Block broadcast15piece0 stored as bytes in memory (estimated size 2.4 KB, free 911.9 MB)
    
    17/05/20 10:20:25 INFO BlockManagerInfo: Added broadcast15piece0 in memory on 192.168.6.90:52302 (size: 2.4 KB, free: 912.2 MB)
    
    17/05/20 10:20:25 INFO SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:996
    
    17/05/20 10:20:25 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 27 (MapPartitionsRDD[22] at map at LocalTest.scala:58)
    
    17/05/20 10:20:25 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks
    
    17/05/20 10:20:25 INFO TaskSetManager: Starting task 0.0 in stage 27.0 (TID 20, localhost, executor driver, partition 1, ANY, 6672 bytes)
    
    17/05/20 10:20:25 INFO Executor: Running task 0.0 in stage 27.0 (TID 20)
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
    
    17/05/20 10:20:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
    
    Time: 1495246825000 ms
    
    (test3 ,8)
    
    (test2 ,4)
    
    (test1 ,2)
    

配置JobHistory

  • spark-defaults.conf:
spark.eventLog.enabled  true
spark.eventLog.dir hdfs://host99:9000/sparkLogHistory
spark.eventLog.compress true
  • eventLog.dir要手动创建: hadoop fs -mkdir /sparkLogHistory
  • spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://host99:9000/sparkLogHistory"

  默认只保留最近三次的jobHistory,端口号是7777

  • 运行一个job试试

<Spark><Spark Streaming><作业分析><JobHistory>的更多相关文章

  1. 简单物联网:外网访问内网路由器下树莓派Flask服务器

    最近做一个小东西,大概过程就是想在教室,宿舍控制实验室的一些设备. 已经在树莓上搭了一个轻量的flask服务器,在实验室的路由器下,任何设备都是可以访问的:但是有一些限制条件,比如我想在宿舍控制我种花 ...

  2. 利用ssh反向代理以及autossh实现从外网连接内网服务器

    前言 最近遇到这样一个问题,我在实验室架设了一台服务器,给师弟或者小伙伴练习Linux用,然后平时在实验室这边直接连接是没有问题的,都是内网嘛.但是回到宿舍问题出来了,使用校园网的童鞋还是能连接上,使 ...

  3. 外网访问内网Docker容器

    外网访问内网Docker容器 本地安装了Docker容器,只能在局域网内访问,怎样从外网也能访问本地Docker容器? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Docker容器 ...

  4. 外网访问内网SpringBoot

    外网访问内网SpringBoot 本地安装了SpringBoot,只能在局域网内访问,怎样从外网也能访问本地SpringBoot? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装Java 1 ...

  5. 外网访问内网Elasticsearch WEB

    外网访问内网Elasticsearch WEB 本地安装了Elasticsearch,只能在局域网内访问其WEB,怎样从外网也能访问本地Elasticsearch? 本文将介绍具体的实现步骤. 1. ...

  6. 怎样从外网访问内网Rails

    外网访问内网Rails 本地安装了Rails,只能在局域网内访问,怎样从外网也能访问本地Rails? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Rails 默认安装的Rails端口 ...

  7. 怎样从外网访问内网Memcached数据库

    外网访问内网Memcached数据库 本地安装了Memcached数据库,只能在局域网内访问,怎样从外网也能访问本地Memcached数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装 ...

  8. 怎样从外网访问内网CouchDB数据库

    外网访问内网CouchDB数据库 本地安装了CouchDB数据库,只能在局域网内访问,怎样从外网也能访问本地CouchDB数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动Cou ...

  9. 怎样从外网访问内网DB2数据库

    外网访问内网DB2数据库 本地安装了DB2数据库,只能在局域网内访问,怎样从外网也能访问本地DB2数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动DB2数据库 默认安装的DB2 ...

  10. 怎样从外网访问内网OpenLDAP数据库

    外网访问内网OpenLDAP数据库 本地安装了OpenLDAP数据库,只能在局域网内访问,怎样从外网也能访问本地OpenLDAP数据库? 本文将介绍具体的实现步骤. 1. 准备工作 1.1 安装并启动 ...

随机推荐

  1. codeforces 578a//A Problem about Polyline// Codeforces Round #320 (Div. 1)

    题意:一个等腰直角三角形一样的周期函数(只有x+轴),经过给定的点(a,b),并且半周期为X,使X尽量大,问X最大为多少? 如果a=b,结果就为b 如果a<b无解. 否则,b/(2*k*x-a) ...

  2. android主流开源库

    网络框架:Volley 和 Async Volley特点:能使网络通信更快,更简单.更健壮 Get,Post网络请求及网络图像的高效率 Async:高效的网络数据请求, 解析成json 持久化cook ...

  3. DP 传球问题

    洛谷P1057 传球问题 分析:经过m次传球到第i个人的方法可以由经过m-1次传球到第i个人和到第i-1个人传递得来 设dp[i][j]为经过j次传球后到达第i个人的方法数,可得到状态转移方程为: d ...

  4. for + setTimeout

    一.背景 最近在翻看以前的老书<node.js开发指南>,恰好碰到 for 循环 + setTimeout 的经典例子,于是重新梳理了思路并记录下. 二.写在前面,setTimeout 和 ...

  5. miniui 使用心得

    MiniUI demo实例使用心得:1.渲染速度很快2快速维护数据 3多种编辑方式 如 弹窗 直接下方显示form 下方显示tab 等4.树形 编辑 联动 5验证表单6文本框内 选择框 保存的多个选项 ...

  6. HomeBrew的安装和简单使用

    homebrew 官网 https://brew.sh/ 转自:http://blog.csdn.NET/maojudong/article/details/7918291 1.  前言 作为Linu ...

  7. 数据结构与算法之PHP实现队列、栈

    一.队列 1)队列(Queue)是一种先进先出(FIFO)的线性表,它只允许在表的前端进行删除操作,在表的后端进行插入操作,进行插入操作的端称为队尾,进行删除操作的端称为队头.即入队只能从队尾入,出队 ...

  8. MySQL Workbench中如何开启MySQL的远程帐号

    '; /* 上面的语句表示将 discuz 数据库的所有权限授权给 root这个用户,允许 root用户在192.168.119.120进行远程登陆,并设置 root用户的密码为 123456 . 下 ...

  9. 接口测试--postman简介

    一.什么是接口测试 接口测试是测试系统组件间接口的一种测试.接口测试主要用于检测外部系统与系统之间以及内部各个子系统之间的交互点.测试的重点是要检查数据的交换,传递和控制管理过程,以及系统间的相互逻辑 ...

  10. pycham相关+Python基础

    pycham 1.设置pycham和线上服务器同步代码 a.Tools--->Deployment--->Configuration   b.设置连接的服务器IP c.设置连接的用户名+密 ...