RDD RDD初始參数:上下文和一组依赖 abstract class RDD[T: ClassTag]( @transient private var sc: SparkContext, @transient private var deps: Seq[Dependency[_]] ) extends Serializable 下面须要细致理清: A list of Partitions Function to compute split (sub RDD impl) A list of De…
关于RDD, 详细可以参考Spark的论文, 下面看下源码 A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel. * Internally, each RDD is characterized by five main…