package com.jason.example

import org.apache.spark.rdd.RDD

class RddTest extends SparkInstance {
  val sc = spark.sparkContext
  // The numeric literals were lost when this snippet was extracted;
  // "1 to 10" and the factor 2 below are placeholders, not the original values.
  val rdd = sc.parallelize(1 to 10)
  val rdd2 = sc.parallelize(1 to 10)
  val pairRdd = rdd2.map(x => (x, x * 2))

  def trans(): Unit = {
    // The method body was cut off in the source after "pr"; printing the
    // transformed pairs is a plausible reconstruction, not the original code.
    pairRdd.collect().foreach(println)
  }
}
Before digging into RDDs, read the UC Berkeley paper first; in my opinion it is the single best resource on the subject, bar none. http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

From the Spark scaladoc: "A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel."
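To make "immutable" and "partitioned" concrete, here is a minimal Scala sketch; the object name, the 1 to 100 range, and the 4-partition split are my choices for illustration, not from the paper or the scaladoc:

import org.apache.spark.{SparkConf, SparkContext}

object RddBasics {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddBasics").setMaster("local[2]"))

    // "partitioned": the data is split into 4 partitions that can be processed in parallel
    val nums = sc.parallelize(1 to 100, numSlices = 4)

    // "immutable": map does not modify nums; it returns a new RDD with its own lineage
    val doubled = nums.map(_ * 2)

    println(nums.getNumPartitions) // 4
    println(doubled.sum())         // 10100
    sc.stop()
  }
}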
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  val sparkConf = new SparkConf().setAppName("DecisionTree1").setMaster("local[2]")
  sparkConf.set("es.index.auto.create", "true")
  // The node address was truncated in the source ("10.3."); x.x is a placeholder.
  sparkConf.set("es.nodes", "10.3.x.x")
  val sc = new SparkContext(sparkConf)
  // The remainder of the method was cut off in the source.
}
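The two es.* settings belong to the elasticsearch-hadoop (elasticsearch-spark) connector. As a minimal sketch of how such a config is typically used to index an RDD into Elasticsearch; the "spark/docs" resource name, the sample documents, and the 127.0.0.1 node are assumptions of mine, not from the original post:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveToEs to RDDs

object EsWriteSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("EsWriteSketch").setMaster("local[2]")
    conf.set("es.index.auto.create", "true") // let the connector create the index
    conf.set("es.nodes", "127.0.0.1")        // placeholder node address
    val sc = new SparkContext(conf)

    // Each Map becomes one Elasticsearch document; "spark/docs" is an assumed index/type
    val docs = sc.makeRDD(Seq(
      Map("id" -> 1, "text" -> "hello"),
      Map("id" -> 2, "text" -> "world")))
    docs.saveToEs("spark/docs")
    sc.stop()
  }
}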
https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html

Example Program: the programming style is very similar to Spark's:
ExecutionEnvironment -- SparkContext
DataSet -- RDD
Transformations -- transformations

The snippet uses the Java API, so the function passed to flatMap has to be a FlatMapFunction object. The original code was cut off at "public clas"; below is the guide's WordCount example, reconstructed in condensed form:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCountExample {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> text = env.fromElements("Who's there?", "I think I hear them.");
        DataSet<Tuple2<String, Integer>> wordCounts =
            text.flatMap(new LineSplitter()).groupBy(0).sum(1);
        wordCounts.print();
    }

    // The function object passed to flatMap: splits each line into (word, 1) pairs.
    public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.split(" ")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}