Title: Deep Dive into Spark SQL’s Catalyst Optimizer. Authors: Michael Armbrust, Yin Huai, Cheng Liang, Reynold Xin and Matei Zaharia. Reference: https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html…
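Spark SQL is one of the newest and most technically involved components of Spark. It powers both SQL queries and the new DataFrame API. At the core of Spark SQL is the Catalyst optimizer, which uses advanced programming language features (such as Scala’s pattern matching and quasiquotes) in a novel way to build an extensible query optimizer. We recently published a paper on Spark SQL that will appear at SIGMOD 2015 (co-authored with Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. F…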
Title: Introducing DataFrames in Apache Spark for Large Scale Data Science. Authors: Reynold Xin, Michael Armbrust and Davies Liu. Body: Today, we are excited to announce a new DataFrame API designed to make big data processing even…
Title: What’s new for Spark SQL in Apache Spark 1.3. Author: Michael Armbrust. Body: The Apache Spark 1.3 release represents a major milestone for Spark SQL. In addition to several major features, we are very excited to announce that the project has officia…
Title: Introducing Apache Spark Datasets. Authors: Michael Armbrust, Wenchen Fan, Reynold Xin and Matei Zaharia. Body: Developers have always loved Apache Spark for providing APIs that are simple yet powerful, a combination of traits that makes complex analys…
Title: A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets. Subtitle: When to use them and why. Vocabulary: tale [tel]: a legend or rumor; a story (especially an adventurous one); malicious gossip; (archaic) a count or total. Author: Jules S. Damji is the Apache Spark community evangelist at Databricks. He is also…
Title: Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop. Subtitle: Deep dive into the new Tungsten execution engine. Authors: Sameer Agarwal, Davies Liu and Reynold Xin. Reference: https://databricks.com/blog/2016/05/23/apache-spark-as-a-compile…
Title: One SQL to Rule Them All – an Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables. Vocabulary: syntactically: in terms of syntax, grammatically; idiomatic [ˌɪdiəˈmætɪk]: customary, natural to the language, relating to idioms; approach [əˈproʊtʃ] v. (in distance or time…
Paper notes: pending. Reference: Spark: Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. HotCloud 2010, June 2010. …
Paper notes: pending. Reference: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. NS…