9.spark Core 进阶2--Cashe
|
Storage Level
|
Meaning
|
|
MEMORY_ONLY
|
Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're needed. This is the default level.
|
|
MEMORY_AND_DISK
|
Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on disk, and read them from there when they're needed.
|
|
MEMORY_ONLY_SER (Java and Scala)
|
Store RDD as serialized Java objects (one byte array per partition). This is generally more space-efficient than deserialized objects, especially when using a fast serializer, but more CPU-intensive to read.
|
|
MEMORY_AND_DISK_SER (Java and Scala)
|
Similar to MEMORY_ONLY_SER, but spill partitions that don't fit in memory to disk instead of recomputing them on the fly each time they're needed.
|
|
DISK_ONLY
|
Store the RDD partitions only on disk.
|
|
MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
|
Same as the levels above, but replicate each partition on two cluster nodes.
|
|
OFF_HEAP (experimental)
|
Similar to MEMORY_ONLY_SER, but store the data in off-heap memory. This requires off-heap memory to be enabled.
|
- If your RDDs fit comfortably with the default storage level (MEMORY_ONLY), leave them that way. This is the most CPU-efficient option, allowing operations on the RDDs to run as fast as possible.
- If not, try using MEMORY_ONLY_SER and selecting a fast serialization library to make the objects much more space-efficient, but still reasonably fast to access. (Java and Scala)
- Don’t spill to disk unless the functions that computed your datasets are expensive, or they filter a large amount of the data. Otherwise, recomputing a partition may be as fast as reading it from disk.
- Use the replicated storage levels if you want fast fault recovery (e.g. if using Spark to serve requests from a web application). All the storage levels provide full fault tolerance by recomputing lost data, but the replicated ones let you continue running tasks on the RDD without waiting to recompute a lost partition.
9.spark Core 进阶2--Cashe的更多相关文章
- 8.spark Core 进阶1
(e.g. standalone manager, Mesos, YARN) In "cluster" mode, the framework launches the ...
- Spark 3.x Spark Core详解 & 性能优化
Spark Core 1. 概述 Spark 是一种基于内存的快速.通用.可扩展的大数据分析计算引擎 1.1 Hadoop vs Spark 上面流程对应Hadoop的处理流程,下面对应着Spark的 ...
- Spark Streaming揭秘 Day35 Spark core思考
Spark Streaming揭秘 Day35 Spark core思考 Spark上的子框架,都是后来加上去的.都是在Spark core上完成的,所有框架一切的实现最终还是由Spark core来 ...
- 【Spark Core】任务运行机制和Task源代码浅析1
引言 上一小节<TaskScheduler源代码与任务提交原理浅析2>介绍了Driver側将Stage进行划分.依据Executor闲置情况分发任务,终于通过DriverActor向exe ...
- TypeError: Error #1034: 强制转换类型失败:无法将 mx.controls::DataGrid@9a7c0a1 转换为 spark.core.IViewport。
1.错误描述 TypeError: Error #1034: 强制转换类型失败:无法将 mx.controls::DataGrid@9aa90a1 转换为 spark.core.IViewport. ...
- Spark Core
Spark Core DAG概念 有向无环图 Spark会根据用户提交的计算逻辑中的RDD的转换(变换方法)和动作(action方法)来生成RDD之间的依赖关系,同时 ...
- Spark Streaming 进阶与案例实战
Spark Streaming 进阶与案例实战 1.带状态的算子: UpdateStateByKey 2.实战:计算到目前位置累积出现的单词个数写入到MySql中 1.create table CRE ...
- spark core (二)
一.Spark-Shell交互式工具 1.Spark-Shell交互式工具 Spark-Shell提供了一种学习API的简单方式, 以及一个能够交互式分析数据的强大工具. 在Scala语言环境下或Py ...
- Spark Core 资源调度与任务调度(standalone client 流程描述)
Spark Core 资源调度与任务调度(standalone client 流程描述) Spark集群启动: 集群启动后,Worker会向Master汇报资源情况(实际上将Worker的资 ...
随机推荐
- SVM-SVR
高频率的接触到了SVM模型,而且还有使用SVM模型做回归的情况,即SVR.另外考虑到自己从第一次知道这个模型到现在也差不多两年时间了,从最开始的腾云驾雾到现在有了一点直观的认识,花费了不少时间.因此在 ...
- POJ 2135 /// 最小费用流最大流 非负花费 BellmanFord模板
题目大意: 给定一个n个点m条边的无向图 求从点1去点n再从点n回点1的不重叠(同一条边不能走两次)的最短路 挑战P239 求去和回的两条最短路很难保证不重叠 直接当做是由1去n的两条不重叠的最短路 ...
- docker容器和宿主机时间不一致的问题
第1种:复制宿主机的localtime文件,到容器里docker cp /etc/localtime threg:/etc/ 注:这里 threg为容器名称,复制完后需重启容器 第2种在构建docke ...
- python--reflect
一.反射 python 中用字符串的方式操作对象的相关属性,python 中一切皆对象,都可以使用反射 用eval 有安全隐患,用 反射就很安全 1.反射对象中的属性和方法 class A: a_cl ...
- Pregel的执行过程
- 关闭swap
#(1)临时关闭swap分区, 重启失效; swapoff -a #(2)永久关闭swap分区 sed -ri 's/.*swap.*/#&/' /etc/fstab
- ajax 工作原理
Ajax的优缺点及工作原理? 定义和用法: AJAX = Asynchronous JavaScript and XML(异步的 JavaScript 和 XML).Ajax 是一种用于创建快速动态网 ...
- 树上莫比乌斯反演+分层图并查集——cf990G
/* 树上莫比乌斯反演 求树上 满足 d|gcd(au,av) gcd(au,av)的对数f(d) 如何求: 建立200000层新图,即对于每个数建立一个新图 在加边时,给gcd(au,av)的约数层 ...
- NX二次开发-将工程图视图+尺寸的最大边界导出图片
/***************************************************************************** ** ** ExportPicture.c ...
- iOS7 AVAudioRecorder不能录音
今天写录音代码的时候,在iOS7以下就可以录音,但是iOS7上不可以,后来才知道iOS7录音方式变了,加上下面的代码就可以了,bingo AVAudioSession *audioSession = ...