8.spark Core 进阶1

(e.g. standalone manager, Mesos, YARN)

In "cluster" mode, the framework launches the driver inside of the cluster.

In "client" mode, the submitter launches the driver outside of the cluster.

Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program).

Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager, Mesos or YARN), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.

There are several useful things to note about this architecture:

Each application gets its own executor processes, which stay up for the duration of the whole application and run tasks in multiple threads. This has the benefit of isolating applications from each other, on both the scheduling side (each driver schedules its own tasks) and executor side (tasks from different applications run in different JVMs). However, it also means that data cannot be shared across different Spark applications (instances of SparkContext) without writing it to an external storage system.
Spark is agnostic to the underlying cluster manager. As long as it can acquire executor processes, and these communicate with each other, it is relatively easy to run it even on a cluster manager that also supports other applications (e.g. Mesos/YARN).
The driver program must listen for and accept incoming connections from its executors throughout its lifetime (e.g., see spark.driver.port in the network config section). As such, the driver program must be network addressable from the worker nodes.
Because the driver schedules tasks on the cluster, it should be run close to the worker nodes, preferably on the same local area network. If you’d like to send requests to the cluster remotely, it’s better to open an RPC to the driver and have it submit operations from nearby than to run a driver far away from the worker nodes.

8.spark Core 进阶1的更多相关文章

9.spark Core 进阶2--Cashe
RDD Persistence One of the most important capabilities in Spark is persisting (or caching) a d ...
Spark 3.x Spark Core详解 & 性能优化
Spark Core 1. 概述 Spark 是一种基于内存的快速.通用.可扩展的大数据分析计算引擎 1.1 Hadoop vs Spark 上面流程对应Hadoop的处理流程,下面对应着Spark的 ...
Spark Streaming揭秘 Day35 Spark core思考
Spark Streaming揭秘 Day35 Spark core思考 Spark上的子框架,都是后来加上去的.都是在Spark core上完成的,所有框架一切的实现最终还是由Spark core来 ...
【Spark Core】任务运行机制和Task源代码浅析1
引言上一小节<TaskScheduler源代码与任务提交原理浅析2>介绍了Driver側将Stage进行划分.依据Executor闲置情况分发任务,终于通过DriverActor向exe ...
TypeError: Error #1034: 强制转换类型失败:无法将 mx.controls::DataGrid@9a7c0a1 转换为 spark.core.IViewport。
1.错误描述 TypeError: Error #1034: 强制转换类型失败:无法将 mx.controls::DataGrid@9aa90a1 转换为 spark.core.IViewport. ...
Spark Core
Spark Core DAG概念有向无环图 Spark会根据用户提交的计算逻辑中的RDD的转换(变换方法)和动作(action方法)来生成RDD之间的依赖关系,同时 ...
Spark Streaming 进阶与案例实战
Spark Streaming 进阶与案例实战 1.带状态的算子: UpdateStateByKey 2.实战:计算到目前位置累积出现的单词个数写入到MySql中 1.create table CRE ...
spark core （二）
一.Spark-Shell交互式工具 1.Spark-Shell交互式工具 Spark-Shell提供了一种学习API的简单方式, 以及一个能够交互式分析数据的强大工具. 在Scala语言环境下或Py ...
Spark Core 资源调度与任务调度（standalone client 流程描述）
Spark Core 资源调度与任务调度(standalone client 流程描述) Spark集群启动: 集群启动后,Worker会向Master汇报资源情况(实际上将Worker的资 ...

随机推荐

js实现禁止页面拖拽图片
document.ondragstart = function() { return false;};
angularJS 绑定操作
<button type="submit" ng-disabled="editForm.$invalid || vm.isSaving" class=&q ...
leetcode.字符串.409最长回文串-Java
1. 具体题目给定一个包含大写字母和小写字母的字符串,找到通过这些字母构造成的最长的回文串.在构造过程中,请注意区分大小写.比如 "Aa" 不能当做一个回文字符串. 注意: 假设 ...
pymysql连接mysql报错
pymysql模块操作数据库及连接报错解决方法 import pymysql sql = "select host,user,password from user" #想要执行 ...
vue-router 使用二级路由去实现子组件的显示和隐藏
在需求中有一个这样的情况:一个组件在主组件和另外的组件中引用,且点击主组件和这个组件分别有相应得切换事件. 一开始的时候我是没有划分组件,把它们放到主组件内,这样便于切换,但是主主件内有独立的部分需要 ...
Install .NET Core Runtime on Linux Ubuntu 16.04 x64
原文链接https://www.microsoft.com/net/download/linux-package-manager/ubuntu16-04/runtime-current nstall ...
Use on Git
Preface The document is about to introduce some specialties on PLM development and mainte ...
MVC--MVP?
第一部分:什么是MVP?什么是MVC? 1.什么是MVP? M:数据层(数据库.网络.文件存储等等...) V:View和Activity和Fragment以及它们的子类 P:中介->Prese ...
(Struts2学习系列二)Struts2动态方法调用：指定method属性
紧接着上一篇,在HelloWorldAction.java中添加add和update方法: public class HelloWorldAction extends ActionSupport{ p ...
ASP.NET MVC 解决账号重复登录问题
解决重复登录用到了 .net 身份票证和Global全局处理文件第一步登录方法传入用户名 private void GetOnline(string Name) { Hashtable S ...

8.spark Core 进阶1

8.spark Core 进阶1的更多相关文章

随机推荐

热门专题