Requirements

Software Requirements

Flink runs on all UNIX-like environments, e.g. Linux, Mac OS X, and Cygwin (for Windows), and expects the cluster to consist of one master node and one or more worker nodes. Before you start to set up the system, make sure you have the following software installed on each node:

  • Java 1.8.x or higher,
  • ssh (sshd must be running to use the Flink scripts that manage remote components)
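A quick way to verify both prerequisites on each node (a hedged sketch; the sshd check assumes a systemd-based distribution):

java -version           # should report version 1.8 or higher
systemctl status sshd   # on non-systemd systems, try: service ssh status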

If your cluster does not fulfill these software requirements, you will need to install or upgrade the missing pieces.

Having passwordless SSH and the same directory structure on all your cluster nodes will allow you to use our scripts to control everything.

Passwordless SSH login:

Run the following on the source machine:

ssh-keygen  # press Enter through every prompt; the public key is generated at ~/.ssh/id_rsa.pub

ssh-copy-id -i ~/.ssh/id_rsa.pub root@<target-machine>  # appends the source machine's public key to ~/.ssh/authorized_keys on the target machine

Alternatively, manually append the source machine's public key to ~/.ssh/authorized_keys on the target machine. Note: the key must stay on a single line, with no line breaks.

Done!
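To confirm it worked, a quick check (using the same placeholder host as above):

ssh root@<target-machine> hostname  # should print the target's hostname without prompting for a password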

JAVA_HOME Configuration

If JAVA_HOME is already set in the system environment, the steps below are unnecessary.

Flink requires the JAVA_HOME environment variable to be set on the master and all worker nodes and point to the directory of your Java installation.

You can set this variable in conf/flink-conf.yaml via the env.java.home key.
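For example, a minimal sketch in conf/flink-conf.yaml (the JDK path below is hypothetical; substitute the directory of your own Java installation):

env.java.home: /usr/lib/jvm/java-8-openjdk-amd64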

Flink Setup

Go to the downloads page and get the ready-to-run package. Make sure to pick the Flink package matching your Hadoop version. If you don’t plan to use Hadoop, pick any version.

After downloading the latest release, copy the archive to your master node and extract it:

tar xzf flink-*.tgz
cd flink-*

Configuring Flink  (With no configuration at all, running bin/start-cluster.sh on a single node starts one JobManager and one TaskManager on that machine, which is convenient for debugging code.)

After having extracted the system files, you need to configure Flink for the cluster by editing conf/flink-conf.yaml.

Set the jobmanager.rpc.address key to point to your master node. You should also define the maximum amount of main memory the JVM is allowed to allocate on each node by setting the jobmanager.heap.mb and taskmanager.heap.mb keys.

These values are given in MB. If some worker nodes have more main memory that you want to allocate to the Flink system, you can override the default by setting the environment variable FLINK_TM_HEAP on those specific nodes.
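For instance, a hedged sketch for one worker with extra memory (the value is illustrative, in MB): export the variable on that node before its TaskManager is started.

export FLINK_TM_HEAP=4096  # overrides taskmanager.heap.mb on this node only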

Finally, you must provide a list of all nodes in your cluster which shall be used as worker nodes. Therefore, similar to the HDFS configuration, edit the file conf/slaves and enter the IP/host name of each worker node. Each worker node will later run a TaskManager.

The following example illustrates the setup with three nodes (with IP addresses from 10.0.0.1 to 10.0.0.3 and hostnames master, worker1, worker2) and shows the contents of the configuration files (which need to be accessible at the same path on all machines):

/path/to/flink/conf/flink-conf.yaml
jobmanager.rpc.address: 10.0.0.1
Note: the configuration above must be present on every machine on which a TaskManager is to be started.

/path/to/flink/conf/slaves

10.0.0.2
10.0.0.3

The Flink directory must be available on every worker under the same path. You can use a shared NFS directory, or copy the entire Flink directory to every worker node.
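If you are not using shared storage, one hedged way to replicate the directory to the workers of the example above (hostnames and path are illustrative):

for host in worker1 worker2; do
  rsync -a /path/to/flink/ "$host:/path/to/flink/"
done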

Please see the configuration page for details and additional configuration options.

In particular,

  • the amount of available memory per JobManager (jobmanager.heap.mb),
  • the amount of available memory per TaskManager (taskmanager.heap.mb),
  • the number of available CPUs per machine (taskmanager.numberOfTaskSlots),
  • the total number of CPUs in the cluster (parallelism.default) and
  • the temporary directories (taskmanager.tmp.dirs)

are very important configuration values.
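As a sketch, these keys might look as follows in conf/flink-conf.yaml (the values are illustrative, not recommendations):

jobmanager.heap.mb: 1024
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 4
parallelism.default: 8
taskmanager.tmp.dirs: /tmp/flink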

Starting Flink

The following script starts a JobManager on the local node and connects via SSH to all worker nodes listed in the slaves file to start a TaskManager on each node. Once it completes, your Flink system is up and running, and the JobManager on the local node will accept jobs at the configured RPC port.

Assuming that you are on the master node and inside the Flink directory:

bin/start-cluster.sh

To stop Flink, there is also a stop-cluster.sh script:
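bin/stop-cluster.sh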

Adding JobManager/TaskManager Instances to a Cluster

You can add both JobManager and TaskManager instances to your running cluster with the bin/jobmanager.sh and bin/taskmanager.sh scripts.

Adding a JobManager

bin/jobmanager.sh ((start|start-foreground) cluster)|stop|stop-all

Adding a TaskManager

bin/taskmanager.sh start|start-foreground|stop|stop-all

Make sure to call these scripts on the hosts on which you want to start/stop the respective instance.
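For example, a sketch of scaling out with the scripts above (run on the new worker host, assuming the Flink directory and configuration are already in place there):

bin/taskmanager.sh start   # brings up an additional TaskManager on this host
bin/taskmanager.sh stop    # later, shuts that TaskManager down again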
