Spark 2.x (59): How to shut down the client process when submitting a Spark job in yarn-cluster mode

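Problem:

Our on-site team recently reported that after a Spark application is submitted in yarn-cluster mode, a YARN client process stays alive on the submit node. Because all of these applications are Spark Structured Streaming programs that run for months or years, the client processes accumulate and eventually exhaust the resources of the submit node. Running any other command on that node then fails with an error like the following: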

[dx@my-linux-01 bin]$ yarn logs -applicationId application_15644802175503_0189
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000c000000, 702021632, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 702021632 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/dx/myProj/appApp/bin/hs_err_pid53561.log
[dx@my-linux-01 bin]$

Analysis of the submit node showed that the resources were held mainly by the following processes (YARN client processes):

[dx@my-linux-01 bin]$ top
PID     USER  PR  NI  VIRT     RES     SHR   S  %CPU  %MEM  TIME+    COMMAND
122236  dx    20  0   20.629g  1.347g  3520  S  0.3   2.1   7:02.42  java
122246  dx    20  0   20.629g  1.311g  3520  S  0.3   2.0   7:03.42  java
122236  dx    20  0   20.629g  1.288g  3520  S  0.3   2.2   7:05.83  java
122346  dx    20  0   20.629g  1.344g  3520  S  0.3   2.1   7:10.42  java
121246  dx    20  0   20.629g  1.343g  3520  S  0.3   2.3   7:01.42  java
122346  dx    20  0   20.629g  1.341g  3520  S  0.3   2.4   7:03.39  java
112246  dx    20  0   20.629g  1.344g  3520  S  0.3   2.0   7:02.42  java
............
112260  dx    20  0   20.629g  1.344g  3520  S  0.3   2.0   7:02.02  java
112260  dx    20  0   113116   200     0     S  0.0   0.0   0:00.00  sh
............
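
To confirm that these java processes are lingering spark-submit clients rather than other JVMs, one option is to filter the process list by main class and add up the resident memory. The following is only a minimal sketch: it assumes a Linux `ps` that accepts `-eo pid,rss,args` (RSS reported in KiB) and that the client JVMs were started by spark-submit, whose main class is `org.apache.spark.deploy.SparkSubmit`. The function name `summarize_submit_clients` is hypothetical.

```shell
# Count processes whose command line contains the SparkSubmit main class and
# sum their resident memory. Expects "pid rss args..." lines on stdin, with
# rss in KiB (the unit used by `ps -o rss` on Linux).
summarize_submit_clients() {
  awk '/org\.apache\.spark\.deploy\.SparkSubmit/ { n++; kib += $2 }
       END { printf "%d client process(es), %.1f MiB resident\n", n, kib / 1024 }'
}

# Usage on the submit node (shown as a comment, not executed here):
#   ps -eo pid,rss,args | summarize_submit_clients
```

If the count grows by one with every submitted streaming job and never drops, the memory is being held by waiting clients rather than by the applications themselves, which run inside the cluster.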

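How YARN submits Spark jobs:

A Spark application can be submitted to YARN in two modes:

1) yarn-client (spark-submit --master yarn --deploy-mode client ...):

In this mode the driver runs on the submit node, inside the YARN client process itself. Killing the client process on the submit node therefore kills the driver, which in turn terminates the application.

2) yarn-cluster (spark-submit --master yarn --deploy-mode cluster):

In this mode the driver runs inside a container allocated by YARN: the container hosts an AM (ApplicationMaster) process, and the SparkContext (driver) runs inside that AM. Submission still starts a YARN client process on the submit node. By default, this client does not exit once the application has been submitted; it keeps running until the application reaches a terminal state (FINISHED, FAILED, or KILLED).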

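Solution:

The relevant parameter of the YARN client (org.apache.spark.deploy.yarn.Client) is:

spark.yarn.submit.waitAppCompletion

If this parameter is set to true (the default), the client keeps running and reporting the application's status until the application exits, for whatever reason.

If it is set to false, the client process exits once the application has been submitted.

Add the parameter to the spark-submit command: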

./bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.submit.waitAppCompletion=false \
....
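
Because the client exits right after submission with this setting, the application ID has to be captured at submit time if it is needed later for `yarn application -status` or `yarn logs -applicationId`. The following is a minimal sketch, assuming spark-submit writes its log to stderr and that the log contains the ID in the usual `application_<clusterTimestamp>_<sequence>` form. The helper name `extract_app_id` is hypothetical and not part of Spark or YARN, and the trailing `....` in the usage comment stands for the remaining arguments elided above.

```shell
# Print the first YARN application ID found in the text read from stdin.
# Prints nothing if no ID is present.
extract_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Usage (shown as comments, not executed here):
#   app_id=$(./bin/spark-submit \
#              --master yarn \
#              --deploy-mode cluster \
#              --conf spark.yarn.submit.waitAppCompletion=false \
#              .... 2>&1 | extract_app_id)
#   yarn application -status "$app_id"
```

Storing the ID somewhere durable (a file next to the job's configuration, for example) makes it possible to check on or stop the streaming job later without a resident client process.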

The corresponding code in the YARN Client class (org.apache.spark.deploy.yarn.Client):

  /**
   * Submit an application to the ResourceManager.
   * If set spark.yarn.submit.waitAppCompletion to true, it will stay alive
   * reporting the application's status until the application has exited for any reason.
   * Otherwise, the client process will exit after submission.
   * If the application finishes with a failed, killed, or undefined status,
   * throw an appropriate SparkException.
   */
  def run(): Unit = {
    this.appId = submitApplication()
    if (!launcherBackend.isConnected() && fireAndForget) {
      val report = getApplicationReport(appId)
      val state = report.getYarnApplicationState
      logInfo(s"Application report for $appId (state: $state)")
      logInfo(formatReportDetails(report))
      if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
        throw new SparkException(s"Application $appId finished with status: $state")
      }
    } else {
      val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
      if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
        throw new SparkException(s"Application $appId finished with failed status")
      }
      if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
        throw new SparkException(s"Application $appId is killed")
      }
      if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
        throw new SparkException(s"The final status of application $appId is undefined")
      }
    }
  }
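
The branch taken in run() depends on `fireAndForget`, which is not defined in the excerpt. The sketch below restates the condition as a small shell predicate, under the assumption that `fireAndForget` means "cluster deploy mode and spark.yarn.submit.waitAppCompletion is false" (which matches the doc comment above, but should be checked against the Client source of the Spark version in use). The function name and its argument convention are hypothetical.

```shell
# Mirrors the first branch condition of run():
#   !launcherBackend.isConnected() && fireAndForget
# with fireAndForget ASSUMED to be (cluster mode && !waitAppCompletion).
#   $1  deploy mode: "client" or "cluster"
#   $2  spark.yarn.submit.waitAppCompletion: "true" or "false" (default "true")
#   $3  launcher backend connected: "true" or "false" (default "false")
# Returns 0 when the client is expected to exit right after submission.
client_exits_after_submit() {
  [ "$1" = "cluster" ] && [ "${2:-true}" = "false" ] && [ "${3:-false}" = "false" ]
}

# Print the verdict for a few combinations.
for args in "cluster false" "cluster true" "client false" "cluster false true"; do
  # Word splitting is intentional: each entry holds the positional arguments.
  if client_exits_after_submit $args; then
    echo "$args -> client exits after submission"
  else
    echo "$args -> client keeps monitoring"
  fi
done
```

Under this reading, only the first combination lets the client exit early. In every other case the code falls through to monitorApplication(appId), which is the long-lived loop that keeps the JVM on the submit node alive.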