Spark (50): Monitoring the Spark Executor JVM with JVisualVM
Introduction
On Windows, JVisualVM ships with the JDK at ${JAVA_HOME}/bin/jvisualvm.exe. It supports two ways of connecting to a (local or remote) JVM: jstatd and JMX.
jstatd (Java Virtual Machine jstat Daemon): exposes CPU, memory, thread, and other metrics of JVMs on a remote server.
JMX (Java Management Extensions): a framework for embedding management capabilities into applications, devices, and systems. It works across heterogeneous operating systems, architectures, and network transport protocols, so flexible, seamlessly integrated system, network, and service management applications can be built on top of it.
Note: my attempts with jstatd were unsuccessful, so it is not covered here rather than risk misleading anyone.
JMX monitoring
The usual JMX configuration:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Djava.rmi.server.hostname=<ip>
-Dcom.sun.management.jmxremote.port=<port>
Adding the JMX configuration
To monitor executors in Spark, JMX must be configured before the Spark application starts. There are three ways to do it:
1) Set the parameters in spark-defaults.conf
2) Set them in spark-env.sh, in the Java options of the master and workers
3) Pass them when submitting with spark-submit
Here we pass them at spark-submit time:
spark-submit \
--class myTest.KafkaWordCount \
--master yarn \
--deploy-mode cluster \
--conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" \
--verbose \
--executor-memory 1G \
--total-executor-cores \
/hadoop/spark/app/spark//testSpark.jar *.*.*.*:* test3 wordcount kafkawordcount3 checkpoint4
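For comparison, option 1 above puts the same flags into spark-defaults.conf. A minimal sketch (the property key is Spark's standard spark.executor.extraJavaOptions; the flag values mirror the spark-submit example above):

```
# spark-defaults.conf -- applies to every application submitted from this client
spark.executor.extraJavaOptions  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=0 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
```

Values passed via --conf at submit time take precedence over spark-defaults.conf, so the two approaches can coexist.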
Notes:
1) Do not pin a specific IP and port: at run time Spark may place several container processes on one node, and if they all tried to bind the same JMX port, submission via spark-submit would fail.
2) Because no IP or port is pinned (port=0), a free port is assigned automatically when the application starts.
3) The three configuration methods above may differ in monitoring scope (spark-submit affects a single application, while spark-env.sh may affect all executors on a node [unverified]); keep this in mind.
Finding the port assigned to JMX
Use yarn applicationattempt -list <applicationId> to find the applicationAttemptId:
[root@cdh- bin]# yarn applicationattempt -list application_1559203334026_0015
// :: INFO client.RMProxy: Connecting to ResourceManager at CDH-/10.dx.dx.143:
Total number of application attempts :
ApplicationAttempt-Id State AM-Container-Id Tracking-URL
appattempt_1559203334026_0015_000001 RUNNING container_1559203334026_0015_01_000001 http://CDH-143:8088/proxy/application_1559203334026_0015/
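The attempt ID can also be pulled out of that listing programmatically. A sketch, assuming the default tabular output of yarn applicationattempt -list (a sample line stands in for the live command here):

```shell
# Extract the ApplicationAttempt-Id column from `yarn applicationattempt -list` output.
# In real use, replace the printf sample with:
#   yarn applicationattempt -list application_1559203334026_0015
printf '%s\n' \
  'appattempt_1559203334026_0015_000001 RUNNING container_1559203334026_0015_01_000001 http://CDH-143:8088/proxy/application_1559203334026_0015/' \
| awk '/^appattempt_/ {print $1}'
```

This prints only the attempt ID, ready to feed into the yarn container -list step below.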
Use yarn container -list <applicationAttemptId> to get the list of container IDs:
[root@cdh- bin]# yarn container -list appattempt_1559203334026_0015_000001
// :: INFO client.RMProxy: Connecting to ResourceManager at CDH-/10.dx.dx.143:
Total number of containers :
Container-Id Start Time Finish Time State Host LOG-URL
container_1559203334026_0015_01_000012 Sat Jun :: + N/A RUNNING CDH-: http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000012/dx
container_1559203334026_0015_01_000013 Sat Jun :: + N/A RUNNING CDH-: http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000013/dx
container_1559203334026_0015_01_000010 Sat Jun :: + N/A RUNNING CDH-: http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000010/dx
container_1559203334026_0015_01_000011 Sat Jun :: + N/A RUNNING CDH-: http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000011/dx
container_1559203334026_0015_01_000016 Sat Jun :: + N/A RUNNING CDH-: http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000016/dx
container_1559203334026_0015_01_000014 Sat Jun :: + N/A RUNNING CDH-: http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000014/dx
container_1559203334026_0015_01_000015 Sat Jun :: + N/A RUNNING CDH-: http://CDH-146:8042/node/containerlogs/container_1559203334026_0015_01_000015/dx
container_1559203334026_0015_01_000004 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000004/dx
container_1559203334026_0015_01_000005 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000005/dx
container_1559203334026_0015_01_000002 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000002/dx
container_1559203334026_0015_01_000003 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000003/dx
container_1559203334026_0015_01_000008 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000008/dx
container_1559203334026_0015_01_000009 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000009/dx
container_1559203334026_0015_01_000006 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000006/dx
container_1559203334026_0015_01_000007 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000007/dx
container_1559203334026_0015_01_000001 Sat Jun :: + N/A RUNNING CDH-: http://CDH-142:8042/node/containerlogs/container_1559203334026_0015_01_000001/dx
On the node where the target executor runs, use the following command to find the running process and its PID:
[root@cdh- ~]# ps -axu | grep container_1559203334026_0015_01_000013
yarn 0.0 0.0 ? S : : bash /data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/default_container_executor.sh
yarn 0.0 0.0 ? Ss : : /bin/bash -c /usr/java/jdk1..0_171-amd64/bin/java -server -Xmx6144m '-Dcom.sun.management.jmxremote' '-Dcom.sun.management.jmxremote.port=0' '-Dcom.sun.management.jmxremote.authenticate=false' '-Dcom.sun.management.jmxremote.ssl=false' -Djava.io.tmpdir=/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/tmp '-Dspark.network.timeout=10000000' '-Dspark.driver.port=47564' '-Dspark.port.maxRetries=32' -Dspark.yarn.app.container.log.dir=/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@CDH-143:47564 --executor-id 12 --hostname CDH-146 --cores 2 --app-id application_1559203334026_0015 --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/__app__.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/streaming-dx-perf-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-common-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-sql-kafka-0-10_2.11-2.4.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-avro_2.11-3.2.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/shc-core-1.1.2-2.2-s_2.11-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/rocksdbjni-5.17.2.jar --user-class-path 
file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/kafka-clients-0.10.0.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/elasticsearch-spark-20_2.11-6.4.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx_Spark_State_Store_Plugin-1.0-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-core_2.11-0.9.5.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-avro_2.11-0.9.5.jar 1>/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013/stdout 2>/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013/stderr
yarn 3.3 ? Sl : : /usr/java/jdk1..0_171-amd64/bin/java -server -Xmx6144m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port= -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.io.tmpdir=/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/tmp -Dspark.network.timeout= -Dspark.driver.port= -Dspark.port.maxRetries= -Dspark.yarn.app.container.log.dir=/data6/yarn/container-logs/application_1559203334026_0015/container_1559203334026_0015_01_000013 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@CDH-143:47564 --executor-id 12 --hostname CDH-146 --cores 2 --app-id application_1559203334026_0015 --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/__app__.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-domain-perf-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx-common-3.0.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-sql-kafka-0-10_2.11-2.4.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/spark-avro_2.11-3.2.0.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/shc-core-1.1.2-2.2-s_2.11-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/rocksdbjni-5.17.2.jar --user-class-path 
file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/kafka-clients-0.10.0.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/elasticsearch-spark-20_2.11-6.4.1.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/dx_Spark_State_Store_Plugin-1.0-SNAPSHOT.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-core_2.11-0.9.5.jar --user-class-path file:/data6/yarn/nm/usercache/dx/appcache/application_1559203334026_0015/container_1559203334026_0015_01_000013/bijection-avro_2.11-0.9.5.jar
root 0.0 0.0 pts/ S+ : : grep --color=auto container_1559203334026_0015_01_000013
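With an unmangled ps listing, the executor's PID is the second column of the CoarseGrainedExecutorBackend line. A sketch that filters out the grep process itself (the sample line and PID 12345 are hypothetical stand-ins for live ps aux output):

```shell
# Find the PID of the executor JVM for a given container.
# In real use: ps aux | grep container_1559203334026_0015_01_000013
printf '%s\n' \
  'yarn 12345 3.3 1.2 5000 4000 ? Sl 10:00 0:42 /usr/java/jdk1.8.0_171-amd64/bin/java -server org.apache.spark.executor.CoarseGrainedExecutorBackend --executor-id 12' \
| awk '/CoarseGrainedExecutorBackend/ && !/grep/ {print $2}'
```

The !/grep/ guard drops the grep command's own entry, which otherwise also matches the container ID.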
Then use the PID to find the corresponding JMX port:
[root@cdh- ~]# sudo netstat -antp | grep <pid>
tcp 10.dx.dx.146: 0.0.0.0:* LISTEN /python2.
tcp6 ::: :::* LISTEN /java
tcp6 ::: :::* LISTEN /java
tcp6 10.dx.dx.146: :::* LISTEN /java
tcp6 10.dx.dx.146: 10.dx.dx.142: ESTABLISHED /java
tcp6 10.dx.dx.146: 10.206.186.35: ESTABLISHED /java
tcp6 10.dx.dx.146: 10.dx.dx.143: ESTABLISHED /java
From this output the JMX port is likely 48169 or 37692; trying each in turn quickly finds the one that connects to the Spark executor.
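The candidate ports can also be filtered from netstat automatically: keep only LISTEN sockets belonging to the executor's java process. A sketch assuming standard netstat -antp output format; the printf lines and PID 12345 are hypothetical samples:

```shell
pid=12345
# In real use: sudo netstat -antp | grep "$pid/java"
printf '%s\n' \
  'tcp6 0 0 :::48169 :::* LISTEN 12345/java' \
  'tcp6 0 0 :::37692 :::* LISTEN 12345/java' \
  'tcp6 0 0 10.0.0.146:44444 10.0.0.143:47564 ESTABLISHED 12345/java' \
| awk -v p="$pid" '$6=="LISTEN" && $7==p"/java" {n=split($4,a,":"); print a[n]}'
```

The split takes the last colon-separated field of the local address, so it works for both tcp and tcp6 entries; only the listening ports remain as JMX candidates.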
Adding a monitor in JVisualVM
On your local Windows machine, locate ${JAVA_HOME}/bin/jvisualvm.exe under the JDK directory and run it. After it starts, right-click "Remote" and add a JMX connection.

Fill in the IP of the node where the monitored executor runs

Then monitoring can begin:
