Metrics in Flink are organized into MetricGroups.

MetricGroup

This is essentially a metric container: it can hold subgroups as well as metrics of any kind.

So its main interface is about registration:

/**
 * A MetricGroup is a named container for {@link Metric Metrics} and further metric subgroups.
 *
 * <p>Instances of this class can be used to register new metrics with Flink and to create a nested
 * hierarchy based on the group names.
 *
 * <p>A MetricGroup is uniquely identified by its place in the hierarchy and name.
 */
public interface MetricGroup {
    // Excerpt: the full interface declares further methods, including overloads
    // that accept either an int or a String name.
    <C extends Counter> C counter(int name, C counter);
    <T, G extends Gauge<T>> G gauge(int name, G gauge);
    <H extends Histogram> H histogram(String name, H histogram);
    MetricGroup addGroup(String name);
}
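To make the container idea concrete, here is a minimal, dependency-free sketch of a nested metric group. It is not Flink's code: `SimpleMetricGroup`, `SimpleCounter` and the metric name `numRecordsIn` are hypothetical. The identifier is simply the scope components plus the metric name joined by a delimiter, mirroring the `host-7.taskmanager-2.window_word_count.my-mapper` scope example from Flink's own comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Flink's Metric and Counter types.
interface Metric {}

class SimpleCounter implements Metric {
    private long count;
    void inc() { count++; }
    long getCount() { return count; }
}

// Toy metric group: a named container for metrics and nested subgroups.
class SimpleMetricGroup {
    private final String[] scopeComponents;
    private final Map<String, Metric> metrics = new HashMap<>();
    private final Map<String, SimpleMetricGroup> groups = new HashMap<>();

    SimpleMetricGroup(String... scopeComponents) {
        this.scopeComponents = scopeComponents;
    }

    // Returns the existing subgroup with this name, or creates one a level deeper.
    SimpleMetricGroup addGroup(String name) {
        return groups.computeIfAbsent(name, n -> {
            String[] child = Arrays.copyOf(scopeComponents, scopeComponents.length + 1);
            child[scopeComponents.length] = n;
            return new SimpleMetricGroup(child);
        });
    }

    // Registers the metric unless the name is already taken; returns the argument either way.
    <C extends Metric> C counter(String name, C counter) {
        metrics.putIfAbsent(name, counter);
        return counter;
    }

    // Fully qualified name: scope components followed by the metric name.
    String getMetricIdentifier(String metricName, char delimiter) {
        List<String> parts = new ArrayList<>(Arrays.asList(scopeComponents));
        parts.add(metricName);
        return String.join(String.valueOf(delimiter), parts);
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        SimpleMetricGroup taskManager = new SimpleMetricGroup("host-7", "taskmanager-2");
        SimpleMetricGroup operator = taskManager.addGroup("window_word_count").addGroup("my-mapper");
        SimpleCounter records = operator.counter("numRecordsIn", new SimpleCounter());
        records.inc();
        // prints host-7.taskmanager-2.window_word_count.my-mapper.numRecordsIn = 1
        System.out.println(operator.getMetricIdentifier("numRecordsIn", '.') + " = " + records.getCount());
    }
}
```

Calling `addGroup` twice with the same name returns the same subgroup in this sketch, so the hierarchy is built once and reused.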

 

AbstractMetricGroup

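AbstractMetricGroup is the core implementation of MetricGroup. The logic is simple: registering a metric and closing the group both synchronize on the group, so the two are mutually exclusive.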

/**
 * Abstract {@link MetricGroup} that contains key functionality for adding metrics and groups.
 */
public abstract class AbstractMetricGroup implements MetricGroup {

    /** The registry that this metrics group belongs to */
    protected final MetricRegistry registry;

    /** All metrics that are directly contained in this group */
    private final Map<String, Metric> metrics = new HashMap<>();

    /** All metric subgroups of this group */
    private final Map<String, AbstractMetricGroup> groups = new HashMap<>();

    /** The metrics scope (namespace) represented by this group.
     * For example ["host-7", "taskmanager-2", "window_word_count", "my-mapper" ]. */
    private final String[] scopeComponents;

    /** The metrics scope represented by this group, as a concatenated string, lazily computed.
     * For example: "host-7.taskmanager-2.window_word_count.my-mapper" */
    private String scopeString;

    /** Whether this group has been closed (read by addMetric below) */
    private volatile boolean closed;

    // ... constructor, close() and other members omitted ...

    @Override
    public <C extends Counter> C counter(String name, C counter) {
        addMetric(name, counter);
        return counter;
    }

    /**
     * Adds the given metric to the group and registers it at the registry, if the group
     * is not yet closed, and if no metric with the same name has been registered before.
     *
     * @param name the name to register the metric under
     * @param metric the metric to register
     */
    protected void addMetric(String name, Metric metric) {
        // add the metric only if the group is still open
        synchronized (this) { // lock: mutually exclusive with closing the group
            if (!closed) {
                // immediately put without a 'contains' check to optimize the common case (no collision)
                // collisions are resolved later
                Metric prior = metrics.put(name, metric);

                // check for collisions with other metric names
                if (prior == null) {
                    // no other metric with this name yet
                    registry.register(metric, name, this);
                }
                else {
                    // we had a collision. put back the original value
                    metrics.put(name, prior);
                }
            }
        }
    }
}
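The optimistic `put` in `addMetric` is easy to misread, so here is a stripped-down, dependency-free sketch of the same pattern. `GuardedGroup` and its `registrations` counter (standing in for `registry.register`) are hypothetical. The point: a duplicate name leaves the first metric in place and is never forwarded to the registry, and nothing is added once the group is closed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "put first, restore on collision" pattern from addMetric.
class GuardedGroup {
    private final Map<String, Object> metrics = new HashMap<>();
    private boolean closed;      // guarded by 'this'
    private int registrations;   // stands in for calls to registry.register(...)

    /** Returns true if the metric was registered, false if it was ignored. */
    boolean addMetric(String name, Object metric) {
        synchronized (this) {
            if (closed) {
                return false;
            }
            // Optimistic put: a single hash operation in the common, collision-free case.
            Object prior = metrics.put(name, metric);
            if (prior == null) {
                registrations++; // a real group would call registry.register(...) here
                return true;
            }
            // Collision: restore the original entry, so the first registration wins.
            metrics.put(name, prior);
            return false;
        }
    }

    void close() {
        synchronized (this) { // same lock as addMetric, so the two never interleave
            closed = true;
            metrics.clear(); // a real group would also unregister each metric
        }
    }

    synchronized Object get(String name) { return metrics.get(name); }

    synchronized int registrations() { return registrations; }
}

class CollisionDemo {
    public static void main(String[] args) {
        GuardedGroup group = new GuardedGroup();
        System.out.println(group.addMetric("numRecordsIn", "first"));  // true
        System.out.println(group.addMetric("numRecordsIn", "second")); // false
        System.out.println(group.get("numRecordsIn"));                 // first
        group.close();
        System.out.println(group.addMetric("numBytesOut", "third"));   // false
    }
}
```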

 

MetricReporter

Collected metrics can only be sent out through a reporter:

/**
 * Reporters are used to export {@link Metric Metrics} to an external backend.
 *
 * <p>Reporters are instantiated via reflection and must be public, non-abstract, and have a
 * public no-argument constructor.
 */
public interface MetricReporter {

    // ------------------------------------------------------------------------
    //  life cycle
    // ------------------------------------------------------------------------

    /**
     * Configures this reporter. Since reporters are instantiated generically and hence parameter-less,
     * this method is the place where the reporters set their basic fields based on configuration values.
     *
     * <p>This method is always called first on a newly instantiated reporter.
     *
     * @param config The configuration with all parameters.
     */
    void open(MetricConfig config);

    /**
     * Closes this reporter. Should be used to close channels, streams and release resources.
     */
    void close();

    void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group);

    void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group);
}
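The class comment says reporters are created via reflection, must be public and non-abstract with a public no-argument constructor, and that `open` is always the first call. A minimal sketch of that contract, assuming a hypothetical `LifecycleReporter` interface and using `java.util.Properties` in place of `MetricConfig`. It calls `getDeclaredConstructor().newInstance()` rather than `Class.newInstance()`, which is deprecated since Java 9.

```java
import java.util.Properties;

// Hypothetical reporter contract modelled on MetricReporter's life cycle.
interface LifecycleReporter {
    void open(Properties config);
    void close();
}

class ReporterLoader {
    // Public, non-abstract, with a public no-arg constructor: what reflective loading needs.
    public static class EchoReporter implements LifecycleReporter {
        String host;
        boolean closed;

        public EchoReporter() {}

        @Override
        public void open(Properties config) { host = config.getProperty("host", "localhost"); }

        @Override
        public void close() { closed = true; }
    }

    // Instantiates the named class reflectively and opens it before handing it out.
    static LifecycleReporter load(String className, Properties config) {
        try {
            Class<?> clazz = Class.forName(className);
            LifecycleReporter reporter = (LifecycleReporter) clazz.getDeclaredConstructor().newInstance();
            reporter.open(config); // open() is always the first call on a new instance
            return reporter;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load reporter " + className, e);
        }
    }
}

class LoaderDemo {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("host", "metrics.example.com");
        LifecycleReporter reporter = ReporterLoader.load(ReporterLoader.EchoReporter.class.getName(), config);
        System.out.println(((ReporterLoader.EchoReporter) reporter).host); // metrics.example.com
        reporter.close();
    }
}
```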

 

AbstractReporter implements the MetricReporter interface:

/**
 * Base interface for custom metric reporters.
 */
public abstract class AbstractReporter implements MetricReporter, CharacterFilter {
    protected final Logger log = LoggerFactory.getLogger(getClass());

    protected final Map<Gauge<?>, String> gauges = new HashMap<>();
    protected final Map<Counter, String> counters = new HashMap<>();
    protected final Map<Histogram, String> histograms = new HashMap<>();

    @Override
    public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
        // the group is only used to build the metric's fully qualified name
        final String name = group.getMetricIdentifier(metricName, this);

        synchronized (this) {
            if (metric instanceof Counter) {
                counters.put((Counter) metric, name);
            } else if (metric instanceof Gauge) {
                gauges.put((Gauge<?>) metric, name);
            } else if (metric instanceof Histogram) {
                histograms.put((Histogram) metric, name);
            } else {
                log.warn("Cannot add unknown metric type {}. This indicates that the reporter " +
                        "does not support this metric type.", metric.getClass().getName());
            }
        }
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
        synchronized (this) {
            if (metric instanceof Counter) {
                counters.remove(metric);
            } else if (metric instanceof Gauge) {
                gauges.remove(metric);
            } else if (metric instanceof Histogram) {
                histograms.remove(metric);
            } else {
                log.warn("Cannot remove unknown metric type {}. This indicates that the reporter " +
                        "does not support this metric type.", metric.getClass().getName());
            }
        }
    }
}
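AbstractReporter's bookkeeping boils down to dispatching on the metric's type and keying each map by the metric object itself, so removal needs no name lookup. A dependency-free sketch of that idea, with a `snapshot()` method showing what a periodic `report()` might read. `Counter`, `Gauge` and `MapReporter` here are hypothetical stand-ins rather than Flink's classes, and removal is simplified to try every map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for Flink's metric types.
interface Metric {}

class Counter implements Metric {
    private long count;
    void inc() { count++; }
    long getCount() { return count; }
}

class Gauge<T> implements Metric {
    private final Supplier<T> supplier;
    Gauge(Supplier<T> supplier) { this.supplier = supplier; }
    T getValue() { return supplier.get(); }
}

// Toy reporter mirroring AbstractReporter: dispatch on type, key each map by the metric object.
class MapReporter {
    private final Map<Counter, String> counters = new HashMap<>();
    private final Map<Gauge<?>, String> gauges = new HashMap<>();

    synchronized void notifyOfAddedMetric(Metric metric, String identifier) {
        if (metric instanceof Counter) {
            counters.put((Counter) metric, identifier);
        } else if (metric instanceof Gauge) {
            gauges.put((Gauge<?>) metric, identifier);
        } else {
            System.err.println("Unsupported metric type: " + metric.getClass().getName());
        }
    }

    synchronized void notifyOfRemovedMetric(Metric metric) {
        // The metric object is the key, so no name lookup is needed.
        counters.remove(metric);
        gauges.remove(metric);
    }

    // What a periodic report() might read: the current value of every known metric.
    synchronized Map<String, Object> snapshot() {
        Map<String, Object> out = new HashMap<>();
        counters.forEach((counter, name) -> out.put(name, counter.getCount()));
        gauges.forEach((gauge, name) -> out.put(name, gauge.getValue()));
        return out;
    }
}

class ReporterDemo {
    public static void main(String[] args) {
        MapReporter reporter = new MapReporter();
        Counter records = new Counter();
        Gauge<Integer> answer = new Gauge<>(() -> 42);
        reporter.notifyOfAddedMetric(records, "host-7.taskmanager-2.numRecordsIn");
        reporter.notifyOfAddedMetric(answer, "host-7.taskmanager-2.answer");
        records.inc();
        System.out.println(reporter.snapshot());
        reporter.notifyOfRemovedMetric(records);
        System.out.println(reporter.snapshot());
    }
}
```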

 

MetricRegistry

MetricRegistry connects MetricGroups and MetricReporters.

It hands every metric that should be reported to the MetricReporters, and starts a thread that reports periodically.
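The registry's constructor is driven entirely by configuration. A minimal sketch of the same parsing over a plain `Map`, assuming the keys `metrics.reporters`, `metrics.reporter.<name>.class` and `metrics.reporter.<name>.interval` as stand-ins for the `ConfigConstants` fields (check them against your Flink version), and the default of 10 SECONDS used in the code. `ReporterSettings` and the `com.example` class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of how the registry reads per-reporter settings.
// The key strings are assumptions standing in for Flink's ConfigConstants fields.
class ReporterSettings {
    final String name;
    final String className;
    final long period;
    final TimeUnit unit;

    ReporterSettings(String name, String className, long period, TimeUnit unit) {
        this.name = name;
        this.className = className;
        this.period = period;
        this.unit = unit;
    }

    static List<ReporterSettings> parse(Map<String, String> config) {
        List<ReporterSettings> result = new ArrayList<>();
        String defined = config.get("metrics.reporters");
        if (defined == null) {
            return result; // no reporters configured, nothing will be reported
        }
        for (String name : defined.trim().split("\\s*,\\s*")) {
            // every reporter reads its own section: metrics.reporter.<name>.<key>
            String prefix = "metrics.reporter." + name + ".";
            long period = 10;                 // default interval ...
            TimeUnit unit = TimeUnit.SECONDS; // ... of 10 SECONDS
            String interval = config.get(prefix + "interval");
            if (interval != null) {
                try {
                    String[] parts = interval.trim().split(" ");
                    long parsedPeriod = Long.parseLong(parts[0]);
                    TimeUnit parsedUnit = TimeUnit.valueOf(parts[1]);
                    // commit only when both halves parsed
                    period = parsedPeriod;
                    unit = parsedUnit;
                } catch (RuntimeException e) {
                    // malformed value such as "10" or "500 MILISECONDS": keep the defaults
                }
            }
            result.add(new ReporterSettings(name, config.get(prefix + "class"), period, unit));
        }
        return result;
    }
}

class SettingsDemo {
    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("metrics.reporters", "jmx, logger");
        config.put("metrics.reporter.jmx.class", "com.example.JmxLikeReporter");     // hypothetical
        config.put("metrics.reporter.logger.class", "com.example.LoggingReporter"); // hypothetical
        config.put("metrics.reporter.logger.interval", "500 MILLISECONDS");
        for (ReporterSettings s : ReporterSettings.parse(config)) {
            System.out.println(s.name + " -> " + s.className + " every " + s.period + " " + s.unit);
        }
    }
}
```

One deliberate difference: the Flink code assigns `period` before parsing the unit, so a typo like `500 MILISECONDS` leaves `period` at 500 with the unit still SECONDS, even though the log says the default interval is used. The sketch commits both values only once both parse.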

/**
 * A MetricRegistry keeps track of all registered {@link Metric Metrics}. It serves as the
 * connection between {@link MetricGroup MetricGroups} and {@link MetricReporter MetricReporters}.
 */
public class MetricRegistry {

    // Excerpt: LOG, createScopeConfig() and shutdownExecutor() are defined elsewhere in the class.

    private List<MetricReporter> reporters;
    private ScheduledExecutorService executor;

    private final ScopeFormats scopeFormats;

    private final char delimiter;

    /**
     * Creates a new MetricRegistry and starts the configured reporter.
     */
    public MetricRegistry(Configuration config) {
        // first parse the scope formats, these are needed for all reporters
        ScopeFormats scopeFormats;
        try {
            // read the scope format from the config, i.e. the format of the metrics namespace
            scopeFormats = createScopeConfig(config);
        }
        catch (Exception e) {
            LOG.warn("Failed to parse scope format, using default scope formats", e);
            scopeFormats = new ScopeFormats();
        }
        this.scopeFormats = scopeFormats;

        char delim;
        try {
            // read the scope delimiter from the config
            delim = config.getString(ConfigConstants.METRICS_SCOPE_DELIMITER, ".").charAt(0);
        } catch (Exception e) {
            LOG.warn("Failed to parse delimiter, using default delimiter.", e);
            delim = '.';
        }
        this.delimiter = delim;

        // second, instantiate any custom configured reporters
        this.reporters = new ArrayList<>();

        // read the list of configured reporters
        final String definedReporters = config.getString(ConfigConstants.METRICS_REPORTERS_LIST, null);

        if (definedReporters == null) {
            // no reporters defined
            // by default, don't report anything
            LOG.info("No metrics reporter configured, no metrics will be exposed/reported.");
            this.executor = null;
        } else {
            // we have some reporters so
            String[] namedReporters = definedReporters.split("\\s*,\\s*");
            for (String namedReporter : namedReporters) {
                // each configured reporter reads its own section of the config
                DelegatingConfiguration reporterConfig = new DelegatingConfiguration(config, ConfigConstants.METRICS_REPORTER_PREFIX + namedReporter + ".");
                // reporter class name
                final String className = reporterConfig.getString(ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX, null);

                try {
                    // report interval
                    String configuredPeriod = reporterConfig.getString(ConfigConstants.METRICS_REPORTER_INTERVAL_SUFFIX, null);
                    TimeUnit timeunit = TimeUnit.SECONDS;
                    long period = 10;

                    if (configuredPeriod != null) {
                        try {
                            String[] interval = configuredPeriod.split(" ");
                            period = Long.parseLong(interval[0]);
                            timeunit = TimeUnit.valueOf(interval[1]);
                        }
                        catch (Exception e) {
                            LOG.error("Cannot parse report interval from config: " + configuredPeriod +
                                    " - please use values like '10 SECONDS' or '500 MILLISECONDS'. " +
                                    "Using default reporting interval.");
                        }
                    }

                    // instantiate the reporter via reflection
                    Class<?> reporterClass = Class.forName(className);
                    MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance();

                    MetricConfig metricConfig = new MetricConfig();
                    reporterConfig.addAllToProperties(metricConfig);
                    reporterInstance.open(metricConfig); // open the reporter

                    if (reporterInstance instanceof Scheduled) {
                        if (this.executor == null) {
                            executor = Executors.newSingleThreadScheduledExecutor(); // create the executor lazily
                        }
                        LOG.info("Periodically reporting metrics in intervals of {} {} for reporter {} of type {}.", period, timeunit.name(), namedReporter, className);

                        // schedule periodic reporting
                        executor.scheduleWithFixedDelay(
                                new ReporterTask((Scheduled) reporterInstance), period, period, timeunit);
                    }
                    reporters.add(reporterInstance); // add to the list of reporters
                }
                catch (Throwable t) {
                    shutdownExecutor();
                    LOG.error("Could not instantiate metrics reporter " + namedReporter + ". Metrics might not be exposed/reported.", t);
                }
            }
        }
    }

    // ------------------------------------------------------------------------
    //  Metrics (de)registration
    // ------------------------------------------------------------------------

    /**
     * Registers a new {@link Metric} with this registry.
     *
     * @param metric the metric that was added
     * @param metricName the name of the metric
     * @param group the group that contains the metric
     */
    public void register(Metric metric, String metricName, MetricGroup group) {
        // Called from AbstractMetricGroup.addMetric: a metric added to a group is also handed to every reporter.
        try {
            if (reporters != null) {
                for (MetricReporter reporter : reporters) {
                    if (reporter != null) {
                        reporter.notifyOfAddedMetric(metric, metricName, group); // add the metric to each reporter
                    }
                }
            }
        } catch (Exception e) {
            LOG.error("Error while registering metric.", e);
        }
    }

    /**
     * Un-registers the given {@link org.apache.flink.metrics.Metric} with this registry.
     *
     * @param metric the metric that should be removed
     * @param metricName the name of the metric
     * @param group the group that contains the metric
     */
    public void unregister(Metric metric, String metricName, MetricGroup group) {
        try {
            if (reporters != null) {
                for (MetricReporter reporter : reporters) {
                    if (reporter != null) {
                        reporter.notifyOfRemovedMetric(metric, metricName, group);
                    }
                }
            }
        } catch (Exception e) {
            LOG.error("Error while unregistering metric.", e);
        }
    }

    // ------------------------------------------------------------------------

    /**
     * This task is explicitly a static class, so that it does not hold any references to the enclosing
     * MetricsRegistry instance.
     *
     * This is a subtle difference, but very important: With this static class, the enclosing class instance
     * may become garbage-collectible, whereas with an anonymous inner class, the timer thread
     * (which is a GC root) will hold a reference via the timer task and its enclosing instance pointer.
     * Making the MetricsRegistry garbage collectible makes the java.util.Timer garbage collectible,
     * which acts as a fail-safe to stop the timer thread and prevents resource leaks.
     */
    private static final class ReporterTask extends TimerTask {

        private final Scheduled reporter;

        private ReporterTask(Scheduled reporter) {
            this.reporter = reporter;
        }

        @Override
        public void run() {
            try {
                reporter.report(); // the task's whole job is to call reporter.report()
            } catch (Throwable t) {
                LOG.warn("Error while reporting metrics", t);
            }
        }
    }
}
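Why does `ReporterTask.run()` catch `Throwable`? The `ScheduledExecutorService.scheduleWithFixedDelay` contract states that if any execution of the task throws, subsequent executions are suppressed, so one failing `report()` would otherwise silence that reporter for good. The sketch below demonstrates this with a hypothetical `ScheduledReport` interface (standing in for Flink's `Scheduled`) and a reporter whose first call fails; all class and method names are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Flink's Scheduled interface.
interface ScheduledReport {
    void report();
}

// Mirrors ReporterTask: holds only the reporter, never the registry.
final class GuardedReportTask implements Runnable {
    private final ScheduledReport reporter;

    GuardedReportTask(ScheduledReport reporter) {
        this.reporter = reporter;
    }

    @Override
    public void run() {
        try {
            reporter.report();
        } catch (Throwable t) {
            // Swallow (real code would log) so the executor keeps rescheduling this task.
            System.err.println("Error while reporting metrics: " + t);
        }
    }
}

class SchedulingDemo {
    // Schedules a reporter whose first report() throws, and returns how often it ran.
    static int countReports(boolean guarded, int wanted, long timeoutMillis) {
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(wanted);
        ScheduledReport flaky = () -> {
            int n = calls.incrementAndGet();
            latch.countDown();
            if (n == 1) {
                throw new IllegalStateException("first report fails");
            }
        };
        Runnable task;
        if (guarded) {
            task = new GuardedReportTask(flaky);
        } else {
            task = flaky::report; // unguarded: the exception reaches the executor
        }
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            executor.scheduleWithFixedDelay(task, 0, 10, TimeUnit.MILLISECONDS);
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("guarded:   " + countReports(true, 3, 1000));
        System.out.println("unguarded: " + countReports(false, 3, 300));
    }
}
```

The unguarded run stops after its first, failing call, while the guarded run keeps reporting.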

 

TaskManager

In the TaskManager, associateWithJobManager contains:

metricsRegistry = new FlinkMetricRegistry(config.configuration)

taskManagerMetricGroup =
  new TaskManagerMetricGroup(metricsRegistry, this.runtimeInfo.getHostname, id.toString)

TaskManager.instantiateStatusMetrics(taskManagerMetricGroup)

This creates the metricsRegistry and the TaskManagerMetricGroup.

instantiateStatusMetrics merely registers the various TaskManager status metrics:

private def instantiateStatusMetrics(taskManagerMetricGroup: MetricGroup) : Unit = {
  val jvm = taskManagerMetricGroup
    .addGroup("Status")
    .addGroup("JVM")

  instantiateClassLoaderMetrics(jvm.addGroup("ClassLoader"))
  instantiateGarbageCollectorMetrics(jvm.addGroup("GarbageCollector"))
  instantiateMemoryMetrics(jvm.addGroup("Memory"))
  instantiateThreadMetrics(jvm.addGroup("Threads"))
  instantiateCPUMetrics(jvm.addGroup("CPU"))
}

private def instantiateClassLoaderMetrics(metrics: MetricGroup) {
  // ManagementFactory exposes MXBeans that carry the JVM's own statistics
  val mxBean = ManagementFactory.getClassLoadingMXBean

  metrics.gauge[Long, FlinkGauge[Long]]("ClassesLoaded", new FlinkGauge[Long] {
    override def getValue: Long = mxBean.getTotalLoadedClassCount
  })
  metrics.gauge[Long, FlinkGauge[Long]]("ClassesUnloaded", new FlinkGauge[Long] {
    override def getValue: Long = mxBean.getUnloadedClassCount
  })
}
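Because a gauge is read on demand, JVM status metrics are thin wrappers around the standard `java.lang.management` MXBeans. A dependency-free Java sketch of the same idea, modelling a gauge as a `Supplier<Long>` and a group path as a dotted prefix. `JvmStatusGauges` and the names under `Memory` and `Threads` are illustrative assumptions; the MXBean calls are standard JDK API.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch: gauges as Suppliers, registered under a dotted Status.JVM.* path.
class JvmStatusGauges {
    final Map<String, Supplier<Long>> gauges = new LinkedHashMap<>();

    void register(String path, Supplier<Long> gauge) {
        gauges.put("Status.JVM." + path, gauge);
    }

    static JvmStatusGauges create() {
        JvmStatusGauges status = new JvmStatusGauges();

        ClassLoadingMXBean classLoading = ManagementFactory.getClassLoadingMXBean();
        status.register("ClassLoader.ClassesLoaded", classLoading::getTotalLoadedClassCount);
        status.register("ClassLoader.ClassesUnloaded", classLoading::getUnloadedClassCount);

        // Illustrative extras; not necessarily the names Flink uses.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        status.register("Memory.Heap.Used", () -> memory.getHeapMemoryUsage().getUsed());
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        status.register("Threads.Count", () -> (long) threads.getThreadCount());
        return status;
    }

    public static void main(String[] args) {
        // Each get() queries the MXBean at that moment, just like a reporter polling a gauge.
        create().gauges.forEach((name, gauge) -> System.out.println(name + " = " + gauge.get()));
    }
}
```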

 

When a task is submitted, submitTask does:

val taskMetricGroup = taskManagerMetricGroup.addTaskForJob(tdd)

val task = new Task(
  tdd,
  memoryManager,
  ioManager,
  network,
  bcVarManager,
  selfGateway,
  jobManagerGateway,
  config.timeout,
  libCache,
  fileCache,
  runtimeInfo,
  taskMetricGroup)

So a taskMetricGroup is created for every task and passed in when the Task object is constructed.

Environment env = new RuntimeEnvironment(jobId, vertexId, executionId,
        executionConfig, taskInfo, jobConfiguration, taskConfiguration,
        userCodeClassLoader, memoryManager, ioManager,
        broadcastVariableManager, accumulatorRegistry,
        splitProvider, distributedCacheEntries,
        writers, inputGates, jobManager, taskManagerConfig, metrics, this);

// let the task code create its readers and writers
invokable.setEnvironment(env);

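Inside Task, the key step is putting this taskMetricGroup into the RuntimeEnvironment, so that the actual task logic can obtain its metrics through the RuntimeEnvironment.

StreamTask is one kind of invokable; the AbstractInvokable base class is defined as follows: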

public abstract class AbstractInvokable {

    /** The environment assigned to this invokable. */
    private Environment environment;

    /**
     * Starts the execution.
     *
     * <p>Must be overridden by the concrete task implementation. This method
     * is called by the task manager when the actual execution of the task
     * starts.
     *
     * <p>All resources should be cleaned up when the method returns. Make sure
     * to guard the code with <code>try-finally</code> blocks where necessary.
     *
     * @throws Exception
     *         Tasks may forward their exceptions for the TaskManager to handle through failure/recovery.
     */
    public abstract void invoke() throws Exception;

    /**
     * Sets the environment of this task.
     *
     * @param environment
     *        the environment of this task
     */
    public final void setEnvironment(Environment environment) {
        this.environment = environment;
    }

    /**
     * Returns the environment of this task.
     *
     * @return The environment of this task.
     */
    public Environment getEnvironment() {
        return this.environment;
    }
}

 

So inside StreamTask, metrics can be used like this:

getEnvironment().getMetricGroup().gauge("lastCheckpointSize", new Gauge<Long>() {
    @Override
    public Long getValue() {
        return StreamTask.this.lastCheckpointSize;
    }
});
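The gauge is pull-based: it captures a way to read `lastCheckpointSize`, not the value at registration time, so every report sees the latest size. A minimal, dependency-free sketch of the whole path (environment injected before `invoke()`, gauge registered through `getEnvironment().getMetricGroup()`). `TaskMetrics`, `Env`, `Invokable` and `CheckpointingTask` are hypothetical stand-ins for Flink's MetricGroup, Environment, AbstractInvokable and StreamTask.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a task's MetricGroup: gauges are stored, not evaluated.
class TaskMetrics {
    final Map<String, Supplier<?>> gauges = new HashMap<>();

    <T> void gauge(String name, Supplier<T> gauge) {
        gauges.put(name, gauge);
    }
}

// Hypothetical stand-in for Environment: carries the task's metric group.
class Env {
    private final TaskMetrics metrics;

    Env(TaskMetrics metrics) { this.metrics = metrics; }

    TaskMetrics getMetricGroup() { return metrics; }
}

// Hypothetical stand-in for AbstractInvokable: the environment is injected before invoke().
abstract class Invokable {
    private Env environment;

    final void setEnvironment(Env environment) { this.environment = environment; }

    Env getEnvironment() { return environment; }

    abstract void invoke();
}

class CheckpointingTask extends Invokable {
    // volatile: written by the task thread, read by whichever thread polls the gauge
    private volatile long lastCheckpointSize;

    @Override
    void invoke() {
        // Registers a way to read the field; the value is fetched each time the gauge is polled.
        getEnvironment().getMetricGroup().gauge("lastCheckpointSize", () -> lastCheckpointSize);
    }

    void completeCheckpoint(long size) { lastCheckpointSize = size; }
}

class GaugeDemo {
    public static void main(String[] args) {
        TaskMetrics metrics = new TaskMetrics();
        CheckpointingTask task = new CheckpointingTask();
        task.setEnvironment(new Env(metrics)); // what Task does before running the invokable
        task.invoke();
        Supplier<?> gauge = metrics.gauges.get("lastCheckpointSize");
        System.out.println(gauge.get()); // 0
        task.completeCheckpoint(4096);
        System.out.println(gauge.get()); // 4096
    }
}
```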
