In this blog post I will show you how to integrate Kafka with Ganglia. This is a useful topic for anyone who wants to benchmark Kafka or measure its performance by monitoring specific Kafka metrics in Ganglia.

Before going ahead, let me briefly explain what Kafka and Ganglia are.

Kafka – Kafka is an open source distributed message broker project maintained by the Apache Software Foundation. It provides a unified, high-throughput, low-latency platform for handling real-time data feeds.

Ganglia – Ganglia is a distributed system for monitoring high-performance computing systems such as clusters and grids.

Now let's get started. In this example we have a Hadoop cluster with 3 Kafka brokers. First we will see how to install and configure Ganglia on these machines.

Step 1: Setup and Configure Ganglia gmetad and gmond

First, install the EPEL repo on all the nodes

yum install epel-release

On the master node (ganglia-server), install the packages below

yum install rrdtool ganglia ganglia-gmetad ganglia-gmond ganglia-web httpd php apr apr-util

On the slave nodes (ganglia-client), install the package below

yum install ganglia-gmond

On the master node, run the following

chown apache:apache -R /var/www/html/ganglia

Edit the config file below to allow access to the Ganglia web page from any IP

vi /etc/httpd/conf.d/ganglia.conf

It should look like below:

#
# Ganglia monitoring system php web frontend
#
Alias /ganglia /usr/share/ganglia
<Location /ganglia>
Order deny,allow
# This is very important, or else you won't be able to see the Ganglia web UI
Allow from all
Allow from 127.0.0.1
Allow from ::1
# Allow from .example.com
</Location>

On the master node, edit the gmetad config file so that it looks like below (change the IP address in data_source to your ganglia-server's private IP address)

#cat /etc/ganglia/gmetad.conf |grep -v ^#
data_source "hadoopkafka" 172.30.0.81:8649
gridname "Hadoop-Kafka"
setuid_username ganglia
case_sensitive_hostnames 0

On the master node, edit gmond.conf. Keep all parameters at their defaults except the ones below, then copy gmond.conf to all other nodes in the cluster.

cluster {
name = "hadoopkafka"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
location = "unspecified"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond
used to only support having a single channel */
udp_send_channel {
#bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname. Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
#mcast_join = 239.2.11.71
host = 172.30.0.81
port = 8649
#ttl = 1
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
#mcast_join = 239.2.11.71
port = 8649
#bind = 239.2.11.71
#retry_bind = true
# Size of the UDP buffer. If you are handling lots of metrics you really
# should bump it up to e.g. 10MB or even higher.
# buffer = 10485760
}
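The copy step mentioned above can be scripted. The loop below is a minimal sketch, assuming password-less SSH as root and three hypothetical hostnames (kafka1, kafka2, kafka3) that you would replace with your own. It only prints the scp commands, so you can review them before running anything.

```shell
#!/bin/sh
# Hypothetical hostnames (not from this setup); replace with your own gmond nodes.
NODES="kafka1 kafka2 kafka3"
CONF=/etc/ganglia/gmond.conf

# Print one scp command per node (dry run, nothing is copied).
print_copy_commands() {
  for node in $NODES; do
    echo "scp $CONF root@$node:$CONF"
  done
}

print_copy_commands
```

Once the printed commands look right, piping them to a shell (print_copy_commands | sh) would run them.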

Start apache service on master node

service httpd start

Start gmetad service on master node

service gmetad start

Start gmond service on every node in the cluster

service gmond start

That's it! You can now see the basic Ganglia metrics by visiting the web UI at http://IP-address-of-ganglia-server/ganglia
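If the web UI shows no hosts, it can help to ask gmond directly. With the default gmond.conf, gmond also answers TCP connections on port 8649 (the tcp_accept_channel section) with an XML dump of the cluster state. The helper below is a small sketch that prints the host names found in such a dump. Fetching the dump with nc is an assumption about the tools installed on your machine.

```shell
#!/bin/sh
# Read a gmond XML dump on stdin and print the NAME of each <HOST> element.
list_gmond_hosts() {
  sed -n 's/.*<HOST NAME="\([^"]*\)".*/\1/p'
}

# Usage (assumes nc is installed and 172.30.0.81 is your ganglia-server):
#   nc 172.30.0.81 8649 | list_gmond_hosts
```

Every node running gmond should appear in the list. A missing node usually points at its udp_send_channel settings or at a firewall between the nodes.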

Step 2: Ganglia Integration with Kafka

Enable JMX Monitoring for Kafka Brokers

To get custom Kafka metrics, we need to enable JMX monitoring for the Kafka broker daemon.

To enable JMX monitoring for a Kafka broker, follow the instructions below:

Edit kafka-run-class.sh and modify the KAFKA_JMX_OPTS variable as shown below (replace kafka.broker.hostname with your Kafka broker's hostname)

KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka.broker.hostname -Djava.net.preferIPv4Stack=true"

Add the line below to kafka-server-start.sh (on Hortonworks Hadoop, the path is /usr/hdp/current/kafka-broker/bin/kafka-server-start.sh)

export JMX_PORT=${JMX_PORT:-9999}

That’s it! Repeat the above steps on all Kafka brokers and restart them (manually or via your management UI, whichever applies).

Verify that JMX port has been enabled!

You can use jconsole to do so.
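Before opening jconsole, a quick command-line check that the JMX port accepts connections can save time. This is a sketch, assuming bash and the coreutils timeout command are available and that the port is 9999 as set above. BROKER_HOST is a hypothetical variable introduced here for illustration. The check only tests TCP reachability, not that JMX itself responds.

```shell
#!/bin/bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# kafka.broker.hostname is the placeholder used above; JMX_PORT defaults to 9999.
if port_open "${BROKER_HOST:-kafka.broker.hostname}" "${JMX_PORT:-9999}"; then
  echo "JMX port is reachable"
else
  echo "JMX port is NOT reachable"
fi
```

If the port is reachable, jconsole can then be pointed at the same host and port.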

Download, install and configure jmxtrans

Download the jmxtrans rpm from the link below and install it using the rpm command

http://code.google.com/p/jmxtrans/downloads/detail?name=jmxtrans-250-0.noarch.rpm&can=2&q=

Once you have installed jmxtrans, make sure that java and jps are available in your $PATH

Write a JSON for fetching MBeans on each Kafka Broker.

I have written JSON for monitoring custom Kafka metrics, please download it from here.

Note that you need to replace “IP_address_of_kafka_broker” in the downloaded JSON with your Kafka broker’s IP address, and do the same for the Ganglia server’s IP address.

Once you are done writing the JSON, verify its syntax using any online JSON validator (http://jsonlint.com/).
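For orientation, a jmxtrans JSON file for this setup might look roughly like the sketch below. It is not the downloadable file mentioned above: the single query, the resultAlias, and the placeholder IP_address_of_ganglia_server are illustrative assumptions. The MBean name follows the naming used by Kafka 0.8.2 and later; older brokers use different names, so check yours in jconsole. The script writes a template and then fills in both placeholders with sed, and the broker IP is an example value.

```shell
#!/bin/sh
# Example values; replace with your own addresses.
BROKER_IP=172.30.0.82      # hypothetical Kafka broker IP
GANGLIA_IP=172.30.0.81     # ganglia-server IP used earlier in this post

# Template with placeholders (one illustrative query only).
cat > kafka-template.json <<'EOF'
{
  "servers": [
    {
      "host": "IP_address_of_kafka_broker",
      "port": "9999",
      "queries": [
        {
          "obj": "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
          "attr": ["Count", "OneMinuteRate"],
          "resultAlias": "kafka.MessagesInPerSec",
          "outputWriters": [
            {
              "@class": "com.googlecode.jmxtrans.model.output.GangliaWriter",
              "settings": {
                "groupName": "custom.metrics",
                "host": "IP_address_of_ganglia_server",
                "port": 8649
              }
            }
          ]
        }
      ]
    }
  ]
}
EOF

# Fill in both placeholders and write the final file.
sed -e "s/IP_address_of_kafka_broker/$BROKER_IP/" \
    -e "s/IP_address_of_ganglia_server/$GANGLIA_IP/" \
    kafka-template.json > kafka-ganglia.json
```

The groupName is set to custom.metrics so that the metrics show up under that group in the Ganglia web UI. More queries can be added to the "queries" array, one per MBean.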

Start the jmxtrans using below command

cd /usr/share/jmxtrans/
sh jmxtrans.sh start name-of-the-json-file.json

Verify that jmxtrans has started successfully using a simple “ps” command

Repeat above procedure on all Kafka brokers

Verify custom metrics

Log in to the Ganglia server, go to the rrd directory (by default /var/lib/ganglia/rrds/), and check whether there are new rrd files for the Kafka metrics.
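This check can also be scripted. The helper below is a rough sketch that assumes the Kafka metric names contain the word "kafka", which depends on the aliases used in your jmxtrans JSON. Ganglia stores its files as rrds/cluster-name/host-name/metric-name.rrd, so searching from the cluster directory covers all brokers.

```shell
#!/bin/sh
# Print every .rrd file under the given directory whose name mentions kafka.
list_kafka_rrds() {
  find "$1" -type f -iname '*kafka*.rrd'
}

# Usage on the ganglia-server:
#   list_kafka_rrds /var/lib/ganglia/rrds/hadoopkafka
```

An empty result usually means jmxtrans is not running, or that it is sending to the wrong Ganglia host or port.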

You should see one .rrd file for each Kafka metric collected by jmxtrans.

Go to the Ganglia web UI and select hadoopkafka from the dropdown.

Then select “custom.metrics” from the next dropdown.

That’s all!
