How to manage and balance “Huge Data Load” for Big Kafka Clusters---reference
1. Add Partition Tool
Partitions act as unit of parallelism. Messages of a single topic are distributed to multiple partitions that can be stored and served on different servers. Upon creation of a topic, the number of partitions for this topic has to be specified. Later on more partitions may be needed for this topic when the volume of this topic increases. This tool helps to add more partitions for a specific topic and also allow manual replica assignment of the added partitions. You can refer to the previous blog Quick steps : Have a Kafka Cluster Up & Running in 3 minutes to setup kafka cluster and create topics.
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
bin/kafka-add-partitions.shOption Description------ -------------partition <Integer: # of partitions> REQUIRED: Number of partitions to addto the topic--replica-assignment-list For manually assigning replicas to<broker_id_for_part1_replica1 : brokers for the new partitionsbroker_id_for_part1_replica2, (default: )broker_id_for_part2_replica1 :broker_id_for_part2_replica2, ...>--topic <topic> REQUIRED: The topic for whichpartitions need to be added.--zookeeper <urls> REQUIRED: The connection string forthe zookeeper connection in the formhost:port. Multiple URLS can begiven to allow fail-over. |
2. Reassign Partitions Tool
What does the tool do?
The goal of this tool is similar to the Referred Replica Leader Election Tool as to achieve load balance across brokers. But instead of only electing a new leader from the assigned replicas of a partition, this tool allows to change the assigned replicas of partitions – remember that followers also need to fetch from leaders in order to keep in sync, hence sometime only balance the leadership load is not enough.
A summary of the steps that the tool does is shown below -
1. The tool updates the zookeeper path "/admin/reassign_partitions" with the list of topic partitions and (if specified in the Json file) the list of their new assigned replicas.
2. The controller listens to the path above. When a data change update is triggered, the controller reads the list of topic partitions and their assigned replicas from zookeeper.
3. For each topic partition, the controller does the following:
3.1. Start new replicas in RAR - AR (RAR = Reassigned Replicas, AR = original list of Assigned Replicas)
3.2. Wait until new replicas are in sync with the leader
3.3. If the leader is not in RAR, elect a new leader from RAR
3.4 4. Stop old replicas AR - RAR
3.5. Write new AR
3.6. Remove partition from the /admin/reassign_partitions path
How to use the tool?
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
|
bin/kafka-reassign-partitions.shbin/kafka-reassign-partitions.shOption Description------ -------------broker-list <brokerlist> The list of brokers to which thepartitions need to be reassigned inthe form "0,1,2". This is requiredfor automatic topic reassignment.--execute [execute] This option does the actualreassignment. By default, the tooldoes a dry run--manual-assignment-json-file <manual The JSON file with the list of manualassignment json file path> reassignmentsThis option or topics-to-move-json-file needs to bespecified. The format to use is -{"partitions":[{"topic": "foo","partition": 1,"replicas": [1,2,3] }],"version":1}--topics-to-move-json-file <topics to The JSON file with the list of topicsreassign json file path> to reassign.This option or manual-assignment-json-file needs to bespecified. The format to use is -{"topics":[{"topic": "foo"},{"topic": "foo1"}],"version":1}--zookeeper <urls> REQUIRED: The connection string forthe zookeeper connection in the formhost:port. Multiple URLS can begiven to allow fail-over. |
3. Add Brokers(Cluster Expansion)
Cluster expansion involves including brokers with new broker ids in a Kafka 08 cluster. Typically, when you add new brokers to a cluster, they will not receive any data from existing topics until this tool is run to assign existing topics/partitions to the new brokers. The tool allows 2 options to make it easier to move some topics in bulk to the new brokers. These 2 options are a) topics to move b) list of newly added brokers. Using these 2 options, the tool automatically figures out the placements of partitions for the topics on the new brokers.
The following example moves 2 topics (foo1, foo2) to newly added brokers in a cluster (5,6,7).
|
1
2
3
4
5
6
7
|
> ./bin/kafka-reassign-partitions.sh --topics-to-move-json-file topics-to-move.json --broker-list "5,6,7" --execute> cat topics-to-move.json{"topics":[{"topic": "foo1"},{"topic": "foo2"}],"version":1} |
Selectively moving some partitions to a broker
The partition movement tool can also be moved to selectively move some replicas for certain partitions over to a particular broker. Typically, if you end up with an unbalanced cluster, you can use the tool in this mode to selectively move partitions around. In this mode, the tool takes a single file which has a list of partitions to move and the replicas that each of those partitions should be assigned to.
The following example moves 1 partition (foo-1) from replicas 1,2,3 to 1,2,4
|
1
2
3
4
5
6
7
8
9
10
|
> ./bin/kafka-reassign-partitions.sh --manual-assignment-json-file partitions-to-move.json --execute> cat partitions-to-move.json{"partitions":[{"topic": "foo","partition": 1,"replicas": [1,2,4] }],}],"version":1} |
Note : These tools are available in version 0.8 , not prior versions.
How to manage and balance “Huge Data Load” for Big Kafka Clusters---reference的更多相关文章
- The package 'MySql.Data' tried to add a framework reference to 'System.Runtime' which was not found in the GAC
最近在学习Visual Studio连接mysql EF模型.在nuget中安装mysql.data时总是提示The package 'MySql.Data' tried to add a frame ...
- Automatically migrating data to new machines kafka集群扩充迁移topic
The partition reassignment tool can be used to move some topics off of the current set of brokers to ...
- kafaka quickstart
http://kafka.apache.org/ http://kafka.apache.org/downloads cd /root/kafuka/kafka_2.12-0.11.0.0 nohup ...
- load data(sql)
一般对于数据库表的插入操作,我们都会写程序执行插入sql,插入的数据少还可以,如果数据多了.执行效率上可能就不太理想了.load data语句用于高速地从一个文本文件中读取数据,装载到一个表中,相比于 ...
- How Network Load Balancing Technology Works--reference
http://technet.microsoft.com/en-us/library/cc756878(v=ws.10).aspx In this section Network Load Balan ...
- Transferring Data Between ASP.NET Web Pages
14 July 2012 20:24 http://www.mikesdotnetting.com/article/192/transferring-data-between-asp-net-web- ...
- Managing Spark data handles in R
When working with big data with R (say, using Spark and sparklyr) we have found it very convenient t ...
- Building the Unstructured Data Warehouse: Architecture, Analysis, and Design
Building the Unstructured Data Warehouse: Architecture, Analysis, and Design earn essential techniqu ...
- Android开发训练之第五章第三节——Transferring Data Without Draining the Battery
Transferring Data Without Draining the Battery GET STARTED DEPENDENCIES AND PREREQUISITES Android 2. ...
随机推荐
- [转]HTTP Error 400. The request hostname is invalid.
一般看到网站提示Bad Request(Invalid Hostname)错误我们都会说是iis,apache出问题了,iis出现这种问题解决办法大概是:IIS> 默认网站> > 属 ...
- 【HDU 1828】 Picture (矩阵周长并,线段树,扫描法)
[题目] Picture Problem Description A number of rectangular posters, photographs and other pictures of ...
- 透过表象看本质!?之二——除了最小p乘,还有PCA
如图1所示,最小p乘法求得是,而真实值到拟合曲线的距离为.那么,对应的是什么样的数据分析呢? 图1 最小p乘法的使用的误差是.真实值到拟合曲线的距离为 假如存在拟合曲线,设直线方程为.真实值到该曲线的 ...
- java学习多线程之线程状态
- C# Winform 实现自定义半透明loading加载遮罩层
在网页中通过div+css实现半透明效果不难,今天我们看看一种在winfrom中实现的方法: 效果图如下,正常时: 显示遮罩层时: 自定义遮罩层控件的源码如下: View Row Code 1 usi ...
- 阮老师讲解TF-IDF算法
TF-IDF与余弦相似性的应用(一):自动提取关键词 作者: 阮一峰 日期: 2013年3月15日 这个标题看上去好像很复杂,其实我要谈的是一个很简单的问题. 有一篇很长的文章,我要用计算机提取它 ...
- 【转】qtp-learn
1.计算器的例子(手动添加,将结果写到日志文件中) SystemUtil.Run "C:\WINDOWS\system32\calc.exe",""," ...
- UART(串口)
(1)串行通信线路三种工作方式:单工通信.半双工通信.全双工通信 单工:单工就是指A只能发信号,而B只能接收信号,通信是单向的. 半双工:半双工就是指A能发信号给B,B也能发信号给A,但这两个过程不能 ...
- Unity 中关于 BuildSetting 中 “Optimize Mesh Data” 选项的“坑”
Unity 在底层默认希望为你做尽可能多的优化,降低使用门槛,比如 BuildSetting 中的 Optimize Mesh Data 选项就是一个典型的例子. 这个选项到底有什么用呢?文档描述为: ...
- poj 1039 Pipe(几何基础)
Pipe Time Limit: 1000MS Memory Limit: 10000K Total Submissions: 9932 Accepted: 3045 Description ...