I. Quick Kafka Environment Setup

Built on Docker containers (for reference):

https://www.cnblogs.com/mindzone/p/15608984.html

The commands, in brief:

# Pull the ZooKeeper and Kafka images
docker pull wurstmeister/zookeeper
docker pull wurstmeister/kafka

# Create the ZooKeeper container
docker run -d --name zookeeper -p 2181:2181 -t wurstmeister/zookeeper

# Create the Kafka container (replace <LINUX_HOST_IP> with the IP of the Linux host)
docker run -d --name kafka \
-p 9092:9092 \
-e KAFKA_BROKER_ID=0 \
-e KAFKA_ZOOKEEPER_CONNECT=<LINUX_HOST_IP>:2181 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://<LINUX_HOST_IP>:9092 \
-e KAFKA_LISTENERS=PLAINTEXT://0.0.0.0:9092 \
-t wurstmeister/kafka

# Check that Kafka is running
docker ps
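
If docker ps shows the kafka container restarting or already exited, the container log is the quickest thing to check (a minimal sketch; the container name matches the run command above):

# Tail the broker log; a healthy start ends with a "started (kafka.server.KafkaServer)" line
docker logs --tail 50 kafka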

Test that messages can be produced to and consumed from a topic (note that both commands block, so open multiple terminal windows):

# Window 1: produce
[root@centos-linux ~]# docker exec -it kafka /bin/bash
bash-4.4# kafka-console-producer.sh --broker-list localhost:9092 --topic <topic-name>

# Window 2: consume
[root@centos-linux ~]# docker exec -it kafka /bin/bash
bash-4.4# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topic-name> --from-beginning

# Example, using a topic named "producer"
bash-4.4# kafka-console-producer.sh --broker-list localhost:9092 --topic producer

bash-4.4# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic producer --from-beginning
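
If the consumer prints nothing, it is worth confirming that the topic actually exists and looking at its partition layout. A quick check from inside the container (a sketch; --bootstrap-server works on the Kafka 2.8 image used here, and the --zookeeper form used later in this post works as well):

# List all topics known to the broker
kafka-topics.sh --bootstrap-server localhost:9092 --list

# Show partitions, leader and replicas for one topic
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic producer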

II. Configuring Maxwell to Produce to Kafka

1. Option 1: start Maxwell with command-line parameters:

cd /usr/local/maxwell-1.29.2
./bin/maxwell \
--user='maxwell' \
--password='123456' \
--host='192.168.2.225' \
--port='3308' \
--producer=kafka \
--kafka.bootstrap.servers=localhost:9092 \
--kafka_topic=producer \
--jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai'

Output when Maxwell starts successfully:

[root@localhost maxwell-1.29.2]# ./bin/maxwell \
> --user='maxwell' \
> --password='123456' \
> --host='192.168.2.225' \
> --port='3308' \
> --producer=kafka \
> --kafka.bootstrap.servers=localhost:9092 \
> --kafka_topic=producer \
> --jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai'
Using kafka version: 1.0.0
14:13:50,533 INFO Maxwell - Starting Maxwell. maxMemory: 247332864 bufferMemoryUsage: 0.25
14:13:50,783 INFO ProducerConfig - ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [localhost:9092]
buffer.memory = 33554432
client.id =
compression.type = snappy
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = null
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.StringSerializer
14:13:50,847 INFO AppInfoParser - Kafka version : 1.0.0
14:13:50,847 INFO AppInfoParser - Kafka commitId : aaa7af6d4a11b29d
14:13:50,871 INFO Maxwell - Maxwell v1.29.2 is booting (MaxwellKafkaProducer), starting at Position[BinlogPosition[mysql-bin.000005:225424], lastHeartbeat=1642486284932]
14:13:51,040 INFO MysqlSavedSchema - Restoring schema id 1 (last modified at Position[BinlogPosition[mysql-bin.000005:16191], lastHeartbeat=0])
14:13:51,205 INFO BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin.000005:225424
14:13:51,235 INFO BinaryLogClient - Connected to 192.168.2.225:3308 at mysql-bin.000005/225424 (sid:6379, cid:215)
14:13:51,235 INFO BinlogConnectorReplicator - Binlog connected.

2. Option 2: put the settings in a config file:

cd /usr/local/maxwell-1.29.2
vim config.properties

Settings:

kafka_topic=maxwell
producer=kafka
kafka.bootstrap.servers=localhost:9092
host=192.168.2.225
user=maxwell
password=123456
port=3308

Start it:

cd /usr/local/maxwell-1.29.2

./bin/maxwell \
--config ./config.properties \
--jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai'

III. Kafka Consumption Test

Once Kafka is the producer target, Maxwell no longer prints row data to its own console; it runs in the background and hands every change event to Kafka.
Whenever a non-query SQL statement (any DML) runs against the database, the Kafka consumer receives a message (a sample DML sketch follows the consumer output below).

Messages in the consumer terminal:

[root@localhost maxwell-1.29.2]# docker exec -it kafka /bin/bash
bash-5.1# kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic producer --from-beginning
[2022-01-18 06:09:16,853] WARN [Consumer clientId=consumer-console-consumer-5789-1, groupId=console-consumer-5789] Error while fetching metadata with correlation id 2 : {producer=LEADER_NOT_AVAILABLE} (org.apache.kafka.cli
[2022-01-18 06:09:16,987] WARN [Consumer clientId=consumer-console-consumer-5789-1, groupId=console-consumer-5789] Error while fetching metadata with correlation id 4 : {producer=LEADER_NOT_AVAILABLE} (org.apache.kafka.cli
hello
aaaaaaaaaaaaaaa
{"database":"test-db","table":"day_sale","type":"delete","ts":1642486851,"xid":71876,"commit":true,"data":{"ID":166,"PRODUCT":"产品C","CHANNEL":"淘宝","AMOUNT":2497.0000,"SALE_DATE":"2022-01-18 13:48:48"}}

IV. Kafka Partition Control

1. Purpose:

We want Kafka to work in parallel: with the default setup, every captured change lands in a single partition's queue, which is too slow.
To let Kafka send (and downstream consumers read) concurrently, create more partitions so messages are spread across them and handled at the same time.

2. Problem:

The tutorial I followed never explains how a database is mapped to a partition; it only shows that different databases can end up in different partitions.

3. Key point:

How do you configure Maxwell's Kafka partitioning?

See the Kafka-related notes in config.properties:

#       *** kafka ***

# list of kafka brokers
#kafka.bootstrap.servers=hosta:9092,hostb:9092

# kafka topic to write to
# this can be static, e.g. 'maxwell', or dynamic, e.g. namespace_%{database}_%{table}
# in the latter case 'database' and 'table' will be replaced with the values for the row being processed
#kafka_topic=maxwell

# alternative kafka topic to write DDL (alter/create/drop) to. Defaults to kafka_topic
#ddl_kafka_topic=maxwell_ddl

# The partitioning-related settings are the following:

#       *** partitioning ***

# What part of the data do we partition by?
# Options: database, table, primary_key, transaction_id, thread_id, column
#producer_partition_by=database # [database, table, primary_key, transaction_id, thread_id, column]

# specify what fields to partition by when using producer_partition_by=column
# (a comma-separated list of column names)
#producer_partition_columns=id,foo,bar

# when using producer_partition_by=column, partition by this when
# the specified column(s) don't exist, i.e. the fallback rule (here: fall back to the database name)
#producer_partition_by_fallback=database

#       *** kinesis ***

#kinesis_stream=maxwell

# AWS places a 256 unicode character limit on the max key length of a record
# http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html
#
# Setting this option to true enables hashing the key with the md5 algorithm
# before we send it to kinesis so all the keys work within the key size limit.
# Values: true, false
# Default: false
#kinesis_md5_keys=true
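
As a concrete example, here is a config.properties sketch that partitions by table, with a column-based variant commented out (the option names come from the notes above; the broker address and topic are the ones used in this post):

producer=kafka
kafka.bootstrap.servers=localhost:9092
kafka_topic=maxwell

# spread rows across partitions by table name instead of database name
producer_partition_by=table

# or: partition by the value of specific columns, falling back to the database name
# for tables that do not have those columns
#producer_partition_by=column
#producer_partition_columns=ID
#producer_partition_by_fallback=database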

4. Partition test case:

- 1. Create a new topic with 6 partitions

# Enter the kafka container
docker exec -it kafka /bin/bash

# Create the topic and assign partitions (the replication-factor parameter must be supplied)
kafka-topics.sh --zookeeper 192.168.177.129:2181 --topic maxwell --create --replication-factor 1 --partitions 6

# Replica count: 1
#   --replication-factor 1
# Partition count: 6
#   --partitions 6
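
To confirm the layout after creation, the same script can describe the topic (the ZooKeeper address matches the create command; on Kafka 2.8, --bootstrap-server localhost:9092 also works):

# Show leader, replicas and ISR for each of the 6 partitions
kafka-topics.sh --zookeeper 192.168.177.129:2181 --describe --topic maxwell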

- 2. Update the Maxwell config (partitioning by column is rarely needed; partitioning by database is enough here)

# Kafka settings
producer=kafka
kafka.bootstrap.servers=localhost:9092

# Change the topic name
kafka_topic=maxwell

# Change the partitioning rule
producer_partition_by=database

- 3. Restart Maxwell

cd /usr/local/maxwell-1.29.2

./bin/maxwell \
--config ./config.properties \
--jdbc_options='useSSL=false&serverTimezone=Asia/Shanghai'

- 4. Write data into the database, then inspect the Kafka messages (using the Kafka Tool GUI)

The detailed steps are omitted here; any DML operation will do. The result can be inspected with the Kafka Tool GUI (Offset Explorer).
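
Without the GUI, the distribution across partitions can also be checked from inside the container. A sketch using the standard Kafka scripts (GetOffsetShell flag names may differ slightly between Kafka versions):

# Print the latest offset of every partition of the maxwell topic;
# partitions with a non-zero offset have received messages
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic maxwell --time -1

# Or read a single partition to see which rows landed in it
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic maxwell --partition 3 --from-beginning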

V. Additional Notes on Kafka Partition Commands

Kafka exposes its functionality through these command-line scripts:

[root@localhost maxwell-1.29.2]# docker exec -it kafka ls /opt/kafka_2.13-2.8.1/bin
connect-distributed.sh kafka-preferred-replica-election.sh
connect-mirror-maker.sh kafka-producer-perf-test.sh
connect-standalone.sh kafka-reassign-partitions.sh
kafka-acls.sh kafka-replica-verification.sh
kafka-broker-api-versions.sh kafka-run-class.sh
kafka-cluster.sh kafka-server-start.sh
kafka-configs.sh kafka-server-stop.sh
kafka-console-consumer.sh kafka-storage.sh
kafka-console-producer.sh kafka-streams-application-reset.sh
kafka-consumer-groups.sh kafka-topics.sh
kafka-consumer-perf-test.sh kafka-verifiable-consumer.sh
kafka-delegation-tokens.sh kafka-verifiable-producer.sh
kafka-delete-records.sh trogdor.sh
kafka-dump-log.sh windows
kafka-features.sh zookeeper-security-migration.sh
kafka-leader-election.sh zookeeper-server-start.sh
kafka-log-dirs.sh zookeeper-server-stop.sh
kafka-metadata-shell.sh zookeeper-shell.sh
kafka-mirror-maker.sh

The command fails with an error:

kafka-topics.sh --zookeeper 192.168.177.129:2181 --topic maxwell --create --replication-factor 2 --partitions 3
[2022-01-18 08:19:44,532] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException:
Replication factor: 4 larger than available brokers: 1.

Analyzing the error:

https://www.cnblogs.com/tyoutetu/p/10855283.html

# In other words a Kafka cluster is required: each Kafka instance is one broker,
# and the replication factor must be less than or equal to the number of brokers.
--replication-factor (must be <= the number of brokers; for a single broker, just use 1)
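
To see how many brokers the cluster actually has, the broker IDs registered in ZooKeeper can be listed (a sketch using zookeeper-shell.sh from the script list above; a single-broker setup prints [0]):

# List registered broker IDs
zookeeper-shell.sh 192.168.177.129:2181 ls /brokers/ids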

Why the partition count could not simply be changed:

# The number of partitions can only be increased, never decreased
bash-5.1# kafka-topics.sh --zookeeper 192.168.177.129:2181 -alter --partitions 3 --topic maxwell
WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected
Error while executing topic command : The number of partitions for a topic can only be increased. Topic maxwell currently has 6 partitions, 3 would not be
[2022-01-18 08:28:42,743] ERROR org.apache.kafka.common.errors.InvalidPartitionsException: The number of partitions for a topic can only be increased. Topi
(kafka.admin.TopicCommand$)

Solution:

Delete the topic, then recreate it.
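
A sketch of both steps, following the addresses and counts used earlier (topic deletion assumes delete.topic.enable=true on the broker, which is the default in recent Kafka versions):

# Delete the existing topic
kafka-topics.sh --zookeeper 192.168.177.129:2181 --delete --topic maxwell

# Recreate it with the desired number of partitions
kafka-topics.sh --zookeeper 192.168.177.129:2181 --create --topic maxwell --replication-factor 1 --partitions 3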
