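1. Background

The previous post demonstrated the split operator. This post covers its inverse, merge, which combines multiple real-time message streams into a single stream.

2. What the demo does

Suppose we have several Kafka topics, each holding songs of one musical genre, such as rock or classical. This example shows how to use Kafka Streams to merge those songs into a single Kafka topic. As before, we use Protocol Buffers to serialize and deserialize the songs. You can think of a song as looking roughly like this (in the Protobuf schema defined in section 5, the artist field is called name):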

{"artist": "Metallica", "title": "Fade to Black"}

{"artist": "Smashing Pumpkins", "title": "Today"}

3. Initialize the project

First, create the project directory:

$ mkdir merge-streams
$ cd merge-streams/

4. Configure the project

Create the Gradle build file build.gradle:

buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:4.0.2'
    }
}

plugins {
    id 'java'
    id "com.google.protobuf" version "0.8.10"
}
apply plugin: 'com.github.johnrengelman.shadow'

repositories {
    mavenCentral()
    jcenter()

    maven {
        url 'http://packages.confluent.io/maven'
    }
}

group 'huxihx.kafkastreams'

sourceCompatibility = 1.8
targetCompatibility = '1.8'
version = '0.0.1'

dependencies {
    implementation 'org.slf4j:slf4j-simple:1.7.26'
    implementation 'org.apache.kafka:kafka-streams:2.3.0'
    implementation 'com.google.protobuf:protobuf-java:3.9.1'

    testCompile group: 'junit', name: 'junit', version: '4.12'
}

protobuf {
    generatedFilesBaseDir = "$projectDir/src/"
    protoc {
        artifact = 'com.google.protobuf:protoc:3.0.0'
    }
}

jar {
    manifest {
        attributes(
            'Class-Path': configurations.compile.collect { it.getName() }.join(' '),
            'Main-Class': 'huxihx.kafkastreams.MergeStreams'
        )
    }
}

shadowJar {
    archiveName = "kstreams-transform-standalone-${version}.${extension}"
}

Save the file above, then run the following command to generate the Gradle wrapper:

$ gradle wrapper

Next, create a folder named configuration under the merge-streams directory to hold the configuration file dev.properties:

application.id=merging-app
bootstrap.servers=localhost:9092

input.rock.topic.name=rock-song-events
input.rock.topic.partitions=1
input.rock.topic.replication.factor=1

input.classical.topic.name=classical-song-events
input.classical.topic.partitions=1
input.classical.topic.replication.factor=1

output.topic.name=all-song-events
output.topic.partitions=1
output.topic.replication.factor=1

Here we configure two input topics, one for rock songs and one for classical songs, plus one output topic that holds the merged song stream.
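Each topic in this file is described by the same three keys under a common prefix (name, partitions, replication.factor). As a rough sketch of how those settings could be read as one unit, the helper below groups them into a small value class. TopicSpec, fromProperties and parse are hypothetical names introduced for illustration; they are not part of the original project, and the sketch uses only the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not in the original project): the settings of one topic.
final class TopicSpec {
    final String name;
    final int partitions;
    final short replicationFactor;

    private TopicSpec(String name, int partitions, short replicationFactor) {
        this.name = name;
        this.partitions = partitions;
        this.replicationFactor = replicationFactor;
    }

    // Reads <prefix>.name, <prefix>.partitions and <prefix>.replication.factor.
    static TopicSpec fromProperties(Properties props, String prefix) {
        return new TopicSpec(
                require(props, prefix + ".name"),
                Integer.parseInt(require(props, prefix + ".partitions")),
                Short.parseShort(require(props, prefix + ".replication.factor")));
    }

    // Fails fast with the offending key instead of a NullPointerException later on.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    // Convenience for trying the helper without a file on disk.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

With dev.properties loaded, TopicSpec.fromProperties(props, "input.rock.topic") would yield the name rock-song-events with one partition and a replication factor of 1.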

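5. Create the message schema

Next, create the schema for the messages in these topics. Run the following command under merge-streams to create the folder that holds the schema: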

$ mkdir -p src/main/proto

Then create a file named song_event.proto in the proto folder with the following content:

syntax = "proto3";

package huxihx.kafkastreams.proto;

message SongEvent {
    string name = 1;
    string title = 2;
}

After saving it, run the gradlew command under merge-streams:

$ ./gradlew build

At this point, you should see the generated Java class, SongEventOuterClass, under merge-streams/src/main/java/huxihx/kafkastreams/proto.
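To get a feel for what a serialized SongEvent looks like on the wire, here is a hand-rolled sketch of the proto3 encoding for two string fields. It is an illustration only: SongEventWireSketch is a hypothetical class that does not use the generated code, and it assumes the fields are written in field-number order, which is what one would normally expect from the generated toByteArray().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only; the real project relies on the generated SongEventOuterClass.SongEvent.
final class SongEventWireSketch {

    // Encodes name (field 1) and title (field 2) as length-delimited proto3 string fields.
    static byte[] encode(String name, String title) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);
        writeString(out, 2, title);
        return out.toByteArray();
    }

    private static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
        if (value.isEmpty()) {
            return; // proto3 omits fields that hold their default value
        }
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | 2); // tag = field number plus wire type 2
        writeVarint(out, utf8.length);            // payload length
        out.write(utf8, 0, utf8.length);          // payload bytes
    }

    // Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
    private static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }
}
```

For name "Metallica" and title "Fade to Black" this produces 26 bytes: tag 0x0A, length 9, the nine name bytes, then tag 0x12, length 13 and the thirteen title bytes.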

6. Create the Serdes

In this step we create the Serdes for the topic messages. First, run the following command to create the directory:

$ mkdir -p src/main/java/huxihx/kafkastreams/serdes

Then create ProtobufSerializer.java in the new serdes folder:

package huxihx.kafkastreams.serdes;

import com.google.protobuf.MessageLite;
import org.apache.kafka.common.serialization.Serializer;

public class ProtobufSerializer<T extends MessageLite> implements Serializer<T> {

    @Override
    public byte[] serialize(String topic, T data) {
        // A null message is written as an empty payload.
        return data == null ? new byte[0] : data.toByteArray();
    }
}

Next is ProtobufDeserializer.java:

package huxihx.kafkastreams.serdes;

import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.MessageLite;
import com.google.protobuf.Parser;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;

import java.util.Map;

public class ProtobufDeserializer<T extends MessageLite> implements Deserializer<T> {

    private Parser<T> parser;

    // The Protobuf parser is handed in through the config map under the key "parser".
    @Override
    @SuppressWarnings("unchecked")
    public void configure(Map<String, ?> configs, boolean isKey) {
        parser = (Parser<T>) configs.get("parser");
    }

    @Override
    public T deserialize(String topic, byte[] data) {
        try {
            return parser.parseFrom(data);
        } catch (InvalidProtocolBufferException e) {
            throw new SerializationException("Failed to deserialize from a protobuf byte array.", e);
        }
    }
}

And finally ProtobufSerdes.java:

package huxihx.kafkastreams.serdes;

import com.google.protobuf.MessageLite;
import com.google.protobuf.Parser;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;

import java.util.HashMap;
import java.util.Map;

public class ProtobufSerdes<T extends MessageLite> implements Serde<T> {

    private final Serializer<T> serializer;
    private final Deserializer<T> deserializer;

    public ProtobufSerdes(Parser<T> parser) {
        serializer = new ProtobufSerializer<>();
        deserializer = new ProtobufDeserializer<>();
        // Pass the parser to the deserializer through its config map.
        Map<String, Parser<T>> config = new HashMap<>();
        config.put("parser", parser);
        deserializer.configure(config, false);
    }

    @Override
    public Serializer<T> serializer() {
        return serializer;
    }

    @Override
    public Deserializer<T> deserializer() {
        return deserializer;
    }
}

7. Implement the main flow

Create MergeStreams.java under src/main/java/huxihx/kafkastreams:

package huxihx.kafkastreams;

import huxihx.kafkastreams.proto.SongEventOuterClass;
import huxihx.kafkastreams.serdes.ProtobufSerdes;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.admin.TopicListing;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

public class MergeStreams {

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            throw new IllegalArgumentException("Config file path must be specified.");
        }

        MergeStreams app = new MergeStreams();
        Properties envProps = app.loadEnvProperties(args[0]);
        Properties streamProps = app.createStreamsProperties(envProps);
        Topology topology = app.buildTopology(envProps);

        app.preCreateTopics(envProps);

        final KafkaStreams streams = new KafkaStreams(topology, streamProps);
        final CountDownLatch latch = new CountDownLatch(1);

        // Close the streams instance cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
            @Override
            public void run() {
                streams.close();
                latch.countDown();
            }
        });

        try {
            streams.start();
            latch.await();
        } catch (Exception e) {
            System.exit(1);
        }
        System.exit(0);
    }

    private Topology buildTopology(Properties envProps) {
        final StreamsBuilder builder = new StreamsBuilder();
        final String rockEvents = envProps.getProperty("input.rock.topic.name");
        final String classicalEvents = envProps.getProperty("input.classical.topic.name");
        final String allEvents = envProps.getProperty("output.topic.name");

        KStream<String, SongEventOuterClass.SongEvent> rockStreams =
                builder.stream(rockEvents, Consumed.with(Serdes.String(), songEventProtobufSerdes()));
        KStream<String, SongEventOuterClass.SongEvent> classicalStreams =
                builder.stream(classicalEvents, Consumed.with(Serdes.String(), songEventProtobufSerdes()));

        // Merge the two input streams and write the result to the output topic.
        KStream<String, SongEventOuterClass.SongEvent> allStreams = rockStreams.merge(classicalStreams);
        allStreams.to(allEvents, Produced.with(Serdes.String(), songEventProtobufSerdes()));
        return builder.build();
    }

    private Properties createStreamsProperties(Properties envProps) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, envProps.getProperty("application.id"));
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, envProps.getProperty("bootstrap.servers"));
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return props;
    }

    // Creates any of the three topics that do not exist yet.
    private void preCreateTopics(Properties envProps) throws Exception {
        Map<String, Object> config = new HashMap<>();
        config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, envProps.getProperty("bootstrap.servers"));
        String inputTopic1 = envProps.getProperty("input.rock.topic.name");
        String inputTopic2 = envProps.getProperty("input.classical.topic.name");
        String outputTopic = envProps.getProperty("output.topic.name");
        try (AdminClient client = AdminClient.create(config)) {
            Collection<TopicListing> existingTopics = client.listTopics().listings().get();

            List<NewTopic> topics = new ArrayList<>();
            List<String> topicNames = existingTopics.stream().map(TopicListing::name).collect(Collectors.toList());
            if (!topicNames.contains(inputTopic1)) {
                topics.add(new NewTopic(
                        envProps.getProperty("input.rock.topic.name"),
                        Integer.parseInt(envProps.getProperty("input.rock.topic.partitions")),
                        Short.parseShort(envProps.getProperty("input.rock.topic.replication.factor"))));
            }

            if (!topicNames.contains(inputTopic2)) {
                topics.add(new NewTopic(
                        envProps.getProperty("input.classical.topic.name"),
                        Integer.parseInt(envProps.getProperty("input.classical.topic.partitions")),
                        Short.parseShort(envProps.getProperty("input.classical.topic.replication.factor"))));
            }

            if (!topicNames.contains(outputTopic)) {
                topics.add(new NewTopic(
                        envProps.getProperty("output.topic.name"),
                        Integer.parseInt(envProps.getProperty("output.topic.partitions")),
                        Short.parseShort(envProps.getProperty("output.topic.replication.factor"))));
            }

            if (!topics.isEmpty()) {
                client.createTopics(topics).all().get();
            }
        }
    }

    private Properties loadEnvProperties(String fileName) throws IOException {
        Properties envProps = new Properties();
        try (FileInputStream input = new FileInputStream(fileName)) {
            envProps.load(input);
        }
        return envProps;
    }

    private static ProtobufSerdes<SongEventOuterClass.SongEvent> songEventProtobufSerdes() {
        return new ProtobufSerdes<>(SongEventOuterClass.SongEvent.parser());
    }
}

The core logic lives in the buildTopology method, where we call KStream's merge method to combine the two input streams into a single output stream.
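One property of merge worth keeping in mind: Kafka Streams makes no ordering guarantee between records of the two input streams, while the relative order within each input stream is preserved. The toy model below illustrates that contract with plain Java lists and no Kafka at all. MergeOrderSketch and its methods are hypothetical names, and the strict alternation it uses is just one arbitrary interleaving, not how Kafka Streams actually schedules records.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: lists stand in for streams, and no Kafka classes are involved.
final class MergeOrderSketch {

    // One possible interleaving: alternate between the inputs until both are drained.
    static <T> List<T> interleave(List<T> left, List<T> right) {
        List<T> merged = new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() || j < right.size()) {
            if (i < left.size()) {
                merged.add(left.get(i++));
            }
            if (j < right.size()) {
                merged.add(right.get(j++));
            }
        }
        return merged;
    }

    // True when all elements of source occur in merged in the same relative order.
    static <T> boolean preservesOrder(List<T> merged, List<T> source) {
        int next = 0;
        for (T item : merged) {
            if (next < source.size() && item.equals(source.get(next))) {
                next++;
            }
        }
        return next == source.size();
    }
}
```

Whatever interleaving the merged stream ends up with, a check like preservesOrder should hold for each input on its own. A consumer of the output topic should therefore not rely on the rock and classical songs arriving in any particular order relative to each other.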

8. Write a test producer and consumer

As in the earlier posts in this series, we write TestProducer and TestConsumer classes. Create TestProducer.java and TestConsumer.java under src/main/java/huxihx/kafkastreams/tests with the following contents:

TestProducer.java

package huxihx.kafkastreams.tests;

import huxihx.kafkastreams.proto.SongEventOuterClass;
import huxihx.kafkastreams.serdes.ProtobufSerializer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class TestProducer {

    // Test input message set 1 (sent to rock-song-events)
    private static final List<SongEventOuterClass.SongEvent> TEST_SONG_EVENTS1 = Arrays.asList(
            SongEventOuterClass.SongEvent.newBuilder().setName("Metallica").setTitle("Fade to Black").build(),
            SongEventOuterClass.SongEvent.newBuilder().setName("Smashing Pumpkins").setTitle("Today").build(),
            SongEventOuterClass.SongEvent.newBuilder().setName("Pink Floyd").setTitle("Another Brick in the Wall").build(),
            SongEventOuterClass.SongEvent.newBuilder().setName("Van Halen").setTitle("Jump").build(),
            SongEventOuterClass.SongEvent.newBuilder().setName("Led Zeppelin").setTitle("Kashmir").build()
    );

    // Test input message set 2 (sent to classical-song-events)
    private static final List<SongEventOuterClass.SongEvent> TEST_SONG_EVENTS2 = Arrays.asList(
            SongEventOuterClass.SongEvent.newBuilder().setName("Wolfgang Amadeus Mozart").setTitle("The Magic Flute").build(),
            SongEventOuterClass.SongEvent.newBuilder().setName("Johann Pachelbel").setTitle("Canon").build(),
            SongEventOuterClass.SongEvent.newBuilder().setName("Ludwig van Beethoven").setTitle("Symphony No. 5").build(),
            SongEventOuterClass.SongEvent.newBuilder().setName("Edward Elgar").setTitle("Pomp and Circumstance").build()
    );

    public static void main(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("Must specify a test set (1 or 2).");
        }
        int choice = Integer.parseInt(args[0]);
        if (choice != 1 && choice != 2) {
            throw new IllegalArgumentException("Must specify a test set (1 or 2).");
        }

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ProtobufSerializer.class);

        // Records are sent without a key; closing the producer flushes pending sends.
        try (final Producer<String, SongEventOuterClass.SongEvent> producer = new KafkaProducer<>(props)) {
            if (choice == 1) {
                TEST_SONG_EVENTS1.stream().map(song ->
                        new ProducerRecord<String, SongEventOuterClass.SongEvent>("rock-song-events", song))
                        .forEach(producer::send);
            } else {
                TEST_SONG_EVENTS2.stream().map(song ->
                        new ProducerRecord<String, SongEventOuterClass.SongEvent>("classical-song-events", song))
                        .forEach(producer::send);
            }
        }
    }
}

TestConsumer.java

package huxihx.kafkastreams.tests;

import com.google.protobuf.Parser;
import huxihx.kafkastreams.proto.SongEventOuterClass;
import huxihx.kafkastreams.serdes.ProtobufDeserializer;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class TestConsumer {

    public static void main(String[] args) {
        if (args.length < 1) {
            throw new IllegalStateException("Must specify an output topic name.");
        }

        // Configure the deserializer with the SongEvent parser.
        Deserializer<SongEventOuterClass.SongEvent> deserializer = new ProtobufDeserializer<>();
        Map<String, Parser<SongEventOuterClass.SongEvent>> config = new HashMap<>();
        config.put("parser", SongEventOuterClass.SongEvent.parser());
        deserializer.configure(config, false);

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (final Consumer<String, SongEventOuterClass.SongEvent> consumer = new KafkaConsumer<>(props, new StringDeserializer(), deserializer)) {
            consumer.subscribe(Arrays.asList(args[0]));
            // Poll until the process is interrupted and print every record received.
            while (true) {
                ConsumerRecords<String, SongEventOuterClass.SongEvent> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, SongEventOuterClass.SongEvent> record : records) {
                    System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                }
            }
        }
    }
}

9. Testing

First, build the project with the following command:

$ ./gradlew shadowJar

Then start the Kafka cluster and run the Kafka Streams application:

$ java -jar build/libs/kstreams-transform-standalone-0.0.1.jar configuration/dev.properties

Now open two terminals and run TestProducer in each, once per test set, to send the test events:

$ java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestProducer 1

$ java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestProducer 2

Finally, start TestConsumer to verify that Kafka Streams has merged the two input topics into a single output stream:

$ java -cp build/libs/kstreams-transform-standalone-0.0.1.jar huxihx.kafkastreams.tests.TestConsumer all-song-events

offset = 0, key = null, value = name: "Metallica"
title: "Fade to Black"

offset = 1, key = null, value = name: "Metallica"
title: "Fade to Black"

offset = 2, key = null, value = name: "Smashing Pumpkins"
title: "Today"

offset = 3, key = null, value = name: "Smashing Pumpkins"
title: "Today"

offset = 4, key = null, value = name: "Pink Floyd"
title: "Another Brick in the Wall"

offset = 5, key = null, value = name: "Pink Floyd"
title: "Another Brick in the Wall"

offset = 6, key = null, value = name: "Van Halen"
title: "Jump"

offset = 7, key = null, value = name: "Van Halen"
title: "Jump"

offset = 8, key = null, value = name: "Led Zeppelin"
title: "Kashmir"

offset = 9, key = null, value = name: "Led Zeppelin"
title: "Kashmir"

offset = 10, key = null, value = name: "Wolfgang Amadeus Mozart"
title: "The Magic Flute"

offset = 11, key = null, value = name: "Johann Pachelbel"
title: "Canon"

offset = 12, key = null, value = name: "Ludwig van Beethoven"
title: "Symphony No. 5"

offset = 13, key = null, value = name: "Edward Elgar"
title: "Pomp and Circumstance"
