Flink - FlinkKafkaProducer010

https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html

Usage:

DataStream<String> stream = ...;

FlinkKafkaProducer010Configuration myProducerConfig = FlinkKafkaProducer010.writeToKafkaWithTimestamps(
        stream,                     // input stream
        "my-topic",                 // target topic
        new SimpleStringSchema(),   // serialization schema
        properties);                // custom configuration for KafkaProducer (including broker list)

// the following is necessary for at-least-once delivery guarantee
myProducerConfig.setLogFailuresOnly(false);   // "false" by default
myProducerConfig.setFlushOnCheckpoint(true);  // "false" by default

Besides enabling Flink’s checkpointing, you should also configure the setter methods setLogFailuresOnly(boolean) and setFlushOnCheckpoint(boolean) appropriately, as shown in the example above:

setLogFailuresOnly(boolean): enabling this will let the producer log failures only instead of catching and rethrowing them. This essentially accounts the record to have succeeded, even if it was never written to the target Kafka topic. This must be disabled for at-least-once.
setFlushOnCheckpoint(boolean): with this enabled, Flink’s checkpoints will wait for any on-the-fly records at the time of the checkpoint to be acknowledged by Kafka before succeeding the checkpoint. This ensures that all records before the checkpoint have been written to Kafka. This must be enabled for at-least-once.

Note: By default, the number of retries is set to “0”. This means that when setLogFailuresOnly is set to false, the producer fails immediately on errors, including leader changes. The value is set to “0” by default to avoid duplicate messages in the target topic that are caused by retries. For most production environments with frequent broker changes, we recommend setting the number of retries to a higher value.

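setLogFailuresOnly: if true, a failed send to Kafka is only logged and execution continues, so data may be lost.

If false, a failed send throws an exception, so the job restarts and no data is lost, at the cost of interrupting execution. In this case it is best to set the producer's retries to a value such as 3, so that a temporary Kafka outage (for example a leader change) does not interrupt the job.

setFlushOnCheckpoint: if true, a checkpoint waits until all pending records have been sent successfully, which ensures no data is lost.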
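
The retries recommendation can be sketched as follows. This is a minimal example, assuming the standard Kafka producer keys "bootstrap.servers" and "retries"; the class name ProducerProps, the helper build, and the broker address are placeholders made up for illustration. The resulting Properties object is what would be passed as the last argument of writeToKafkaWithTimestamps.

```java
import java.util.Properties;

public class ProducerProps {
    // Builds the configuration handed to the Kafka producer.
    // Hypothetical helper; the broker list is a placeholder.
    static Properties build(String brokers, int retries) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);
        // Kafka producer setting: how many times a failed send is retried.
        props.setProperty("retries", Integer.toString(retries));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092", 3);
        System.out.println(props.getProperty("retries"));
    }
}
```

Note that retries can produce duplicate messages in the target topic, which is why the default is 0.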

First of all, FlinkKafkaProducer010 is a sink.

The usual way to attach a sink is stream.addSink(RichSinkFunction):

    public DataStreamSink<T> addSink(SinkFunction<T> sinkFunction) {
        this.transformation.getOutputType();
        if(sinkFunction instanceof InputTypeConfigurable) {
            ((InputTypeConfigurable)sinkFunction).setInputType(this.getType(), this.getExecutionConfig());
        }

        StreamSink sinkOperator = new StreamSink((SinkFunction)this.clean(sinkFunction));
        DataStreamSink sink = new DataStreamSink(this, sinkOperator);
        this.getExecutionEnvironment().addOperator(sink.getTransformation());
        return sink;
    }

Here, FlinkKafkaProducer010.writeToKafkaWithTimestamps wraps this logic itself, which is a bit tricky:

    /**
     * Creates a FlinkKafkaProducer for a given topic. The sink produces a DataStream to
     * the topic.
     *
     * This constructor allows writing timestamps to Kafka, it follow approach (b) (see above)
     *
     *  @param inStream The stream to write to Kafka
     *  @param topicId The name of the target topic
     *  @param serializationSchema A serializable serialization schema for turning user objects into a kafka-consumable byte[] supporting key/value messages
     *  @param producerConfig Configuration properties for the KafkaProducer. 'bootstrap.servers.' is the only required argument.
     *  @param customPartitioner A serializable partitioner for assigning messages to Kafka partitions.
     */
    public static <T> FlinkKafkaProducer010Configuration<T> writeToKafkaWithTimestamps(
            DataStream<T> inStream,
            String topicId,
            KeyedSerializationSchema<T> serializationSchema,
            Properties producerConfig,
            KafkaPartitioner<T> customPartitioner) {
        GenericTypeInfo<Object> objectTypeInfo = new GenericTypeInfo<>(Object.class);
        FlinkKafkaProducer010<T> kafkaProducer = new FlinkKafkaProducer010<>(topicId, serializationSchema, producerConfig, customPartitioner);
        SingleOutputStreamOperator<Object> transformation = inStream.transform("FlinKafkaProducer 0.10.x", objectTypeInfo, kafkaProducer);
        return new FlinkKafkaProducer010Configuration<>(transformation, kafkaProducer);
    }

As you can see, this implements the same logic as addSink and returns a FlinkKafkaProducer010Configuration, which is a subclass of DataStreamSink:

    public static class FlinkKafkaProducer010Configuration<T> extends DataStreamSink<T> {

        private final FlinkKafkaProducerBase wrappedProducerBase;
        private final FlinkKafkaProducer010 producer;

        private FlinkKafkaProducer010Configuration(DataStream stream, FlinkKafkaProducer010<T> producer) {
            //noinspection unchecked
            super(stream, producer);
            this.producer = producer;
            this.wrappedProducerBase = (FlinkKafkaProducerBase) producer.userFunction;
        }

The key point is that FlinkKafkaProducer010 extends StreamSink and overrides processElement:

public class FlinkKafkaProducer010<T> extends StreamSink<T> implements SinkFunction<T>, RichFunction {

    public FlinkKafkaProducer010(String topicId, KeyedSerializationSchema<T> serializationSchema, Properties producerConfig, KafkaPartitioner<T> customPartitioner) {
        // We create a Kafka 09 producer instance here and only "override" (by intercepting) the
        // invoke call.
        super(new FlinkKafkaProducer09<>(topicId, serializationSchema, producerConfig, customPartitioner));
    }

    @Override
    public void processElement(StreamRecord<T> element) throws Exception {
        invokeInternal(element.getValue(), element.getTimestamp());
    }

StreamSink implements processElement like this:

public class StreamSink<IN> extends AbstractUdfStreamOperator<Object, SinkFunction<IN>>
        implements OneInputStreamOperator<IN, Object> {

    @Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        userFunction.invoke(element.getValue());
    }

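As you can see, FlinkKafkaProducer010 bypasses the call to the SinkFunction and calls invokeInternal directly.

So when the producer runs as an operator like this, its SinkFunction implementation is not used and invoke is never called: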

    public void invoke(T value) throws Exception {
        invokeInternal(value, Long.MAX_VALUE);
    }
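
The bypass can be illustrated with a small stand-alone sketch. None of these classes are Flink classes: MiniSinkFunction, MiniStreamSink and TimestampedSink are hypothetical, simplified stand-ins for SinkFunction, StreamSink and FlinkKafkaProducer010. They only show that once the subclass overrides processElement, the wrapped function's invoke is never reached.

```java
public class BypassDemo {
    // Stand-in for Flink's SinkFunction (hypothetical, simplified).
    interface MiniSinkFunction<T> {
        void invoke(T value);
    }

    // Stand-in for StreamSink: forwards each element to the wrapped function.
    static class MiniStreamSink<T> {
        protected final MiniSinkFunction<T> userFunction;

        MiniStreamSink(MiniSinkFunction<T> userFunction) {
            this.userFunction = userFunction;
        }

        void processElement(T value, long timestamp) {
            userFunction.invoke(value);
        }
    }

    // Stand-in for FlinkKafkaProducer010: overrides processElement so the
    // timestamp is available, and never calls userFunction.invoke.
    static class TimestampedSink<T> extends MiniStreamSink<T> {
        final StringBuilder log = new StringBuilder();

        TimestampedSink(MiniSinkFunction<T> userFunction) {
            super(userFunction);
        }

        @Override
        void processElement(T value, long timestamp) {
            log.append(value).append('@').append(timestamp);
        }
    }

    // Sends one element through the overriding sink and reports what happened.
    static String run() {
        boolean[] invoked = {false};
        TimestampedSink<String> sink = new TimestampedSink<>(v -> invoked[0] = true);
        sink.processElement("a", 42L);
        return sink.log + " invoked=" + invoked[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "a@42 invoked=false"
    }
}
```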

invokeInternal

    private void invokeInternal(T next, long elementTimestamp) throws Exception {

        final FlinkKafkaProducerBase<T> internalProducer = (FlinkKafkaProducerBase<T>) userFunction;

        internalProducer.checkErroneous();

        byte[] serializedKey = internalProducer.schema.serializeKey(next);
        byte[] serializedValue = internalProducer.schema.serializeValue(next);
        String targetTopic = internalProducer.schema.getTargetTopic(next);
        if (targetTopic == null) {
            targetTopic = internalProducer.defaultTopicId;
        }

        Long timestamp = null;
        if(this.writeTimestampToKafka) {
            timestamp = elementTimestamp;
        }

        ProducerRecord<byte[], byte[]> record;
        if (internalProducer.partitioner == null) {
            record = new ProducerRecord<>(targetTopic, null, timestamp, serializedKey, serializedValue);
        } else {
            record = new ProducerRecord<>(targetTopic,
                    internalProducer.partitioner.partition(next, serializedKey, serializedValue, internalProducer.partitions.length),
                    timestamp, serializedKey, serializedValue);
        }
        if (internalProducer.flushOnCheckpoint) {
            synchronized (internalProducer.pendingRecordsLock) {
                internalProducer.pendingRecords++;  // with flushOnCheckpoint enabled, count the records currently in flight
            }
        }
        internalProducer.producer.send(record, internalProducer.callback);
    }

The code is easy to follow; it is the normal producer send path, except for

internalProducer.checkErroneous();

internalProducer.callback

internalProducer.callback handles the acks returned by Kafka.

FlinkKafkaProducerBase

    @Override
    public void open(Configuration configuration) {
        if (logFailuresOnly) {
            callback = new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception e) {
                    if (e != null) {
                        LOG.error("Error while sending record to Kafka: " + e.getMessage(), e);
                    }
                    acknowledgeMessage();
                }
            };
        }
        else {
            callback = new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if (exception != null && asyncException == null) {
                        asyncException = exception;
                    }
                    acknowledgeMessage();
                }
            };
        }
    }

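As you can see, when logFailuresOnly is true, the exception is only logged.

When it is false, the first exception is stored in asyncException.

acknowledgeMessage: the record has to be acknowledged whether or not there was an error.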

    private void acknowledgeMessage() {
        if (flushOnCheckpoint) {
            synchronized (pendingRecordsLock) {
                pendingRecords--;
                if (pendingRecords == 0) {
                    pendingRecordsLock.notifyAll();
                }
            }
        }
    }

The logic is to decrement the counter; if pendingRecords == 0, meaning no records are in flight, it notifies all threads waiting on the lock.
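
The counting scheme can be sketched in isolation. The excerpts above show only the increment and the decrement with notifyAll; the waiting side below (awaitAllAcknowledged) is an assumption about how a checkpoint could block on the same lock, not code taken from Flink. The class and method names are hypothetical.

```java
public class PendingRecords {
    private final Object lock = new Object();
    private long pending = 0;

    // Called just before a record is handed to the producer.
    void recordSent() {
        synchronized (lock) {
            pending++;
        }
    }

    // Called from the producer callback, on success or on failure.
    void acknowledge() {
        synchronized (lock) {
            pending--;
            if (pending == 0) {
                lock.notifyAll();
            }
        }
    }

    // Assumed checkpoint side: block until no record is in flight.
    void awaitAllAcknowledged() throws InterruptedException {
        synchronized (lock) {
            while (pending > 0) {
                lock.wait();
            }
        }
    }

    long pending() {
        synchronized (lock) {
            return pending;
        }
    }

    // Sends three records, acknowledges them from another thread,
    // and returns the count left after waiting.
    static long demo() {
        PendingRecords tracker = new PendingRecords();
        for (int i = 0; i < 3; i++) {
            tracker.recordSent();
        }
        Thread acker = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                tracker.acknowledge();
            }
        });
        acker.start();
        try {
            tracker.awaitAllAcknowledged();
            acker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tracker.pending();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 0
    }
}
```

The wait sits in a while loop so that a spurious wakeup, or an acknowledgement that arrives before the wait starts, cannot leave the waiter stuck or let it continue early.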

checkErroneous()

    protected void checkErroneous() throws Exception {
        Exception e = asyncException;
        if (e != null) {
            // prevent double throwing
            asyncException = null;
            throw new Exception("Failed to send data to Kafka: " + e.getMessage(), e);
        }
    }

This simply throws the exception stored in asyncException.
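
Taken together, the callback and checkErroneous form an asynchronous error hand-off: the callback thread records the failure, and the next send on the task thread rethrows it. Below is a minimal sketch of that pattern without Kafka; AsyncErrorHandoff and nextSendFails are hypothetical names, and the single-argument onCompletion is a simplification of Kafka's Callback.

```java
public class AsyncErrorHandoff {
    private final boolean logFailuresOnly;
    // Written by the callback thread, read by the sending thread.
    private volatile Exception asyncException;

    AsyncErrorHandoff(boolean logFailuresOnly) {
        this.logFailuresOnly = logFailuresOnly;
    }

    // Plays the role of the producer callback.
    void onCompletion(Exception e) {
        if (e == null) {
            return;
        }
        if (logFailuresOnly) {
            System.err.println("Error while sending record: " + e.getMessage());
        } else if (asyncException == null) {
            asyncException = e; // keep only the first failure
        }
    }

    // Plays the role of checkErroneous(): rethrow a recorded failure once.
    void checkErroneous() throws Exception {
        Exception e = asyncException;
        if (e != null) {
            asyncException = null; // prevent double throwing
            throw new Exception("Failed to send data: " + e.getMessage(), e);
        }
    }

    // Returns true if a send that follows a failed one would throw.
    static boolean nextSendFails(boolean logFailuresOnly) {
        AsyncErrorHandoff handoff = new AsyncErrorHandoff(logFailuresOnly);
        handoff.onCompletion(new RuntimeException("leader not available"));
        try {
            handoff.checkErroneous();
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nextSendFails(true));  // prints false: failure was only logged
        System.out.println(nextSendFails(false)); // prints true: failure surfaces on the next send
    }
}
```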
