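When building a wide table for payment orders, many tables have to be joined, and the payment itself may arrive long after the order. Flink table joins are a poor fit for this. An alternative is to consume several topics, keyBy the order number, and then handle each record according to the topic it came from. For that to work, every received message should carry a topic field, and JSONKeyValueDeserializationSchema provides exactly that.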

  def getKafkaConsumer(kafkaAddr: String, topicNames: util.ArrayList[String], groupId: String): FlinkKafkaConsumer[ObjectNode] = {
    val properties = getKafkaProperties(groupId, kafkaAddr)
    // true: attach offset/topic/partition metadata to every record
    val consumer = new FlinkKafkaConsumer[ObjectNode](topicNames, new JSONKeyValueDeserializationSchema(true), properties)
    consumer.setStartFromGroupOffsets() // the default behaviour
    consumer
  }

Here, new JSONKeyValueDeserializationSchema(true) means the metadata is included; with false it is left out. The source is as follows:

public class JSONKeyValueDeserializationSchema implements KafkaDeserializationSchema<ObjectNode> {
    private static final long serialVersionUID = 1509391548173891955L;
    private final static Logger log = LoggerFactory.getLogger(JSONKeyValueDeserializationSchema.class);
    private final boolean includeMetadata;
    private ObjectMapper mapper;

    public JSONKeyValueDeserializationSchema(boolean includeMetadata) {
        this.includeMetadata = includeMetadata;
    }

    // readValue throws on malformed JSON, and nothing here catches it
    public ObjectNode deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        if (this.mapper == null) {
            this.mapper = new ObjectMapper();
        }
        ObjectNode node = this.mapper.createObjectNode();
        if (record.key() != null) {
            node.set("key", this.mapper.readValue(record.key(), JsonNode.class));
        }
        if (record.value() != null) {
            node.set("value", this.mapper.readValue(record.value(), JsonNode.class));
        }
        if (this.includeMetadata) {
            node.putObject("metadata")
                .put("offset", record.offset())
                .put("topic", record.topic())
                .put("partition", record.partition());
        }
        return node;
    }

    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    public TypeInformation<ObjectNode> getProducedType() {
        return TypeExtractor.getForClass(ObjectNode.class);
    }
}
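To make the shape of the produced record concrete, here is a minimal sketch of how downstream logic could branch on metadata.topic. It uses plain java.util.Map as a stand-in for ObjectNode so that it runs without Flink or Jackson on the classpath. The topic names order_topic and payment_topic, the order_no field, and the class name are all assumptions made for illustration, not names from the original job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for the ObjectNode produced by the schema:
// {"value": {...}, "metadata": {"offset": ..., "topic": ..., "partition": ...}}
class TopicDispatchSketch {

    // Builds an envelope with the same field names the schema writes.
    static Map<String, Object> envelope(String topic, int partition, long offset,
                                        Map<String, Object> value) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("offset", offset);
        metadata.put("topic", topic);
        metadata.put("partition", partition);

        Map<String, Object> node = new LinkedHashMap<>();
        node.put("value", value);
        node.put("metadata", metadata);
        return node;
    }

    // Chooses the handling branch from the topic the record came from.
    // Topic names and the order_no field are hypothetical.
    @SuppressWarnings("unchecked")
    static String handle(Map<String, Object> node) {
        Map<String, Object> metadata = (Map<String, Object>) node.get("metadata");
        Map<String, Object> value = (Map<String, Object>) node.get("value");
        String topic = (String) metadata.get("topic");
        Object orderNo = value.get("order_no");
        switch (topic) {
            case "order_topic":
                return "order:" + orderNo;
            case "payment_topic":
                return "payment:" + orderNo;
            default:
                return "ignored:" + topic;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("order_no", "A1001");
        System.out.println(handle(envelope("payment_topic", 0, 42L, value)));
    }
}
```

In the real job the same branching would sit inside the keyed function, reading the topic from the ObjectNode's metadata object instead of a Map.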

I thought that was the end of it, but unexpectedly it failed: every single message threw an error during deserialization.

2019-11-29 19:55:15.401 flink [Source: kafkasource (1/1)] ERROR c.y.b.D.JSONKeyValueDeserializationSchema - Unrecognized token 'xxxxx': was expecting ('true', 'false' or 'null')
at [Source: [B@2e119f0e; line: 1, column: 45]
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'xxxxxxx': was expecting ('true', 'false' or 'null')
at [Source: [B@2e119f0e; line: 1, column: 45]
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1586)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:521)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3464)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2628)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:854)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:748)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3847)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3792)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2890)
at com.xx.xx.DeserializationSchema.JSONKeyValueDeserializationSchema.deserialize(JSONKeyValueDeserializationSchema.java:33)
at com.xx.xx.DeserializationSchema.JSONKeyValueDeserializationSchema.deserialize(JSONKeyValueDeserializationSchema.java:15)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:140)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)

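Because the original source has no try/catch, there is no way to see which data caused the error, so the only option is to rewrite the method ourselves.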

Create a new DeserializationSchema package and, inside it, our own JSONKeyValueDeserializationSchema class. Then point getKafkaConsumer at this class instead of Flink's. The log will now show exactly which records cannot be deserialized.

@PublicEvolving
public class JSONKeyValueDeserializationSchema implements KafkaDeserializationSchema<ObjectNode> {
    private static final long serialVersionUID = 1509391548173891955L;
    private final static Logger log = LoggerFactory.getLogger(JSONKeyValueDeserializationSchema.class);
    private final boolean includeMetadata;
    private ObjectMapper mapper;

    public JSONKeyValueDeserializationSchema(boolean includeMetadata) {
        this.includeMetadata = includeMetadata;
    }

    public ObjectNode deserialize(ConsumerRecord<byte[], byte[]> record) {
        if (this.mapper == null) {
            this.mapper = new ObjectMapper();
        }
        ObjectNode node = this.mapper.createObjectNode();
        try {
            if (record.key() != null) {
                node.set("key", this.mapper.readValue(record.key(), JsonNode.class));
            }
            if (record.value() != null) {
                node.set("value", this.mapper.readValue(record.value(), JsonNode.class));
            }
            if (this.includeMetadata) {
                node.putObject("metadata")
                    .put("offset", record.offset())
                    .put("topic", record.topic())
                    .put("partition", record.partition());
            }
        } catch (Exception e) {
            // Decode null-safely: new String(null) would throw inside the handler itself
            String key = record.key() == null ? null : new String(record.key(), java.nio.charset.StandardCharsets.UTF_8);
            String value = record.value() == null ? null : new String(record.value(), java.nio.charset.StandardCharsets.UTF_8);
            log.error("JSONKeyValueDeserializationSchema failed: " + record + " =====key: " + key + " =====value: " + value, e);
        }
        // On failure the node may be empty or only partially filled
        return node;
    }

    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    public TypeInformation<ObjectNode> getProducedType() {
        return TypeExtractor.getForClass(ObjectNode.class);
    }
}

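The log shows that the key is a bare order number. The topic does not carry native canal JSON; the data has been processed upstream, so the key was most likely set by the upstream producer.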

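So we modify our JSONKeyValueDeserializationSchema again. The key is not used downstream, so the block below is simply commented out. If the key were needed, it could be kept as a plain string instead. Note that only changing the target class to String in readValue would still send the bytes through the JSON parser, so the raw bytes would have to be decoded directly.

if (record.key() != null) {
    node.set("key", this.mapper.readValue(record.key(), JsonNode.class));
}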
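If the key does turn out to be needed, one option is to decode the raw bytes as text rather than parse them as JSON. The following is a minimal sketch, assuming the producer writes keys as UTF-8 strings; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class KeyDecodeSketch {

    // Decodes a Kafka key as UTF-8 text; a missing key stays null.
    // Assumption: the upstream producer serializes keys as UTF-8 strings.
    static String decodeUtf8(byte[] bytes) {
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rawKey = "P20191129".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(rawKey));
        System.out.println(decodeUtf8(null));
    }
}
```

Inside the schema, the result could then be stored with node.put("key", decodeUtf8(record.key())), which writes a text field and never touches the JSON parser.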

The try/catch stays in place so that any record that still fails is written to the log. The modified code is as follows:

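import java.nio.charset.StandardCharsets;
import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonNode;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@PublicEvolving
public class JSONKeyValueDeserializationSchema implements KafkaDeserializationSchema<ObjectNode> {
    private static final long serialVersionUID = 1509391548173891955L;
    private final static Logger log = LoggerFactory.getLogger(JSONKeyValueDeserializationSchema.class);
    private final boolean includeMetadata;
    private ObjectMapper mapper;

    public JSONKeyValueDeserializationSchema(boolean includeMetadata) {
        this.includeMetadata = includeMetadata;
    }

    public ObjectNode deserialize(ConsumerRecord<byte[], byte[]> record) {
        if (this.mapper == null) {
            this.mapper = new ObjectMapper();
        }
        ObjectNode node = this.mapper.createObjectNode();
        try {
            // The key is a bare order number, not JSON, so it is no longer parsed:
            // if (record.key() != null) {
            //     node.set("key", this.mapper.readValue(record.key(), JsonNode.class));
            // }
            if (record.value() != null) {
                node.set("value", this.mapper.readValue(record.value(), JsonNode.class));
            }
            if (this.includeMetadata) {
                node.putObject("metadata").put("offset", record.offset()).put("topic", record.topic()).put("partition", record.partition());
            }
        } catch (Exception e) {
            // Decode null-safely: the key or value may be absent
            String key = record.key() == null ? null : new String(record.key(), StandardCharsets.UTF_8);
            String value = record.value() == null ? null : new String(record.value(), StandardCharsets.UTF_8);
            log.error("JSONKeyValueDeserializationSchema failed: " + record + " =====key: " + key + " =====value: " + value, e);
        }
        return node;
    }

    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    public TypeInformation<ObjectNode> getProducedType() {
        return TypeExtractor.getForClass(ObjectNode.class);
    }
}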
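One consequence of catching the exception is that deserialize still returns a node when parsing fails, and that node has no value field. Downstream code may want to drop such records before the keyBy. The sketch below shows the idea with plain java.util.Map standing in for ObjectNode; the class and method names are hypothetical, and in the Flink job the equivalent check would be a filter on node.has("value").

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class EmptyEnvelopeFilterSketch {

    // A record is usable only if the payload was parsed successfully.
    static boolean isUsable(Map<String, Object> node) {
        return node.get("value") != null;
    }

    // Keeps only the usable records, preserving their order.
    static List<Map<String, Object>> dropFailed(List<Map<String, Object>> nodes) {
        return nodes.stream()
                .filter(EmptyEnvelopeFilterSketch::isUsable)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> good = new LinkedHashMap<>();
        good.put("value", "parsed payload");
        Map<String, Object> failed = new LinkedHashMap<>(); // what a failed parse leaves behind

        List<Map<String, Object>> input = new ArrayList<>();
        input.add(good);
        input.add(failed);
        System.out.println(dropFailed(input).size());
    }
}
```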

With that, the problem is solved.
