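I was building a wide table for payment orders. Many tables have to be joined, and a payment can arrive long after the order, so Flink's table join is not a good fit here. The alternative I came up with is to consume several topics, keyBy the order number, and then handle each record in the business logic according to the topic it came from. For that, every received message should carry a topic field, and JSONKeyValueDeserializationSchema solves exactly this problem.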
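To make the idea concrete, here is a minimal plain-Java sketch of the per-order logic that would run after keyBy. It is only an illustration: the topic names `order_topic` and `payment_topic`, the field names, and the class `OrderWideRowSketch` are all assumptions, and a real job would keep the row in Flink keyed state rather than in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java stand-in for the per-key logic that would run after keyBy(orderNo). */
final class OrderWideRowSketch {
    // One wide row per order number; a real job would hold this in Flink keyed state.
    private final Map<String, Map<String, String>> rowsByOrderNo = new HashMap<>();

    /** Merges one message into the wide row of its order, picking fields by source topic. */
    Map<String, String> onMessage(String topic, String orderNo, Map<String, String> value) {
        Map<String, String> row = rowsByOrderNo.computeIfAbsent(orderNo, k -> new HashMap<>());
        row.put("order_no", orderNo);
        switch (topic) {
            case "order_topic":   // hypothetical topic name
                row.put("amount", value.get("amount"));
                break;
            case "payment_topic": // hypothetical topic name
                row.put("pay_time", value.get("pay_time"));
                break;
            default:
                // Unknown topic: leave the row unchanged.
                break;
        }
        return row;
    }
}
```

Because the row is updated whenever a message for that order arrives, it does not matter how late the payment record shows up, which is the point of avoiding the join.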

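import java.util
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.flink.streaming.util.serialization.JSONKeyValueDeserializationSchema

def getKafkaConsumer(kafkaAddr: String, topicNames: util.ArrayList[String], groupId: String): FlinkKafkaConsumer[ObjectNode] = {
  val properties = getKafkaProperties(groupId, kafkaAddr)
  // true: attach offset, topic and partition to every record under "metadata"
  val consumer = new FlinkKafkaConsumer[ObjectNode](topicNames, new JSONKeyValueDeserializationSchema(true), properties)
  consumer.setStartFromGroupOffsets() // the default behaviour
  consumer
}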

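Here, new JSONKeyValueDeserializationSchema(true) means each record carries its metadata (offset, topic, partition); with false the metadata is left out. The source code is as follows: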

public class JSONKeyValueDeserializationSchema implements KafkaDeserializationSchema<ObjectNode> {
    private static final long serialVersionUID = 1509391548173891955L;
    private final boolean includeMetadata;
    private ObjectMapper mapper;

    public JSONKeyValueDeserializationSchema(boolean includeMetadata) {
        this.includeMetadata = includeMetadata;
    }

    // readValue can throw, and nothing here catches it, so the method must declare it.
    public ObjectNode deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        if (this.mapper == null) {
            this.mapper = new ObjectMapper();
        }
        ObjectNode node = this.mapper.createObjectNode();
        if (record.key() != null) {
            node.set("key", this.mapper.readValue(record.key(), JsonNode.class));
        }
        if (record.value() != null) {
            node.set("value", this.mapper.readValue(record.value(), JsonNode.class));
        }
        if (this.includeMetadata) {
            node.putObject("metadata")
                .put("offset", record.offset())
                .put("topic", record.topic())
                .put("partition", record.partition());
        }
        return node;
    }

    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    public TypeInformation<ObjectNode> getProducedType() {
        return TypeExtractor.getForClass(ObjectNode.class);
    }
}

I thought that was the end of it, but unexpectedly it failed: every single message threw an error during deserialization.

2019-11-29 19:55:15.401 flink [Source: kafkasource (1/1)] ERROR c.y.b.D.JSONKeyValueDeserializationSchema - Unrecognized token 'xxxxx': was expecting ('true', 'false' or 'null')
at [Source: [B@2e119f0e; line: 1, column: 45]
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'xxxxxxx': was expecting ('true', 'false' or 'null')
at [Source: [B@2e119f0e; line: 1, column: 45]
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1586)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:521)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3464)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2628)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:854)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:748)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3847)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3792)
at org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2890)
at com.xx.xx.DeserializationSchema.JSONKeyValueDeserializationSchema.deserialize(JSONKeyValueDeserializationSchema.java:33)
at com.xx.xx.DeserializationSchema.JSONKeyValueDeserializationSchema.deserialize(JSONKeyValueDeserializationSchema.java:15)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:140)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)

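The source code has no try/catch, so there is no way to see which record caused the error. The only option was to rewrite this method ourselves.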

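Create a new DeserializationSchema package, add a JSONKeyValueDeserializationSchema class to it, and change getKafkaConsumer to reference our own JSONKeyValueDeserializationSchema instead of Flink's. The log then shows which records cannot be deserialized.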

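import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonNode;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class JSONKeyValueDeserializationSchema implements KafkaDeserializationSchema<ObjectNode> {
    private static final long serialVersionUID = 1509391548173891955L;
    private static final Logger log = LoggerFactory.getLogger(JSONKeyValueDeserializationSchema.class);
    private final boolean includeMetadata;
    private ObjectMapper mapper;

    public JSONKeyValueDeserializationSchema(boolean includeMetadata) {
        this.includeMetadata = includeMetadata;
    }

    public ObjectNode deserialize(ConsumerRecord<byte[], byte[]> record) {
        if (this.mapper == null) {
            this.mapper = new ObjectMapper();
        }
        ObjectNode node = this.mapper.createObjectNode();
        try {
            if (record.key() != null) {
                node.set("key", this.mapper.readValue(record.key(), JsonNode.class));
            }
            if (record.value() != null) {
                node.set("value", this.mapper.readValue(record.value(), JsonNode.class));
            }
            if (this.includeMetadata) {
                node.putObject("metadata")
                    .put("offset", record.offset())
                    .put("topic", record.topic())
                    .put("partition", record.partition());
            }
        } catch (Exception e) {
            log.error(e.getMessage(), e);
            // Decode null-safely: a record without key or value must not cause an NPE here.
            String key = record.key() == null ? null : new String(record.key(), StandardCharsets.UTF_8);
            String value = record.value() == null ? null : new String(record.value(), StandardCharsets.UTF_8);
            log.error("JSONKeyValueDeserializationSchema failed: " + record + " =====key: " + key + " =====value: " + value);
        }
        // After a failure the node may be incomplete (for example, no "value" field).
        return node;
    }

    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    public TypeInformation<ObjectNode> getProducedType() {
        return TypeExtractor.getForClass(ObjectNode.class);
    }
}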

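The log showed that the key is a bare order number, which is not valid JSON, so parsing the key fails. The topic data is not raw canal JSON; it has been processed, so the key must have been set by the upstream producer.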

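So we modify our JSONKeyValueDeserializationSchema again. The key is not needed, so the block that parses it can simply be commented out. Alternatively, the key could be decoded as a plain string instead of being parsed as JSON.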

if (record.key() != null) {
    node.set("key", this.mapper.readValue(record.key(), JsonNode.class));
}
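If the key should be kept rather than dropped, one option is to decode the raw bytes as text. The helper below is a small sketch of that; the class name `KeyDecodingSketch` and the choice of UTF-8 are assumptions, since the post does not say how the upstream producer encodes the key.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: treats the Kafka key as plain text instead of JSON. */
final class KeyDecodingSketch {
    /** Decodes raw key bytes as UTF-8; returns null when the record has no key. */
    static String decodeKey(byte[] rawKey) {
        return rawKey == null ? null : new String(rawKey, StandardCharsets.UTF_8);
    }
}
```

Inside deserialize, the result could then be stored with something like node.put("key", KeyDecodingSketch.decodeKey(record.key())), which writes the key as a text field and never invokes the JSON parser on it.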

We keep the try/catch and still write any record that fails to the log. The modified code is as follows:

public class JSONKeyValueDeserializationSchema implements KafkaDeserializationSchema<ObjectNode> {
    private static final long serialVersionUID = 1509391548173891955L;
    private static final Logger log = LoggerFactory.getLogger(JSONKeyValueDeserializationSchema.class);
    private final boolean includeMetadata;
    private ObjectMapper mapper;

    public JSONKeyValueDeserializationSchema(boolean includeMetadata) {
        this.includeMetadata = includeMetadata;
    }

    public ObjectNode deserialize(ConsumerRecord<byte[], byte[]> record) {
        if (this.mapper == null) {
            this.mapper = new ObjectMapper();
        }
        ObjectNode node = this.mapper.createObjectNode();
        try {
            // The key is a bare order number, not JSON, and is not used downstream.
            // if (record.key() != null) {
            //     node.set("key", this.mapper.readValue(record.key(), JsonNode.class));
            // }
            if (record.value() != null) {
                node.set("value", this.mapper.readValue(record.value(), JsonNode.class));
            }
            if (this.includeMetadata) {
                node.putObject("metadata")
                    .put("offset", record.offset())
                    .put("topic", record.topic())
                    .put("partition", record.partition());
            }
        } catch (Exception e) {
            log.error(e.getMessage(), e);
            // Decode null-safely: a record without key or value must not cause an NPE here.
            String key = record.key() == null ? null : new String(record.key());
            String value = record.value() == null ? null : new String(record.value());
            log.error("JSONKeyValueDeserializationSchema failed: " + record + " =====key: " + key + " =====value: " + value);
        }
        return node;
    }

    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    public TypeInformation<ObjectNode> getProducedType() {
        return TypeExtractor.getForClass(ObjectNode.class);
    }
}

With that, the problem is solved.
