先说下版本情况:

Spark 2.4.3

Scala 2.11.12

Flume-1.6.0

Flume配置文件:

simple-agent.sources = netcat-source
simple-agent.sinks = spark-sink
simple-agent.channels = memory-channel #Describe/configure the source
simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind =centos
simple-agent.sources.netcat-source.port= 44444 # Describe the sink
simple-agent.sinks.spark-sink.type=org.apache.spark.streaming.flume.sink.SparkSink
simple-agent.sinks.spark-sink.hostname= centos
simple-agent.sinks.spark-sink.port= 41414 simple-agent.channels.memory-channel.type = memory simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.spark-sink.channel = memory-channel

启动脚本:

flume-ng agent --name simple-agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/flume_pull.conf -Dflume.root.logger=INFO,console

到以上步骤均没有出现问题。但是将本地测试代码启动,尝试与Flume的sink进行连接时,崩了...

Flume控制台报错:

2019-10-16 16:42:35,364 (New I/O  worker #1) [WARN - org.apache.avro.ipc.Responder.respond(Responder.java:174)] system error
org.apache.avro.AvroRuntimeException: Unknown datum type: java.lang.Exception: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.streaming.flume.sink.EventBatch
at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:593)
at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:558)
at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.ipc.specific.SpecificResponder.writeError(SpecificResponder.java:74)
at org.apache.avro.ipc.Responder.respond(Responder.java:169)
at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:786)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:458)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:439)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:553)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-10-16 16:42:35,380 (New I/O worker #1) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:59)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecuto

本地IDE控制台:

10/16 16:56:38 ERROR Requestor: Error in callback handler: java.lang.IllegalAccessError: tried to access method org.apache.avro.specific.SpecificData.<init>()V from class org.apache.spark.streaming.flume.sink.EventBatch
java.lang.IllegalAccessError: tried to access method org.apache.avro.specific.SpecificData.<init>()V from class org.apache.spark.streaming.flume.sink.EventBatch

解决思路

既然都有这个org.apache.spark.streaming.flume.sink.EventBatch,所幸就看看代码吧

package org.apache.spark.streaming.flume.sink;

import org.apache.avro.specific.SpecificData;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.SchemaStore; @SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class EventBatch extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
private static final long serialVersionUID = -2739787017790252011L;
public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"EventBatch\",\"namespace\":\"org.apache.spark.streaming.flume.sink\",\"fields\":[{\"name\":\"errorMsg\",\"type\":\"string\",\"default\":\"\"},{\"name\":\"sequenceNumber\",\"type\":\"string\"},{\"name\":\"events\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"SparkSinkEvent\",\"fields\":[{\"name\":\"headers\",\"type\":{\"type\":\"map\",\"values\":\"string\"}},{\"name\":\"body\",\"type\":\"bytes\"}]}}}]}");
public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; } private static SpecificData MODEL$ = new SpecificData(); private static final BinaryMessageEncoder<EventBatch> ENCODER =
new BinaryMessageEncoder<EventBatch>(MODEL$, SCHEMA$); private static final BinaryMessageDecoder<EventBatch> DECODER =
new BinaryMessageDecoder<EventBatch>(MODEL$, SCHEMA$);

在IDEA中可以看到 org.apache.avro.message.BinaryMessageEncoder;这行是红色的,没有找到该方法。然后我就搜索了一下,

果然是我用的avro版本过旧。

解决方案

1.在代码的pom.xml中添加以下依赖。

   <!-- https://mvnrepository.com/artifact/org.apache.avro/avro -->
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.8.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.avro/avro-ipc -->
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-ipc</artifactId>
<version>1.8.2</version>
</dependency>

2.将以上两个jar包上传至 $FLUME_HOME/lib下,并删除旧的avro jar包。

欢迎关注我的公号:彪悍大蓝猫,持续分享大数据、Java、安全干货~

SparkStreaming整合Flume的pull报错解决方案的更多相关文章

  1. SparkStreaming整合Flume的pull方式之启动报错解决方案

    Flume配置文件: simple-agent.sources = netcat-source simple-agent.sinks = spark-sink simple-agent.channel ...

  2. Updates were rejected because the remote contains work that you do(git报错解决方案)

    Updates were rejected because the remote contains work that you do(git报错解决方案) 今天向GitHub远程仓库提交本地项目文件时 ...

  3. eclipse中使用pull报错(git提交冲突)

    1.工程->Team->pull:报错 解决方案: 2.工程->Team->Syschronize Workspace: 3.在左侧会将有冲突的代码列举出来:(可选操作:在其上 ...

  4. 【spring boot】【elasticsearch】spring boot整合elasticsearch,启动报错Caused by: java.lang.IllegalStateException: availableProcessors is already set to [8], rejecting [8

    spring boot整合elasticsearch, 启动报错: Caused by: java.lang.IllegalStateException: availableProcessors ], ...

  5. RabbitMQ>Erlang machine stopped instantly (distribution name conflict?). The service is not restarted as OnFail is set to ignore.-报错解决方案 原来是NNND。。。

    >Erlang machine stopped instantly (distribution name conflict?). The service is not restarted as ...

  6. git pull 报错 You have not concluded your merge (MERGE_HEAD exists).

    git pull时报错 解决方案:

  7. JMeter 报告监听器导入.jtl结果文件报错解决方案

    JMeter 报告监听器导入.jtl结果文件报错解决方案   by:授客 QQ:1033553122   1. 问题描述 把jmeter压测时生成的 .jtl结果文件导入监听器报告中,弹出如下错误提示 ...

  8. Python3.x:import urllib2报错解决方案

    Python:import urllib2报错解决方案 python2和3有些不一样: python2:输出为print 'hello world' python3:输出为print('hello w ...

  9. SparkStreaming整合flume

    SparkStreaming整合flume 在实际开发中push会丢数据,因为push是由flume将数据发给程序,程序出错,丢失数据.所以不会使用不做讲解,这里讲解poll,拉去flume的数据,保 ...

随机推荐

  1. 010 深入理解Python语言

    目录 一.概述 二.计算机技术的演进 2.1 计算机技术的演进过程 三.编程语言的多样初心 3.1 编程语言有哪些? 3.2 不同编程语言的初心和适用对象 3.3 2018年以后的计算环境- 四.Py ...

  2. Redis集群的离线安装以及原理理解

    一.本文主要是记录一下Redis集群在linux系统下离线的安装步骤,毕竟在生产环境下一般都是无法联网的,Redis的集群的Ruby环境安装过程还是很麻烦的,涉及到很多的依赖的安装,所以写了一个文章来 ...

  3. JSP静态include和动态include的区别

    静态include是指令元素.include指令的语法格式:<%@ include file="filename" %>.include指令的作用是在JSP页面中静态包 ...

  4. Linux 笔记 - 第十七章 Linux LVM 逻辑卷管理器

    一.前言 在实际生产中,有时会遇到磁盘分区空间不足的情况,这时候就需要对磁盘进行扩容,普通情况下需要新加一块磁盘,重分区.格式化.数据复制.卸载旧分区.挂载新分区等繁琐的步骤,而且有可能造成数据的丢失 ...

  5. DirectX12 3D 游戏开发与实战第四章内容(下)

    Direct3D的初始化(下) 学习目标 了解Direct3D在3D编程中相对于硬件所扮演的角色 理解组件对象模型COM在Direct3D中的作用 掌握基础的图像学概念,例如2D图像的存储方式,页面翻 ...

  6. 04 (OC)* weak的实现原理

    一:Weak 表 1: Runtime 维护了一个 Weak 表,用于存储所有 Weak 指针.Weak 表是一个哈希表,Key 是对象的地址,Value 是一个数组,数组里面放的是 Weak 指针的 ...

  7. Docker常用命令小记

    除了基本的docker pull.docker image.docker ps,还有一些命令及参数也很重要,在此记录下来避免遗忘. 环境信息 以下是本次操作的环境: 操作系统:CentOS Linux ...

  8. HashMap源码分析(史上最详细的源码分析)

    HashMap简介 HashMap是开发中使用频率最高的用于映射(键值对 key value)处理的数据结构,我们经常把hashMap数据结构叫做散列链表: ObjectI entry<Key, ...

  9. gitbook 入门教程之还在搞公众号互推涨粉?gitbook 集成导流工具,轻轻松松躺增粉丝!

    相信大多数博客作者都或多或少有过这样想法: 现在各种平台这么多,想要实现全平台发布就要到处复制粘贴,等我有空一定做统一平台一次性全部解决! 不知道正在阅读文章的你,有没有这样的想法? 反正我确实这么想 ...

  10. Micronaut 微服务中使用 Kafka

    今天,我们将通过Apache Kafkatopic构建一些彼此异步通信的微服务.我们使用Micronaut框架,它为与Kafka集成提供专门的库.让我们简要介绍一下示例系统的架构.我们有四个微型服务: ...