Fixing the errors when integrating Spark Streaming with Flume in pull mode

First, the versions involved:

Spark 2.4.3

Scala 2.11.12

Flume 1.6.0

Flume configuration file:

simple-agent.sources = netcat-source
simple-agent.sinks = spark-sink
simple-agent.channels = memory-channel

# Describe/configure the source
simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = centos
simple-agent.sources.netcat-source.port = 44444

# Describe the sink
simple-agent.sinks.spark-sink.type = org.apache.spark.streaming.flume.sink.SparkSink
simple-agent.sinks.spark-sink.hostname = centos
simple-agent.sinks.spark-sink.port = 41414

simple-agent.channels.memory-channel.type = memory
simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.spark-sink.channel = memory-channel

Startup command:

flume-ng agent --name simple-agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/flume_pull.conf -Dflume.root.logger=INFO,console

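Everything up to this point worked without problems. But as soon as I started my local test code and it tried to connect to the Flume sink, it crashed.

Errors in the Flume console: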

2019-10-16 16:42:35,364 (New I/O  worker #1) [WARN - org.apache.avro.ipc.Responder.respond(Responder.java:174)] system error

org.apache.avro.AvroRuntimeException: Unknown datum type: java.lang.Exception: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.streaming.flume.sink.EventBatch

	at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:593)

	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:558)

	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144)

	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)

	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)

	at org.apache.avro.ipc.specific.SpecificResponder.writeError(SpecificResponder.java:74)

	at org.apache.avro.ipc.Responder.respond(Responder.java:169)

	at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)

	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

	at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)

	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)

	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:786)

	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)

	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:458)

	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:439)

	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)

	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)

	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:553)

	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)

	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)

	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)

	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)

	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)

	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)

	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)

	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)

	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

	at java.lang.Thread.run(Thread.java:748)

2019-10-16 16:42:35,380 (New I/O  worker #1) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.

java.io.IOException: Connection reset by peer

	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)

	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)

	at sun.nio.ch.IOUtil.read(IOUtil.java:192)

	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)

	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:59)

	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)

	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)

	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)

	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

	at java.util.concurrent.ThreadPoolExecuto

Local IDE console:

10/16 16:56:38 ERROR Requestor: Error in callback handler: java.lang.IllegalAccessError: tried to access method org.apache.avro.specific.SpecificData.<init>()V from class org.apache.spark.streaming.flume.sink.EventBatch

java.lang.IllegalAccessError: tried to access method org.apache.avro.specific.SpecificData.<init>()V from class org.apache.spark.streaming.flume.sink.EventBatch

Diagnosis

Both errors mention org.apache.spark.streaming.flume.sink.EventBatch, so let's simply look at its source:

package org.apache.spark.streaming.flume.sink;

import org.apache.avro.specific.SpecificData;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.SchemaStore;

@SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class EventBatch extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  private static final long serialVersionUID = -2739787017790252011L;
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"EventBatch\",\"namespace\":\"org.apache.spark.streaming.flume.sink\",\"fields\":[{\"name\":\"errorMsg\",\"type\":\"string\",\"default\":\"\"},{\"name\":\"sequenceNumber\",\"type\":\"string\"},{\"name\":\"events\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"SparkSinkEvent\",\"fields\":[{\"name\":\"headers\",\"type\":{\"type\":\"map\",\"values\":\"string\"}},{\"name\":\"body\",\"type\":\"bytes\"}]}}}]}");
  public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }

  private static SpecificData MODEL$ = new SpecificData();

  private static final BinaryMessageEncoder<EventBatch> ENCODER =
      new BinaryMessageEncoder<EventBatch>(MODEL$, SCHEMA$);

  private static final BinaryMessageDecoder<EventBatch> DECODER =
      new BinaryMessageDecoder<EventBatch>(MODEL$, SCHEMA$);

  // ... (rest of the generated class omitted)

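In IDEA, the line importing org.apache.avro.message.BinaryMessageEncoder is highlighted in red: the class cannot be resolved. The IllegalAccessError in the IDE console points the same way, since EventBatch calls new SpecificData() and the Avro on the classpath does not make that constructor accessible. After some searching, it turned out that the Avro version I was using was indeed too old for this generated class.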
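One way to confirm this kind of mismatch outside the IDE is to ask the JVM directly whether the classes EventBatch depends on are visible, and which jar they were loaded from. The sketch below is only an illustration: the class name AvroClasspathCheck and its two helper methods are made up for this post, and it relies on nothing beyond the standard Class.forName and CodeSource APIs. It has to be run with the same classpath as the failing program to tell you anything useful.

```java
import java.security.CodeSource;

/** Hypothetical helper (not part of Spark, Flume or Avro): inspects the current classpath. */
public class AvroClasspathCheck {

    /** Returns true if the named class can be located, without running its static initializers. */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, AvroClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the jar or directory the class was loaded from, or a placeholder if unknown. */
    public static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false, AvroClasspathCheck.class.getClassLoader());
            // Classes loaded by the bootstrap loader have no code source.
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found)";
        }
    }

    public static void main(String[] args) {
        // The two Avro classes that EventBatch imports in the excerpt above.
        String[] names = {
            "org.apache.avro.specific.SpecificData",
            "org.apache.avro.message.BinaryMessageEncoder"
        };
        for (String name : names) {
            System.out.println(name + " present=" + isPresent(name) + " from=" + locationOf(name));
        }
    }
}
```

If BinaryMessageEncoder is reported as not present, or SpecificData comes from an unexpected jar, the Avro on the classpath is probably older than the one EventBatch was generated against.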

Solution

1. Add the following dependencies to the project's pom.xml.

<!-- https://mvnrepository.com/artifact/org.apache.avro/avro -->
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.8.2</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.avro/avro-ipc -->
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-ipc</artifactId>
    <version>1.8.2</version>
</dependency>

2. Upload these two jars to $FLUME_HOME/lib and delete the old Avro jars there, then restart the Flume agent so that it loads the new ones.
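After swapping the jars it is worth double-checking that no old Avro jar is left behind in the lib directory, since two versions side by side can bring the same error back. Here is a minimal sketch of such a check. The class AvroJarAudit and its method are hypothetical names for this post, and it assumes the Avro jars follow the usual avro-<something>.jar naming; adjust the filter if yours are named differently.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Flume): lists the Avro jars found in a lib directory. */
public class AvroJarAudit {

    /** Keeps only file names of the form avro-*.jar and returns them sorted. */
    public static List<String> avroJars(List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith("avro-") && name.endsWith(".jar")) {
                result.add(name);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the lib directory (the expanded $FLUME_HOME/lib) is the only argument.
        if (args.length != 1) {
            System.err.println("usage: java AvroJarAudit <lib-directory>");
            return;
        }
        String[] entries = new File(args[0]).list(); // null if the path is not a directory
        if (entries == null) {
            System.err.println("not a readable directory: " + args[0]);
            return;
        }
        for (String name : avroJars(Arrays.asList(entries))) {
            System.out.println(name);
        }
    }
}
```

With the fix applied, the output should list only the 1.8.2 avro and avro-ipc jars; anything else starting with avro- deserves a second look.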
