SparkStreaming整合Flume的pull报错解决方案
先说下版本情况:
Spark 2.4.3
Scala 2.11.12
Flume-1.6.0
Flume配置文件:
simple-agent.sources = netcat-source
simple-agent.sinks = spark-sink
simple-agent.channels = memory-channel
#Describe/configure the source
simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind =centos
simple-agent.sources.netcat-source.port= 44444
# Describe the sink
simple-agent.sinks.spark-sink.type=org.apache.spark.streaming.flume.sink.SparkSink
simple-agent.sinks.spark-sink.hostname= centos
simple-agent.sinks.spark-sink.port= 41414
simple-agent.channels.memory-channel.type = memory
simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.spark-sink.channel = memory-channel
启动脚本:
flume-ng agent --name simple-agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/flume_pull.conf -Dflume.root.logger=INFO,console
到以上步骤均没有出现问题。但是将本地测试代码启动,尝试与Flume的sink进行连接时,崩了...
Flume控制台报错:
2019-10-16 16:42:35,364 (New I/O worker #1) [WARN - org.apache.avro.ipc.Responder.respond(Responder.java:174)] system error
org.apache.avro.AvroRuntimeException: Unknown datum type: java.lang.Exception: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.streaming.flume.sink.EventBatch
at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:593)
at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:558)
at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.ipc.specific.SpecificResponder.writeError(SpecificResponder.java:74)
at org.apache.avro.ipc.Responder.respond(Responder.java:169)
at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:786)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:458)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:439)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:558)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:553)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-10-16 16:42:35,380 (New I/O worker #1) [WARN - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.exceptionCaught(NettyServer.java:201)] Unexpected exception from downstream.
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:59)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecuto
本地IDE控制台:
10/16 16:56:38 ERROR Requestor: Error in callback handler: java.lang.IllegalAccessError: tried to access method org.apache.avro.specific.SpecificData.<init>()V from class org.apache.spark.streaming.flume.sink.EventBatch
java.lang.IllegalAccessError: tried to access method org.apache.avro.specific.SpecificData.<init>()V from class org.apache.spark.streaming.flume.sink.EventBatch
解决思路
既然都有这个org.apache.spark.streaming.flume.sink.EventBatch,所幸就看看代码吧
package org.apache.spark.streaming.flume.sink;
import org.apache.avro.specific.SpecificData;
import org.apache.avro.message.BinaryMessageEncoder;
import org.apache.avro.message.BinaryMessageDecoder;
import org.apache.avro.message.SchemaStore;
@SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class EventBatch extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
private static final long serialVersionUID = -2739787017790252011L;
public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"EventBatch\",\"namespace\":\"org.apache.spark.streaming.flume.sink\",\"fields\":[{\"name\":\"errorMsg\",\"type\":\"string\",\"default\":\"\"},{\"name\":\"sequenceNumber\",\"type\":\"string\"},{\"name\":\"events\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"record\",\"name\":\"SparkSinkEvent\",\"fields\":[{\"name\":\"headers\",\"type\":{\"type\":\"map\",\"values\":\"string\"}},{\"name\":\"body\",\"type\":\"bytes\"}]}}}]}");
public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
private static SpecificData MODEL$ = new SpecificData();
private static final BinaryMessageEncoder<EventBatch> ENCODER =
new BinaryMessageEncoder<EventBatch>(MODEL$, SCHEMA$);
private static final BinaryMessageDecoder<EventBatch> DECODER =
new BinaryMessageDecoder<EventBatch>(MODEL$, SCHEMA$);
在IDEA中可以看到 org.apache.avro.message.BinaryMessageEncoder;
这行是红色的,没有找到该方法。然后我就搜索了一下,
果然是我用的avro版本过旧。
解决方案
1.在代码的pom.xml中添加以下依赖。
<!-- https://mvnrepository.com/artifact/org.apache.avro/avro -->
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.8.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.avro/avro-ipc -->
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-ipc</artifactId>
<version>1.8.2</version>
</dependency>
2.将以上两个jar包上传至 $FLUME_HOME/lib下,并删除旧的avro jar包。
欢迎关注我的公号:彪悍大蓝猫,持续分享大数据、Java、安全干货~
SparkStreaming整合Flume的pull报错解决方案的更多相关文章
- SparkStreaming整合Flume的pull方式之启动报错解决方案
Flume配置文件: simple-agent.sources = netcat-source simple-agent.sinks = spark-sink simple-agent.channel ...
- Updates were rejected because the remote contains work that you do(git报错解决方案)
Updates were rejected because the remote contains work that you do(git报错解决方案) 今天向GitHub远程仓库提交本地项目文件时 ...
- eclipse中使用pull报错(git提交冲突)
1.工程->Team->pull:报错 解决方案: 2.工程->Team->Syschronize Workspace: 3.在左侧会将有冲突的代码列举出来:(可选操作:在其上 ...
- 【spring boot】【elasticsearch】spring boot整合elasticsearch,启动报错Caused by: java.lang.IllegalStateException: availableProcessors is already set to [8], rejecting [8
spring boot整合elasticsearch, 启动报错: Caused by: java.lang.IllegalStateException: availableProcessors ], ...
- RabbitMQ>Erlang machine stopped instantly (distribution name conflict?). The service is not restarted as OnFail is set to ignore.-报错解决方案 原来是NNND。。。
>Erlang machine stopped instantly (distribution name conflict?). The service is not restarted as ...
- git pull 报错 You have not concluded your merge (MERGE_HEAD exists).
git pull时报错 解决方案:
- JMeter 报告监听器导入.jtl结果文件报错解决方案
JMeter 报告监听器导入.jtl结果文件报错解决方案 by:授客 QQ:1033553122 1. 问题描述 把jmeter压测时生成的 .jtl结果文件导入监听器报告中,弹出如下错误提示 ...
- Python3.x:import urllib2报错解决方案
Python:import urllib2报错解决方案 python2和3有些不一样: python2:输出为print 'hello world' python3:输出为print('hello w ...
- SparkStreaming整合flume
SparkStreaming整合flume 在实际开发中push会丢数据,因为push是由flume将数据发给程序,程序出错,丢失数据.所以不会使用不做讲解,这里讲解poll,拉去flume的数据,保 ...
随机推荐
- Java 网络编程:必知必会的 URL 和 URLConnection
java.net.URL 类将 URL 地址进行了封装,并提供了解析 URL 地址的基本方法,比如获取 URL 的主机名和端口号.java.net.URLConnection 则代表了应用程序和 UR ...
- pyppeteer的使用
pyppeteer的使用 安装 属于第三方模块进行安装. pip install pyppeteer 在Linux中,如果权限不够则加上. sudo pip install pyppeteer 使用 ...
- mybatis_plus插件——生成器
最近在学习mybatis框架,虽然已经简化了一些Dao代码,但是还想更上一层楼吗?不再被基本的pojo层,controller层,service层,dao层基本重复代码所困恼吗?这里,让我们来学习一下 ...
- 解决php中文乱码的两种方法
第一种是添加html标签变为如下格式: <html> <head> <meta http-equiv="Content-Type" content=& ...
- Docker竟然还能这么玩?商业级4G代理搭建实战!
时间过得真快,距离这个系列的上一篇文章<商业级4G代理搭建指南[准备篇]>发布的时间已经过了两个星期了,上个星期由于各种琐事缠身,周二开始就没空写文章了,所以就咕咕咕了. 那么在准备篇中, ...
- 服务器替换san存储
1.通知DBA停库: 串行登陆服务器 2.备份系统信息 mkdir -p /bakinfo df -h > /bakinfo/df.txt_`date +%Y%m%d%H%M%S` ps -ef ...
- 公用的update
包结构: ===================================== jdbc.properties路径:/jdbc-1/src/jdbc.properties 内容: #连接MySQ ...
- SpringBoot启动zipkin-server报错Error creating bean with name ‘armeriaServer’
目前,GitHub 上最新 release 版本是 Zipkin 2.12.9,从 2.12.6 版本开始有个较大的更新,迁移使用 Armeria HTTP 引擎. 从此版本开始,若直接添加依赖的 S ...
- RDD基本操作之Action
Action介绍 在RDD上计算出来一个结果 把结果返回给driver program或保存在文件系统,count(),save 常用的Action reduce() 接收一个函数,作用在RDD两个类 ...
- 使用Git工具批量拉取代码
公司项目比较多,每天上班第一件事就是拉取代码,cd A 目录 git pull cd .. cd B ...... 一个项目一个项目的拉取,感觉也是很费劲的,那么有没有什么一键操作呢 现在执行一个命令 ...