[Storm] java.io.FileNotFoundException: File '../stormconf.ser' does not exist
This bug will kill supervisors
Affects Version/s: 0.9.2-incubating, 0.9.3, 0.9.4
Fix Version/s: 0.10.0, 0.9.5
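Background
Recently I noticed that on a newly built Storm cluster, more than half of the Supervisors had quietly died not long after startup. The logs of the dead Supervisors showed the exception java.io.FileNotFoundException: File '../stormconf.ser' does not exist. Most answers online say:
Clear the files under the { storm.local.dir } directory and restart.
But that treats the symptom rather than the cause: a restart gets things running again, yet it does not explain why the problem occurs.
I later found that STORM-130 addresses this problem. The scenario for reproducing it is: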
1) Run a Storm cluster with at least 2 supervisors with 4 slots each
2) Deploy a topology that uses 4 workers; the topology will be distributed so that each supervisor has two workers
3) Kill one of the supervisors, let's say supervisor1
4) Wait until the topology rebalances to occupy 4 workers on supervisor2
5) Now bring up supervisor1; it goes through the cycle of cleaning up old topology code
6) Nimbus rebalances the topology, which triggers the supervisor's sync-processes method
7) sync-processes tries to launch a worker for the topology whose code was deleted when the supervisor started, causing it to throw the FileNotFoundException above
Cause
The sync-processes method mentioned in the scenario above is a function run by the supervisor. The Supervisor runs these two functions in the background:
synchronize-supervisor: This is called whenever assignments in Zookeeper change and also every 10 seconds.
- Downloads code from Nimbus for topologies assigned to this machine for which it doesn't have the code yet.
- Writes into local filesystem what this node is supposed to be running. It writes a map from port -> LocalAssignment. LocalAssignment contains a topology id as well as the list of task ids for that worker.
sync-processes: Reads from the LFS what synchronize-supervisor wrote and compares that to what's actually running on the machine. It then starts/stops worker processes as necessary to synchronize.
As the descriptions show, synchronize-supervisor and sync-processes coordinate through the local filesystem (LFS). The key reason for the bug is that the synchronize-supervisor thread, which is responsible for downloading and removing files, and the sync-processes thread, which is responsible for starting worker processes, run asynchronously.
synchronize-supervisor reads the assignment information from ZooKeeper, downloads the necessary files from Nimbus, and writes the local state. In another thread, sync-processes reads that local state to launch worker processes. If synchronize-supervisor is called again before a worker process has started, and the topology's assignment has changed in the meantime (caused by a rebalance, a worker timeout, etc.) so that the worker assigned to this supervisor has moved to another supervisor, synchronize-supervisor removes the now-unnecessary files (the jar file, the ser file, etc.). When sync-processes then launches the worker, the ser file no longer exists, and this issue occurs.
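The interleaving described above can be replayed in a small, single-threaded simulation. This is only an illustrative sketch, not Storm's actual code: the Python function names, the dict standing in for the local state, the port 6700, and the topology id "topo-1" are all assumptions made for the example, and the "download" is just writing a placeholder file.

```python
import os
import shutil
import tempfile


def synchronize_supervisor(local_dir, local_state, assignment):
    """Toy stand-in: fetch code for assigned topologies, record the
    assignment in the local state, then remove code that is no longer needed."""
    for topo in assignment.values():
        topo_dir = os.path.join(local_dir, topo)
        if not os.path.isdir(topo_dir):
            os.makedirs(topo_dir)
            # Placeholder for downloading stormconf.ser (and the jar) from Nimbus.
            with open(os.path.join(topo_dir, "stormconf.ser"), "w") as f:
                f.write("serialized topology conf")
    # Publish the port -> topology id map for sync-processes to read.
    local_state.clear()
    local_state.update(assignment)
    # Delete code of topologies that are no longer assigned to this node.
    for topo in os.listdir(local_dir):
        if topo not in assignment.values():
            shutil.rmtree(os.path.join(local_dir, topo))


def read_local_state(local_state):
    """First half of the toy sync-processes: snapshot what should be running."""
    return dict(local_state)


def launch_worker(local_dir, topo):
    """Second half of the toy sync-processes: the worker needs stormconf.ser."""
    with open(os.path.join(local_dir, topo, "stormconf.ser")) as f:
        return f.read()


def demo_race():
    """Replay the problematic ordering; return the name of the missing file."""
    local_dir = tempfile.mkdtemp()
    local_state = {}
    try:
        # 1. An assignment arrives: port 6700 should run "topo-1".
        synchronize_supervisor(local_dir, local_state, {6700: "topo-1"})
        # 2. sync-processes reads the local state but has not launched yet.
        to_launch = read_local_state(local_state)
        # 3. A rebalance moves the worker away, so the code is removed.
        synchronize_supervisor(local_dir, local_state, {})
        # 4. sync-processes launches from its now-stale snapshot.
        try:
            launch_worker(local_dir, to_launch[6700])
        except FileNotFoundError as err:
            return os.path.basename(err.filename)
        return None
    finally:
        shutil.rmtree(local_dir, ignore_errors=True)
```

Calling demo_race() returns "stormconf.ser", the file that step 4 can no longer find. The steps are run in a fixed order rather than in real threads so that the failure is reproducible; in a real supervisor the same ordering only happens when the timing lines up, which is why the bug appears intermittently.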
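Possible solutions
- Switch to a different Storm version (the fix versions listed above are 0.10.0 and 0.9.5)
- Tune parameters:
- Change the synchronize-supervisor thread loop time to longer than the default 10 sec, such as 30 sec.
- supervisor.worker.timeout.secs: 30 -> 5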
References:
- https://issues.apache.org/jira/browse/STORM-130
- http://storm.apache.org/documentation/Lifecycle-of-a-topology.html