Have you run into this? Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException)

I hit this exception while writing files to HDFS from a Structured Streaming ForeachWriter.

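The exception occurs because HDFS, as a distributed file system, supports concurrent reads of a file but does not allow more than one writer on the same file at a time. To enforce this, HDFS uses a time-limited lock, the lease mechanism: the one client currently allowed to write a file is its **lease holder**.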

I learned this from the following article: http://blog.csdn.net/weixin_44252761/article/details/89517393
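To make the single-writer rule concrete, here is a toy in-memory model of a lease table. It is only a sketch under simplifying assumptions, not how HDFS implements leases: the names LeaseTable, acquire, and release are hypothetical, time is passed in explicitly, and the same client is allowed to renew its own lease.

```scala
import scala.collection.mutable

// Toy model of single-writer leases. NOT the HDFS implementation:
// LeaseTable, acquire, and release are hypothetical names for illustration.
final class LeaseTable(leaseMillis: Long) {
  // path -> (lease holder, expiry timestamp in milliseconds)
  private val leases = mutable.Map.empty[String, (String, Long)]

  /** Returns true if `client` now holds the write lease on `path`. */
  def acquire(path: String, client: String, nowMillis: Long): Boolean = synchronized {
    leases.get(path) match {
      case Some((holder, expiry)) if holder != client && nowMillis < expiry =>
        false // another writer holds an unexpired lease: the write is rejected
      case _ =>
        // No lease, an expired lease, or a renewal by the current holder
        leases(path) = (client, nowMillis + leaseMillis)
        true
    }
  }

  /** Releases the lease if `client` is the current holder. */
  def release(path: String, client: String): Unit = synchronized {
    if (leases.get(path).exists(_._1 == client)) leases.remove(path)
  }
}
```

In this model, two tasks that ask for the same path at the same time cannot both succeed, while two tasks that ask for different paths never conflict. That is the property the fix in this post relies on.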

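In big data computation, multithreading and distributed parallelism speed up processing considerably. In big data storage, however, distributed file systems are inherently weak at handling concurrent write requests. The two pull against each other; for now the conflict cannot be eliminated, only eased.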

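How can it be eased? I have to admire the Spark developers here, because their answer is simple and practical. Several writers cannot write to one file at the same time, but they can each write to a different file. As long as the reader (Spark or my own program) treats those files as one logical file, there is no difference between writing one file and writing many.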

Following this idea, I modified my code. The real code is too long to show in full, but the key change is in one place:

Change val hdfsWritePath = new Path(path) to val hdfsWritePath = new Path(path + "/" + partitionId).
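The effect of that one-line change can be sketched with plain strings. The helper below is illustrative only: PartitionPaths and partitionFile are hypothetical names, and the base directory /data/success is an assumed example value, not one taken from the original code.

```scala
// Hypothetical helper: maps each partition to its own file under a shared directory.
object PartitionPaths {
  /** Joins the base directory and the partition id, tolerating a trailing slash. */
  def partitionFile(baseDir: String, partitionId: Long): String =
    s"${baseDir.stripSuffix("/")}/$partitionId"

  def main(args: Array[String]): Unit = {
    // Four partitions of the same batch get four distinct files,
    // so no two tasks ever open the same HDFS file for writing.
    (0L until 4L).map(partitionFile("/data/success", _)).foreach(println)
  }
}
```

With Hadoop on the classpath, new Path(parent, child) builds the same kind of child path without manual string concatenation.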

For anyone interested, here is a fuller version of the code. The original was:

    // inputStream, hdfsAddr and successPath are defined elsewhere in the full program.
    inputStream match {
        case Some(is) =>
            is.writeStream
                .foreach(new ForeachWriter[Row]() {
                    var successBufferedWriter: Option[BufferedWriter] = None

                    // Opens the target file, appending if it already exists.
                    def openHdfs(path: String, partitionId: Long, version: Long): Option[BufferedWriter] = {
                        val configuration: Configuration = new Configuration()
                        configuration.set("fs.defaultFS", hdfsAddr)
                        val fileSystem: FileSystem = FileSystem.get(configuration)
                        // Problem: every partition resolves to the same file and writes to it concurrently.
                        val hdfsWritePath = new Path(path)
                        val fsDataOutputStream: FSDataOutputStream =
                            if (fileSystem.exists(hdfsWritePath))
                                fileSystem.append(hdfsWritePath)
                            else
                                fileSystem.create(hdfsWritePath)
                        Some(new BufferedWriter(new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8)))
                    }

                    override def open(partitionId: Long, version: Long): Boolean = {
                        successBufferedWriter =
                            if (successBufferedWriter.isEmpty) openHdfs(successPath, partitionId, version)
                            else successBufferedWriter
                        true
                    }

                    // Writes one row as a comma-separated line.
                    override def process(value: Row): Unit = {
                        successBufferedWriter.get.write(value.mkString(","))
                        successBufferedWriter.get.newLine()
                    }

                    override def close(errorOrNull: Throwable): Unit = {
                        successBufferedWriter.get.flush()
                        successBufferedWriter.get.close()
                    }
                })
                .start()
                .awaitTermination()
        case None => () // branch added only to close the excerpt; the original listing stops above
    }

This code looks fine at first glance, but it triggers the exception in the title. The corrected version is:

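    import java.io.{BufferedWriter, OutputStreamWriter}
    import java.nio.charset.StandardCharsets
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, Path}
    import org.apache.spark.sql.{ForeachWriter, Row}

    // inputStream, hdfsAddr and successPath are defined elsewhere in the full program.
    inputStream match {
        case Some(is) =>
            is.writeStream
                .foreach(new ForeachWriter[Row]() {
                    var successBufferedWriter: Option[BufferedWriter] = None

                    // Opens this partition's own file, appending if it already exists.
                    def openHdfs(path: String, partitionId: Long, version: Long): Option[BufferedWriter] = {
                        val configuration: Configuration = new Configuration()
                        configuration.set("fs.defaultFS", hdfsAddr)
                        val fileSystem: FileSystem = FileSystem.get(configuration)
                        // Fix: one file per partition under the shared directory, so writers never collide.
                        val hdfsWritePath = new Path(path + "/" + partitionId)
                        val fsDataOutputStream: FSDataOutputStream =
                            if (fileSystem.exists(hdfsWritePath))
                                fileSystem.append(hdfsWritePath)
                            else
                                fileSystem.create(hdfsWritePath)
                        Some(new BufferedWriter(new OutputStreamWriter(fsDataOutputStream, StandardCharsets.UTF_8)))
                    }

                    override def open(partitionId: Long, version: Long): Boolean = {
                        successBufferedWriter =
                            if (successBufferedWriter.isEmpty) openHdfs(successPath, partitionId, version)
                            else successBufferedWriter
                        true
                    }

                    // Writes one row as a comma-separated line.
                    override def process(value: Row): Unit = {
                        successBufferedWriter.get.write(value.mkString(","))
                        successBufferedWriter.get.newLine()
                    }

                    override def close(errorOrNull: Throwable): Unit = {
                        successBufferedWriter.get.flush()
                        successBufferedWriter.get.close()
                    }
                })
                .start()
                .awaitTermination()
        case None => () // branch added only to close the excerpt; the original listing stops above
    }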

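And just like that (it actually cost me a whole day), a problem that many of you will probably run into was solved. When reading the data back, point the reader at successPath itself rather than at an individual file. I am sharing the fix here.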
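Reading the directory as one logical file can be illustrated without a cluster. The sketch below uses the local file system as a stand-in for HDFS and assumes Scala 2.13 (for scala.jdk.CollectionConverters); ReadDirectory and readAllLines are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helper: treats every file in a directory as part of one dataset.
object ReadDirectory {
  /** Reads every regular file directly under `dir`, in file-name order, and concatenates their lines. */
  def readAllLines(dir: Path): Seq[String] = {
    val stream = Files.list(dir)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .toSeq
        .sortBy(_.getFileName.toString)
        .flatMap(p => Files.readAllLines(p, StandardCharsets.UTF_8).asScala)
    } finally stream.close() // Files.list holds a directory handle until closed
  }
}
```

In a Spark job the same idea needs no helper: passing a directory to a reader such as spark.read.csv(successPath) loads the files inside it as a single DataFrame.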

If you find any problems or shortcomings, feel free to contact me so that we can improve together.

That's all.
