MongoSpark error 28799

Exception in thread "main" com.mongodb.MongoCommandException: Command failed with error : 'Received error in response from 192.168.12.161:27018: { $err: "$sample stage could not find a non-duplicate document after 100 while using a random cursor. This is likely a sporadic failure, please try again.", code: 28799 }' on server 192.168.12.161:. The full response is { "ok" : 0.0, "errmsg" : "Received error in response from 192.168.12.161:27018: { $err: \"$sample stage could not find a non-duplicate document after 100 while using a random cursor. This is likely a sporadic failure, please try again.\", code: 28799 }", "code" : , "codeName" : "Location28799" }

    at com.mongodb.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:)
    at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:)
    at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:)
    at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:)
    at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:)
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:)
    at com.mongodb.operation.AggregateOperation$.call(AggregateOperation.java:)
    at com.mongodb.operation.AggregateOperation$.call(AggregateOperation.java:)
    at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:)
    at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:)
    at com.mongodb.operation.AggregateOperation.execute(AggregateOperation.java:)
    at com.mongodb.operation.AggregateOperation.execute(AggregateOperation.java:)
    at com.mongodb.Mongo.execute(Mongo.java:)
    at com.mongodb.Mongo$.execute(Mongo.java:)
    at com.mongodb.OperationIterable.iterator(OperationIterable.java:)
    at com.mongodb.OperationIterable.forEach(OperationIterable.java:)
    at com.mongodb.OperationIterable.into(OperationIterable.java:)
    at com.mongodb.AggregateIterableImpl.into(AggregateIterableImpl.java:)
    at com.mongodb.spark.rdd.partitioner.MongoSamplePartitioner$$anonfun$.apply(MongoSamplePartitioner.scala:)
    at com.mongodb.spark.rdd.partitioner.MongoSamplePartitioner$$anonfun$.apply(MongoSamplePartitioner.scala:)
    at com.mongodb.spark.MongoConnector$$anonfun$withCollectionDo$.apply(MongoConnector.scala:)
    at com.mongodb.spark.MongoConnector$$anonfun$withCollectionDo$.apply(MongoConnector.scala:)
    at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$.apply(MongoConnector.scala:)
    at com.mongodb.spark.MongoConnector$$anonfun$withDatabaseDo$.apply(MongoConnector.scala:)
    at com.mongodb.spark.MongoConnector.withMongoClientDo(MongoConnector.scala:)
    at com.mongodb.spark.MongoConnector.withDatabaseDo(MongoConnector.scala:)
    at com.mongodb.spark.MongoConnector.withCollectionDo(MongoConnector.scala:)
    at com.mongodb.spark.rdd.partitioner.MongoSamplePartitioner.partitions(MongoSamplePartitioner.scala:)
    at com.mongodb.spark.rdd.partitioner.DefaultMongoPartitioner.partitions(DefaultMongoPartitioner.scala:)
    at com.mongodb.spark.rdd.MongoRDD.getPartitions(MongoRDD.scala:)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$.apply(RDD.scala:)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$.apply(RDD.scala:)
    at scala.Option.getOrElse(Option.scala:)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:)
    at org.apache.spark.rdd.RDD.count(RDD.scala:)
    at org.jh.TestSpark$.doTest(DocHandler.scala:)
    at org.jh.TestSpark$.main(DocHandler.scala:)
    at org.jh.TestSpark.main(DocHandler.scala)

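The error is shown above; the fix follows. Judging from the connector source code (which I did not fully understand), the problem comes from this part of MongoSamplePartitioner: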

if (numDocumentsPerPartition >= count) {
  MongoSinglePartitioner.partitions(connector, readConfig, pipeline)
} else {
  val samples = connector.withCollectionDo(readConfig, {
    coll: MongoCollection[BsonDocument] =>
      coll.aggregate(List(
        Aggregates.`match`(matchQuery),
        Aggregates.sample(numberOfSamples),
        Aggregates.project(Projections.include(partitionKey)),
        Aggregates.sort(Sorts.ascending(partitionKey))
      ).asJava).allowDiskUse(true).into(new util.ArrayList[BsonDocument]()).asScala
  })
  def collectSplit(i: Int): Boolean = (i % samplesPerPartition == 0) || !matchQuery.isEmpty && i == count - 1
  val rightHandBoundaries = samples.zipWithIndex.collect {
    case (field, i) if collectSplit(i) => field.get(partitionKey)
  }
  PartitionerHelper.createPartitions(partitionKey, rightHandBoundaries, PartitionerHelper.locations(connector))
}

The error occurs when numDocumentsPerPartition < count, because then the else branch runs. That branch starts with a $sample stage whose size is derived as follows:

val numDocumentsPerPartition: Int = math.floor(partitionSizeInBytes.toFloat / avgObjSizeInBytes).toInt
val numberOfSamples = math.floor(samplesPerPartition * count / numDocumentsPerPartition.toFloat).toInt

To avoid the error, numberOfSamples has to be reduced, which means lowering samplesPerPartition and raising numDocumentsPerPartition. samplesPerPartition is lowered through spark.mongodb.input.partitionerOptions.samplesPerPartition, and numDocumentsPerPartition is raised by increasing spark.mongodb.input.partitionerOptions.partitionSizeMB. A larger partitionSizeMB also makes it more likely that numDocumentsPerPartition >= count, in which case the else block is skipped entirely.
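As a rough illustration of how the two settings change the sample size, the two formulas above can be evaluated for a made-up collection. The collection statistics (10 million documents, 1 KB average object size) are hypothetical, and the names SampleSizeEstimate, docsPerPartition and numberOfSamples are mine. The conversion of partitionSizeMB to bytes and the "64 MB / 10 samples" starting point are my assumptions about the connector, not something shown in the excerpt.

```scala
object SampleSizeEstimate {
  // Assumption: the connector converts partitionSizeMB to bytes as MB * 1024 * 1024.
  def docsPerPartition(partitionSizeMB: Int, avgObjSizeInBytes: Long): Int = {
    val partitionSizeInBytes = partitionSizeMB * 1024 * 1024
    // Same formula as numDocumentsPerPartition in the excerpt above.
    math.floor(partitionSizeInBytes.toFloat / avgObjSizeInBytes).toInt
  }

  // Same formula as numberOfSamples in the excerpt above.
  def numberOfSamples(samplesPerPartition: Int, count: Long, numDocumentsPerPartition: Int): Int =
    math.floor(samplesPerPartition * count / numDocumentsPerPartition.toFloat).toInt

  def main(args: Array[String]): Unit = {
    // Hypothetical collection statistics, for illustration only.
    val count = 10000000L
    val avgObjSizeInBytes = 1024L

    // Assumed starting point: partitionSizeMB = 64, samplesPerPartition = 10.
    val before = numberOfSamples(10, count, docsPerPartition(64, avgObjSizeInBytes))
    // Settings used in the fix: partitionSizeMB = 128, samplesPerPartition = 1.
    val after = numberOfSamples(1, count, docsPerPartition(128, avgObjSizeInBytes))

    println(s"numberOfSamples before: $before, after: $after")
  }
}
```

For these made-up numbers the sample size drops from 1525 to 76, so the $sample stage has far fewer documents to draw.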

So the solution is as follows:

SparkSession.builder()
//			.master("local")
			.master(sparkURI)
			.config(new SparkConf().setJars(Array(s"${hdfsURI}/mongolib/mongo-spark-connector_2.11-2.2.1.jar",
					s"${hdfsURI}/mongolib/bson-3.4.2.jar",
					s"${hdfsURI}/mongolib/mongo-java-driver-3.4.2.jar",
					s"${hdfsURI}/mongolib/mongodb-driver-3.4.2.jar",
					s"${hdfsURI}/mongolib/mongodb-driver-core-3.4.2.jar",
					s"${hdfsURI}/mongolib/commons-io-2.5.jar",
					s"${hdfsURI}/mongolib/config-1.2.1.jar",
					s"${hdfsURI}/${jarName}") ++ extJars))
			.config("spark.cores.max", 80)
			.config("spark.executor.cores", 16)
			.config("spark.executor.memory", "32g")
			.config("spark.mongodb.input.uri", inp)
			.config("spark.mongodb.output.uri", oup)
			.config("spark.mongodb.input.partitionerOptions.samplesPerPartition", 1)
			.config("spark.mongodb.input.partitionerOptions.partitionSizeMB", 128)
			.getOrCreate()
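Since the error message itself calls this a sporadic failure and suggests trying again, a small retry around the action that triggers partitioning (count in the stack trace above) may also help. This is a minimal sketch, assuming the failure surfaces as an exception whose message contains the code 28799. RetryOnSampleError, retry and isSampleError are hypothetical names, not part of the connector.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetryOnSampleError {
  // Assumption: the error can be recognised by the code in the exception message.
  def isSampleError(e: Throwable): Boolean =
    Option(e.getMessage).exists(_.contains("28799"))

  // Runs body up to `attempts` times, retrying only on the $sample error.
  @tailrec
  def retry[T](attempts: Int)(body: => T): T =
    Try(body) match {
      case Success(value) => value
      case Failure(e) if attempts > 1 && isSampleError(e) => retry(attempts - 1)(body)
      case Failure(e) => throw e
    }
}

// Hypothetical usage, with rdd being the RDD loaded through MongoSpark:
//   val n = RetryOnSampleError.retry(3) { rdd.count() }
```

A retry only hides the symptom, so it is probably best combined with the partitioner settings above rather than used in place of them.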
