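PredictionIO + Universal Recommender: Problems Encountered When Rapidly Developing and Deploying a Recommendation Engine (3)
PredictionIO + Universal Recommender lets small and medium-sized companies quickly build and deploy a personalized recommendation engine based on collaborative filtering of user behavior. At the engine level alone the development cost is close to zero, but there are still some prerequisites. For example, the organization should ideally already run a reasonably stable Hadoop and Spark cluster, and should at least have some developers and operations staff who know the Spark platform. Otherwise the technical team will spend a long time on trial and error.
This article lists some Spark-related exceptions you may run into when using PredictionIO + Universal Recommender, together with the reasoning behind each diagnosis and the fix that finally worked, for reference.
1. java.lang.StackOverflowError during training
This one is fairly simple. According to the documentation, passing memory sizes as arguments when running the training avoids the problem, for example: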
pio train -- --driver-memory 8g --executor-memory 8g --verbose
2. Training fails because the emptyRDD method cannot be found
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.SparkContext.emptyRDD(Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/EmptyRDD;
at com.actionml.URAlgorithm.getRanksRDD(URAlgorithm.scala:)
at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:)
at com.actionml.URAlgorithm.train(URAlgorithm.scala:)
at com.actionml.URAlgorithm.train(URAlgorithm.scala:)
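This is caused by the engine being compiled against one Spark version and run on another. The method descriptor in the error expects emptyRDD to return org.apache.spark.rdd.EmptyRDD, whereas the Spark version that is actually running declares it with the return type RDD[T]:
/** Get an RDD that has no partitions or elements. */
def emptyRDD[T: ClassTag]: RDD[T] = new EmptyRDD[T](this)
Compiling the engine against the same Spark version that the cluster runs removes the error.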
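A quick way to catch this kind of mismatch early is to compare the Spark version the engine was built against with the version reported on the cluster. The sketch below only compares two version strings. The function names and the two example values are hypothetical, and where the build-time version is declared depends on your project.

```shell
#!/usr/bin/env bash
# Reduce a version such as "2.1.1" to its "major.minor" prefix ("2.1").
spark_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed only when both versions share the same major.minor line.
same_spark_line() {
  [ "$(spark_minor "$1")" = "$(spark_minor "$2")" ]
}

# Hypothetical example values: replace them with the version from your
# build definition and the one printed by `spark-submit --version`.
build_version="1.6.3"
runtime_version="2.1.1"

if same_spark_line "$build_version" "$runtime_version"; then
  echo "Spark versions match: $build_version / $runtime_version"
else
  echo "Spark version mismatch: built against $build_version, running on $runtime_version"
fi
```

Comparing only major.minor is a deliberate simplification: patch releases normally keep method signatures stable, while a jump such as 1.x to 2.x does not.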
3. Training fails with NoClassDefFoundError for com/sun/jersey/api/client/config/ClientConfig
[INFO] [ServerConnector] Started ServerConnector@bd93bc3{HTTP/1.1}{0.0.0.0:}
[INFO] [Server] Started @6428ms
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:)
at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:)
at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:)
at java.lang.reflect.Method.invoke(Method.java:)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader.findClass(URLClassLoader.java:)
at java.lang.ClassLoader.loadClass(ClassLoader.java:)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:)
at java.lang.ClassLoader.loadClass(ClassLoader.java:)
... more
4. Training fails with java.lang.ArrayIndexOutOfBoundsException
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7772d266{/jobs,null,UNAVAILABLE}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException:
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$.apply(YarnSparkHadoopUtil.scala:)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$.apply(YarnSparkHadoopUtil.scala:)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$.apply(Client.scala:)
at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$.apply(Client.scala:)
at scala.Option.foreach(Option.scala:)
at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:)
at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:)
at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:)
at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:)
at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:)
at java.lang.reflect.Method.invoke(Method.java:)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
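The exception is thrown in YarnSparkHadoopUtil.setEnvFromInputString, called from Client.setupLaunchEnv, which parses the SPARK_YARN_USER_ENV environment variable as a comma-separated list of KEY=VALUE pairs. The variable should therefore be set in that form:
export SPARK_YARN_USER_ENV="HADOOP_CONF_DIR=/home/hadoop/apache-hadoop/etc/hadoop"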
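Since the stack trace passes through setEnvFromInputString, it can help to validate the value of SPARK_YARN_USER_ENV before submitting a job. The following is a minimal sketch, assuming the value is meant to be a comma-separated list of KEY=VALUE entries. The function name valid_user_env is hypothetical, and the check is only an approximation of what Spark itself accepts.

```shell
#!/usr/bin/env bash
# Succeed when the argument is empty or when every comma-separated entry
# has a non-empty key, an "=", and a non-empty value.
valid_user_env() {
  [ -z "$1" ] && return 0
  while IFS= read -r entry; do
    case "$entry" in
      ?*=?*) ;;          # looks like KEY=VALUE
      *) return 1 ;;     # missing "=", key, or value
    esac
  done <<EOF
$(printf '%s\n' "$1" | tr ',' '\n')
EOF
  return 0
}

# Report on the variable as it is set in the current environment.
if valid_user_env "${SPARK_YARN_USER_ENV:-}"; then
  echo "SPARK_YARN_USER_ENV looks well-formed"
else
  echo "SPARK_YARN_USER_ENV has an entry that is not KEY=VALUE"
fi
```

A value such as a bare directory path, with no variable name in front of it, fails this check.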
5. Multiple ES-Hadoop versions detected in the classpath
[WARN] [TaskSetManager] Lost task 3.0 in stage 173.0 (TID , bigdata01, executor ): java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/home/hadoop/apache-hadoop/hadoop/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.-deps.jar
jar:file:/home/hadoop/apache-hadoop/hadoop-2.7./var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.-deps.jar
at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$.apply(EsSpark.scala:)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$.apply(EsSpark.scala:)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:)
at org.apache.spark.scheduler.Task.run(Task.scala:)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:)
at java.lang.Thread.run(Thread.java:)
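I am not sure whether this counts as a bug, but if the YARN configuration points to the Hadoop directory through a symbolic link, this problem can occur. The two jar entries in the error above appear to be the same file, listed once under the path containing /apache-hadoop/hadoop/ and once under the path containing /apache-hadoop/hadoop-2.7./. See https://interset.zendesk.com/hc/en-us/articles/230751687-PhoenixToElasticSearchJob-Fails-with-Multiple-ES-Hadoop-versions-detected-in-the-classpath-
The fix is simple: in the NodeManager configuration, replace every path that goes through the Hadoop directory symlink with the real path.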
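To find which configured paths go through a symbolic link, each one can be compared with its resolved form. This is a rough sketch, assuming GNU readlink and absolute, normalized input paths. The function names are hypothetical, and which paths to check depends on your own YARN configuration.

```shell
#!/usr/bin/env bash
# Print the canonical form of a path, with every symbolic link resolved.
# Assumes GNU readlink; -f is not available in every implementation.
real_path() {
  readlink -f "$1"
}

# Succeed when the path differs from its canonical form, which for an
# absolute, normalized path means some component is a symbolic link.
uses_symlink() {
  [ "$(real_path "$1")" != "$1" ]
}

# Check every path given on the command line.
for dir in "$@"; do
  if uses_symlink "$dir"; then
    echo "$dir -> $(real_path "$dir") (use the real path in the YARN configuration)"
  else
    echo "$dir is already a real path"
  fi
done
```

Saved as a script, it could be run with the directories taken from the NodeManager settings as arguments; any line printed with an arrow marks a path to replace.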
6. The Spark job fails to run
[WARN] [Utils] Service 'sparkDriver' could not bind on port . Attempting port .
[ERROR] [SparkContext] Error initializing SparkContext.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after retries (starting from )! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:)
at sun.nio.ch.Net.bind(Net.java:)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:)
at io.netty.bootstrap.AbstractBootstrap$.run(AbstractBootstrap.java:)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:)
at io.netty.util.concurrent.SingleThreadEventExecutor$.run(SingleThreadEventExecutor.java:)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:)
at java.lang.Thread.run(Thread.java:)