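Comparing the earlier runs showed that one function does the same work in two of the jobs, yet its computation time differs far too much between them.

  So I ran the 4 jobs separately. Each run uses different data, but the data comes from the same generator, so its distribution differs little and the volume is the same.

  Below are the run-time tables for these 4 jobs.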

Details for pure RDD job

Completed Stages (7)

| Stage Id | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|
| 6 | 2019/01/30 15:58:43 | 94 ms | 41/41 | 235.4 KB | |
| 5 | 2019/01/30 15:58:42 | 0.4 s | 41/41 | 382.9 KB | 235.4 KB |
| 4 | 2019/01/30 15:58:42 | 0.1 s | 41/41 | 99.2 KB | 246.0 KB |
| 2 | 2019/01/30 15:58:41 | 1 s | 41/41 | 765.8 KB | 99.2 KB |
| 1 | 2019/01/30 15:58:38 | 3 s | 41/41 | | 750.1 KB |
| 0 | 2019/01/30 15:58:38 | 3 s | 1/1 | | 15.7 KB |
| 3 | 2019/01/30 15:58:38 | 4 s | 41/41 | | 137.0 KB |

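  As the table shows, converting the product info into a pairRDD takes 4 seconds, while the city info and the click info take 3 seconds each. In the earlier experiments these steps took only a few tenths of a second. This suggests that some automatic caching may have been at work there, with previously computed results reused directly.

  These 3 steps run in parallel, which also shortens the elapsed time. Run time: 5 seconds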
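The 5-second figure can be checked against the stage table above. The sketch below assumes that each stage ends at its Submitted time plus its Duration; the UI rounds both values, so the result is only approximate. The variable names are made up for illustration.

```python
from datetime import datetime, timedelta

# (stage id, submitted, duration in seconds), copied from the pure RDD job table
stages = [
    (6, "2019/01/30 15:58:43", 0.094),
    (5, "2019/01/30 15:58:42", 0.4),
    (4, "2019/01/30 15:58:42", 0.1),
    (2, "2019/01/30 15:58:41", 1.0),
    (1, "2019/01/30 15:58:38", 3.0),
    (0, "2019/01/30 15:58:38", 3.0),
    (3, "2019/01/30 15:58:38", 4.0),
]

FMT = "%Y/%m/%d %H:%M:%S"
starts = [datetime.strptime(submitted, FMT) for _, submitted, _ in stages]
# Assumption: a stage finishes at submitted + duration
ends = [start + timedelta(seconds=dur) for start, (_, _, dur) in zip(starts, stages)]

# Wall-clock window from the first submission to the last finish
elapsed = (max(ends) - min(starts)).total_seconds()
# What the job would cost if every stage ran one after another
serial = sum(dur for _, _, dur in stages)
# Stages submitted at the very first second, i.e. started together
parallel_start = sorted(sid for (sid, _, _), s in zip(stages, starts) if s == min(starts))

print(f"elapsed: {elapsed:.1f} s, sum of durations: {serial:.1f} s")
print("stages submitted together at the start:", parallel_start)
```

With the table's numbers, the elapsed window is about 5 seconds while the stage durations add up to roughly 11.6 seconds. The gap comes from stages 0, 1 and 3 being submitted in the same second and overlapping.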

Details for pure RDD job with map join

Completed Stages (3)

| Stage Id | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|
| 3 | 2019/01/30 16:00:23 | 0.2 s | 41/41 | 246.7 KB | |
| 2 | 2019/01/30 16:00:22 | 0.5 s | 41/41 | 477.6 KB | 246.8 KB |
| 1 | 2019/01/30 16:00:17 | 5 s | 41/41 | | 478.2 KB |

  Probably because the map join uses a lot of memory, the mapToPair that carries the city info and the click records takes longer to run. Run time: 6 seconds
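For readers unfamiliar with the term: a map join keeps the small table in memory and looks up each record of the large table in it, so the join itself needs no shuffle. In Spark this is commonly done with a broadcast variable. The following plain-Python sketch shows only the idea and is not the project's Spark code; the sample data and field names are hypothetical.

```python
# Hypothetical sample data; field names are illustrative, not taken from the project.
city_info = {1: "Beijing", 2: "Shanghai"}  # small table: city_id -> city name
clicks = [(1, "p10"), (2, "p11"), (1, "p12"), (3, "p13")]  # large table: (city_id, product_id)


def map_join(records, small_table):
    """Join each record against an in-memory lookup table, without a shuffle step."""
    joined = []
    for city_id, product_id in records:
        city_name = small_table.get(city_id)
        if city_name is not None:  # inner join: drop clicks whose city is unknown
            joined.append((city_id, city_name, product_id))
    return joined


result = map_join(clicks, city_info)
print(result)
```

Every task that performs the lookup needs the whole small table in memory, which fits the guess above that the map join is memory-hungry.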

Details for original job

Completed Stages (7)

| Stage Id | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|
| 6 | 2019/01/30 16:04:04 | 0.8 s | 200/200 | 865.5 KB | |
| 5 | 2019/01/30 16:03:58 | 6 s | 200/200 (2 failed) | 899.9 KB | 869.3 KB |
| 3 | 2019/01/30 16:03:56 | 1 s | 200/200 | 224.2 KB | 733.2 KB |
| 2 | 2019/01/30 16:03:55 | 2 s | 41/41 | 766.0 KB | 224.3 KB |
| 4 | 2019/01/30 16:03:50 | 3 s | 41/41 | | 159.9 KB |
| 1 | 2019/01/30 16:03:49 | 6 s | 41/41 | | 750.3 KB |
| 0 | 2019/01/30 16:03:49 | 3 s | 1/1 | | 15.7 KB |

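  The mapToPair on the click records, which hold the most data, takes the longest: 6 seconds.

  Every other operation takes at least as long as its counterpart in the pure RDD version. This is especially true of the 2 operations just before collect, which the pure RDD program finishes in under 1 second.

  Given the earlier "too many open files" error, it is reasonable to infer that the SQL operations create local files for reading and writing. On top of that, some SQL statements express the business steps less concisely than the RDD code does, which slows the run down badly. Run time: 16 seconds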

Details for pure sparkSQL job

Completed Stages (7)

| Stage Id | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Read | Shuffle Write |
|---|---|---|---|---|---|
| 6 | 2019/01/30 16:08:23 | 0.8 s | 200/200 | 869.0 KB | |
| 5 | 2019/01/30 16:08:21 | 2 s | 200/200 (1 failed) | 894.1 KB | 870.2 KB |
| 3 | 2019/01/30 16:08:20 | 1 s | 200/200 | 224.2 KB | 733.4 KB |
| 2 | 2019/01/30 16:08:18 | 1 s | 200/200 | 405.2 KB | 224.6 KB |
| 4 | 2019/01/30 16:08:01 | 4 s | 41/41 | | 159.9 KB |
| 1 | 2019/01/30 16:08:01 | 17 s | 1/1 | | 4.0 KB |
| 0 | 2019/01/30 16:08:01 | 6 s | 41/41 (1 failed) | | 401.8 KB |

  sparkSQL is slow to begin with, and it became even slower once the first 2 steps were converted to SQL. Run time: 22 seconds
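To put the four run times side by side, here is a small sketch that expresses each one relative to the pure RDD job. The numbers are the run times quoted in this post; the job labels and variable names are only for illustration.

```python
# Run times in seconds, as reported for each job in this post
run_times = {
    "pure RDD": 5,
    "pure RDD with map join": 6,
    "original": 16,
    "pure sparkSQL": 22,
}

baseline = run_times["pure RDD"]
# Slowdown factor of each job relative to the pure RDD job
slowdown = {job: secs / baseline for job, secs in run_times.items()}

for job, factor in slowdown.items():
    print(f"{job}: {run_times[job]} s ({factor:.1f}x)")
```

On these figures the original job takes a little over 3 times as long as the pure RDD job, and the pure sparkSQL job more than 4 times as long.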
