HiBench成长笔记——(7) 阅读《The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis》
《The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis》内容精选
We then evaluate and characterize the Hadoop framework using HiBench, in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory and I/O) utilizations, and data access patterns.
关键内容:speed、 throughput、HDFS bandwidth、 system resource、data access patterns
the last one is an enhanced version of the DFSIO benchmark that we have extended to evaluate the aggregated bandwidth delivered by HDFS.
关键内容:evaluate the aggregated bandwidth delivered by HDFS

As shown in Fig. 1, the aggregated throughput curve has a warm-up period and a cool-down period where map tasks are launching up and shutting down respectively. Between these two periods, there is a steady period where the aggregated throughput values are stable across different time slots. Therefore, the Enhanced DFSIO workload computes the aggregated HDFS throughput by averaging the throughput value of each time slot in the steady period. In Enhanced DFSIO, when the number of concurrent map tasks at a time slot is above a specified percentage (e.g., 50% is used in our benchmarking) of the total map task slots in the Hadoop cluster, the slot is considered to be in the steady period.
关键内容:warm-up period、cool-down period、steady period、computes the aggregated HDFS throughput by averaging the throughput value of each time slot in the steady period

In essence, the TeraSort workload is similar to Sort and therefore is I/O bound in nature. However, we have compressed its shuffle data (i.e., map output) in the experiment so as to minimize the disk and network I/O during shuffle, as shown in Table III. Consequently, TeraSort have very high CPU utilization and moderate disk I/O during the map stage and shuffle phases, and moderate CPU utilization and heavy disk I/O during the reduce phases, as shown in Fig. 4.
关键内容:map stage、shuffle phases、reduce phases、high、moderate、CPU utilization、disk I/O
The best performance (total running time) of Hadoop workloads is usually obtained by accurately estimating the size of the map output, shuffle data and reduce input data, and properly allocating memory buffers to prevent multiple spilling (to disk) of those data.
关键内容:estimating the size of the map output、 shuffle data and reduce input data、allocating memory buffers
HiBench成长笔记——(7) 阅读《The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis》的更多相关文章
- HiBench成长笔记——(2) CentOS部署安装HiBench
安装Scala 使用spark-shell命令进入shell模式,查看spark版本和Scala版本: 下载Scala2.10.5 wget https://downloads.lightbend.c ...
- HiBench成长笔记——(3) HiBench测试Spark
很多内容之前的博客已经提过,这里不再赘述,详细内容参照本系列前面的博客:https://www.cnblogs.com/ratels/p/10970905.html 创建并修改配置文件conf/spa ...
- HiBench成长笔记——(1) HiBench概述
测试分类 HiBench共计19个测试方向,可大致分为6个测试类别:分别是micro,ml(机器学习),sql,graph,websearch和streaming. 2.1 micro Benchma ...
- HiBench成长笔记——(5) HiBench-Spark-SQL-Scan源码分析
run.sh #!/bin/bash # Licensed to the Apache Software Foundation (ASF) under one or more # contributo ...
- HiBench成长笔记——(4) HiBench测试Spark SQL
很多内容之前的博客已经提过,这里不再赘述,详细内容参照本系列前面的博客:https://www.cnblogs.com/ratels/p/10970905.html 和 https://www.cnb ...
- HiBench成长笔记——(11) 分析源码run.sh
#!/bin/bash # Licensed to the Apache Software Foundation (ASF) under one or more # contributor licen ...
- HiBench成长笔记——(10) 分析源码execute_with_log.py
#!/usr/bin/env python2 # Licensed to the Apache Software Foundation (ASF) under one or more # contri ...
- HiBench成长笔记——(9) 分析源码monitor.py
monitor.py 是主监控程序,将监控数据写入日志,并统计监控数据生成HTML统计展示页面: #!/usr/bin/env python2 # Licensed to the Apache Sof ...
- HiBench成长笔记——(8) 分析源码workload_functions.sh
workload_functions.sh 是测试程序的入口,粘连了监控程序 monitor.py 和 主运行程序: #!/bin/bash # Licensed to the Apache Soft ...
随机推荐
- phpsduty安装SSL证书,apache不能启动,解决方案
最近给客户开发微信小程序,因为本人也不太懂服务器的安装,(大神勿喷),顾个人一直使用的集成环境,原来一直是客户提供主机什么的,都是我们和客户说一下需要什么环境啊,配置啊,之类的,这次首次自己动手. 废 ...
- 如何确认 fastboot unlock 解锁成功,如何确认DM-verity 已关闭
如何确认 fastboot unlock 解锁成功 1.fastboot 模式下按音量上键后是否提示 Unlock Pass...return to fastboot in 3s 2.重启后界面是否显 ...
- connection String加密
aspnet_regiis -pe "connectionStrings" -app "/HG" -prov "ChrisProvider" ...
- Java8新特性——Optional
前言 在开发中,我们常常需要对一个引用进行判空以防止空指针异常的出现.Java8引入了Optional类,为的就是优雅地处理判空等问题.现在也有很多类库在使用Optional封装返回值,比如Sprin ...
- 【转载】script命令使用
二.script命令简介当你在终端或控制台上工作时,你可能想记录下自己做了些什么.这种记录可以看成是保存了终端痕迹的文档.假设你跟一些Linux管理员同时在系统上干活.或者说你让别人远程到你的服务器. ...
- 吴裕雄 Bootstrap 前端框架开发——Bootstrap 显示代码
<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title> ...
- STM32 RTC上的唤醒和闹钟
RTC很简单只要给备用电,RTC就会不停,可以进行设置和读时间.同时在RTC上也涉及了闹钟(EXTI_17:RTC_FLAG_ALRAF,相当于RTC的定时器,闹钟到了之后进行异步操作)和唤醒中断(低 ...
- JS - 处理浏览器兼容之 event
function test(e){ var event = e || windows.event // IE : windows.event ,非IE : e }
- vs code 批量替换
源内容 .icon-user, .icon-people, .icon-user-female, .icon-user-follow, .icon-user-following, .icon-user ...
- 单例设计模式和main方法
设计模式就是在大量的实践中总结和理论之后优选的代码结构.编程风格.以及解决问题的思考方式. 说白了设计模式就是在实际编程中逐渐总结出的解决问题的套路,类似于数学公式. 类的单例设计模式:在开发过程中有 ...