《The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis》内容精选

We then evaluate and characterize the Hadoop framework using HiBench, in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory and I/O) utilizations, and data access patterns.

关键内容:speed、 throughput、HDFS bandwidth、 system resource、data access patterns

the last one is an enhanced version of the DFSIO benchmark that we have extended to evaluate the aggregated bandwidth delivered by HDFS.

关键内容:evaluate the aggregated bandwidth delivered by HDFS

As shown in Fig. 1, the aggregated throughput curve has a warm-up period and a cool-down period where map tasks are launching up and shutting down respectively. Between these two periods, there is a steady period where the aggregated throughput values are stable across different time slots. Therefore, the Enhanced DFSIO workload computes the aggregated HDFS throughput by averaging the throughput value of each time slot in the steady period. In Enhanced DFSIO, when the number of concurrent map tasks at a time slot is above a specified percentage (e.g., 50% is used in our benchmarking) of the total map task slots in the Hadoop cluster, the slot is considered to be in the steady period.

关键内容:warm-up period、cool-down period、steady period、computes the aggregated HDFS throughput by averaging the throughput value of each time slot in the steady period

In essence, the TeraSort workload is similar to Sort and therefore is I/O bound in nature. However, we have compressed its shuffle data (i.e., map output) in the experiment so as to minimize the disk and network I/O during shuffle, as shown in Table III. Consequently, TeraSort have very high CPU utilization and moderate disk I/O during the map stage and shuffle phases, and moderate CPU utilization and heavy disk I/O during the reduce phases, as shown in Fig. 4.

关键内容:map stage、shuffle phases、reduce phases、high、moderate、CPU utilization、disk I/O

The best performance (total running time) of Hadoop workloads is usually obtained by accurately estimating the size of the map output, shuffle data and reduce input data, and properly allocating memory buffers to prevent multiple spilling (to disk) of those data.

关键内容:estimating the size of the map output、 shuffle data and reduce input data、allocating memory buffers

HiBench成长笔记——(7) 阅读《The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis》的更多相关文章

  1. HiBench成长笔记——(2) CentOS部署安装HiBench

    安装Scala 使用spark-shell命令进入shell模式,查看spark版本和Scala版本: 下载Scala2.10.5 wget https://downloads.lightbend.c ...

  2. HiBench成长笔记——(3) HiBench测试Spark

    很多内容之前的博客已经提过,这里不再赘述,详细内容参照本系列前面的博客:https://www.cnblogs.com/ratels/p/10970905.html 创建并修改配置文件conf/spa ...

  3. HiBench成长笔记——(1) HiBench概述

    测试分类 HiBench共计19个测试方向,可大致分为6个测试类别:分别是micro,ml(机器学习),sql,graph,websearch和streaming. 2.1 micro Benchma ...

  4. HiBench成长笔记——(5) HiBench-Spark-SQL-Scan源码分析

    run.sh #!/bin/bash # Licensed to the Apache Software Foundation (ASF) under one or more # contributo ...

  5. HiBench成长笔记——(4) HiBench测试Spark SQL

    很多内容之前的博客已经提过,这里不再赘述,详细内容参照本系列前面的博客:https://www.cnblogs.com/ratels/p/10970905.html 和 https://www.cnb ...

  6. HiBench成长笔记——(11) 分析源码run.sh

    #!/bin/bash # Licensed to the Apache Software Foundation (ASF) under one or more # contributor licen ...

  7. HiBench成长笔记——(10) 分析源码execute_with_log.py

    #!/usr/bin/env python2 # Licensed to the Apache Software Foundation (ASF) under one or more # contri ...

  8. HiBench成长笔记——(9) 分析源码monitor.py

    monitor.py 是主监控程序,将监控数据写入日志,并统计监控数据生成HTML统计展示页面: #!/usr/bin/env python2 # Licensed to the Apache Sof ...

  9. HiBench成长笔记——(8) 分析源码workload_functions.sh

    workload_functions.sh 是测试程序的入口,粘连了监控程序 monitor.py 和 主运行程序: #!/bin/bash # Licensed to the Apache Soft ...

随机推荐

  1. ES Search API

    Search API 搜索请求 SearchRequest用于与搜索文档.聚合.suggestions相关的任何操作,还提供了在结果文档上请求高亮的方法. 在最基本的表单中,我们可以向请求添加查询: ...

  2. springBoot+MybatisPlus数据库字段使用驼峰命名法时报错

    假如有个实体类: package com.jeff.entity; public class User { /** * 主键id */ private Integer id; /** * 登陆名 */ ...

  3. 记一次RocketMQ源码导入IDEA过程

    首先,下载源码,可以官网下载source包,也可以从GitHub上直接拉下来导入IDEA.如果是官网下载的source zip包,直接作为当前project的module导入,这里不赘述太多,只强调一 ...

  4. 二十二 XML校验器

    Struts2提供的校验器及其规则:

  5. Spring学习(一)

    搭建环境 1.创建普通的Java工程 2.添加相应的jar包,下载链接:https://files.cnblogs.com/files/AmyZheng/lib.rar,此外,为了打印信息,我们还需要 ...

  6. Android开发:界面设计之六大layouts介绍

    1.帧布局 FrameLayout: FrameLayout是最简单的布局对象.在它里面的的所有显示对象都将固定在屏幕的左上角,不能指定位置,后一个会直接覆盖在前一个之上显示 因为上面的一段话这个是在 ...

  7. CSP2019 Emiya 家今天的饭

    Description: 有 \(n\) 中烹饪方法和 \(m\) 种食材,要求: 至少做一种菜 所有菜的烹饪方法各不相同 同种食材的菜的数量不能超过总菜数的一半 求做菜的方案数. Solution1 ...

  8. 自定义Toast排队重复显示问题:

    原文 http://blog.csdn.net/baiyuliang2013/article/details/38655495Toast是安卓系统中,用户误操作时或某功能执行完毕时,对用户的一种提示, ...

  9. spring mvc绑定参数之 类型转换 有三种方式:

    spring mvc绑定参数之类型转换有三种方式: 1.实体类中加日期格式化注解(上次做项目使用的这种.简单,但有缺点,是一种局部的处理方式,只能在本实体类中使用.方法三是全局的.) @DateTim ...

  10. CSS - icon图标(icon font)

    1. 概念 这个小红点是图标,图标在CSS中实际上是字体. 2. 为什么出现本质是字体的图标? 2.1 图片增加了总文件的大小. 2.2 图片增加了额外的http请求,大大降低网页的性能. 2.3 图 ...