《The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis》内容精选

We then evaluate and characterize the Hadoop framework using HiBench, in terms of speed (i.e., job running time), throughput (i.e., the number of tasks completed per minute), HDFS bandwidth, system resource (e.g., CPU, memory and I/O) utilizations, and data access patterns.

关键内容:speed、 throughput、HDFS bandwidth、 system resource、data access patterns

the last one is an enhanced version of the DFSIO benchmark that we have extended to evaluate the aggregated bandwidth delivered by HDFS.

关键内容:evaluate the aggregated bandwidth delivered by HDFS

As shown in Fig. 1, the aggregated throughput curve has a warm-up period and a cool-down period where map tasks are launching up and shutting down respectively. Between these two periods, there is a steady period where the aggregated throughput values are stable across different time slots. Therefore, the Enhanced DFSIO workload computes the aggregated HDFS throughput by averaging the throughput value of each time slot in the steady period. In Enhanced DFSIO, when the number of concurrent map tasks at a time slot is above a specified percentage (e.g., 50% is used in our benchmarking) of the total map task slots in the Hadoop cluster, the slot is considered to be in the steady period.

关键内容:warm-up period、cool-down period、steady period、computes the aggregated HDFS throughput by averaging the throughput value of each time slot in the steady period

In essence, the TeraSort workload is similar to Sort and therefore is I/O bound in nature. However, we have compressed its shuffle data (i.e., map output) in the experiment so as to minimize the disk and network I/O during shuffle, as shown in Table III. Consequently, TeraSort have very high CPU utilization and moderate disk I/O during the map stage and shuffle phases, and moderate CPU utilization and heavy disk I/O during the reduce phases, as shown in Fig. 4.

关键内容:map stage、shuffle phases、reduce phases、high、moderate、CPU utilization、disk I/O

The best performance (total running time) of Hadoop workloads is usually obtained by accurately estimating the size of the map output, shuffle data and reduce input data, and properly allocating memory buffers to prevent multiple spilling (to disk) of those data.

关键内容:estimating the size of the map output、 shuffle data and reduce input data、allocating memory buffers

HiBench成长笔记——(7) 阅读《The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis》的更多相关文章

  1. HiBench成长笔记——(2) CentOS部署安装HiBench

    安装Scala 使用spark-shell命令进入shell模式,查看spark版本和Scala版本: 下载Scala2.10.5 wget https://downloads.lightbend.c ...

  2. HiBench成长笔记——(3) HiBench测试Spark

    很多内容之前的博客已经提过,这里不再赘述,详细内容参照本系列前面的博客:https://www.cnblogs.com/ratels/p/10970905.html 创建并修改配置文件conf/spa ...

  3. HiBench成长笔记——(1) HiBench概述

    测试分类 HiBench共计19个测试方向,可大致分为6个测试类别:分别是micro,ml(机器学习),sql,graph,websearch和streaming. 2.1 micro Benchma ...

  4. HiBench成长笔记——(5) HiBench-Spark-SQL-Scan源码分析

    run.sh #!/bin/bash # Licensed to the Apache Software Foundation (ASF) under one or more # contributo ...

  5. HiBench成长笔记——(4) HiBench测试Spark SQL

    很多内容之前的博客已经提过,这里不再赘述,详细内容参照本系列前面的博客:https://www.cnblogs.com/ratels/p/10970905.html 和 https://www.cnb ...

  6. HiBench成长笔记——(11) 分析源码run.sh

    #!/bin/bash # Licensed to the Apache Software Foundation (ASF) under one or more # contributor licen ...

  7. HiBench成长笔记——(10) 分析源码execute_with_log.py

    #!/usr/bin/env python2 # Licensed to the Apache Software Foundation (ASF) under one or more # contri ...

  8. HiBench成长笔记——(9) 分析源码monitor.py

    monitor.py 是主监控程序,将监控数据写入日志,并统计监控数据生成HTML统计展示页面: #!/usr/bin/env python2 # Licensed to the Apache Sof ...

  9. HiBench成长笔记——(8) 分析源码workload_functions.sh

    workload_functions.sh 是测试程序的入口,粘连了监控程序 monitor.py 和 主运行程序: #!/bin/bash # Licensed to the Apache Soft ...

随机推荐

  1. Sqlserver 日志文件收缩命令

    SELECT NAME,recovery_model_desc FROM sys.databases -- 如果是FULL类型,修改为SIMPLE类型 ALTER DATABASE DBName SE ...

  2. stack的使用-Hdu 1062

    Text Reverse Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)Tota ...

  3. Mysql基本用法-left join、right join、 inner join、子查询和join-02

    left join #左连接又叫外连接 left join 返回左表中所有记录和右表中连接字段相等的记录  test_user表 phpcvs表 SQL: select * from test_use ...

  4. 到头来还是逃不开Java - Java13面向对象基础

    面向对象基础 没有特殊说明,我的所有学习笔记都是从廖老师那里摘抄过来的,侵删 引言 兜兜转转到了大四,学过了C,C++,C#,Java,Python,学一门丢一门,到了最后还是要把Java捡起来.所以 ...

  5. updataxml报错注入

    // take the variables//接受变量 // //也就是插入post提交的uname和passwd,参见:https://www.w3school.com.cn/sql/sql_ins ...

  6. 5-3 使用antDesign的form组件

    import { Form, Icon, Input, Button, Checkbox } from 'antd'; class NormalLoginForm extends React.Comp ...

  7. 吴裕雄 Bootstrap 前端框架开发——Bootstrap 网格系统实例:偏移列

    <!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title> ...

  8. 02-03Android学习进度报告三

    今天主要学习了线性布局和相对布局的概念和区别,以及线性布局和相对布局的优缺点. 经过搜素发现,我们屏幕适配的使用用的比较多的就是LinearLayout的权重属性weight,我 学习了一些 Line ...

  9. 2月送书福利:ASP.NET Core开发实战

    大家都知道我有一个公众号“恰童鞋骚年”,在公众号2020年第一天发布的推文<2020年,请让我重新介绍我自己>中,我曾说到我会在2020年中每个月为所有关注“恰童鞋骚年”公众号的童鞋们送一 ...

  10. Python 基础之python运算符

    一.运算符 1.算数运算符 + - * / // % ** var1 = 5var2 = 8 #(1)  + 加res = var1 + var2print(res) # (2)  -  减res = ...