Hadoop Benchmarking (Repost)

The test classes used in "Benchmarking a Hadoop Cluster" in Hadoop: The Definitive Guide (3rd edition) are no longer packaged in hadoop-*-test.jar in newer releases. With newer versions, run the benchmarks as follows:

1. TestDFSIO

write

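TestDFSIO tests the I/O performance of HDFS. It uses a MapReduce job to read or write files in parallel: each file is read or written in its own map task, and the map output is used to collect statistics about how that file was processed.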

Test 1: write 2 files of 10 MB each

% yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 2 -fileSize 10

Console output when the job is submitted:

13/11/13 01:59:06 INFO fs.TestDFSIO: TestDFSIO.1.7

13/11/13 01:59:06 INFO fs.TestDFSIO: nrFiles = 2

13/11/13 01:59:06 INFO fs.TestDFSIO: nrBytes (MB) = 10.0

13/11/13 01:59:06 INFO fs.TestDFSIO: bufferSize = 1000000

13/11/13 01:59:06 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO

13/11/13 01:59:15 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 2 files

13/11/13 01:59:26 INFO fs.TestDFSIO: created control files for: 2 files

13/11/13 01:59:27 INFO client.RMProxy: Connecting to ResourceManager at cluster1/172.16.102.201:8032

13/11/13 01:59:27 INFO client.RMProxy: Connecting to ResourceManager at cluster1/172.16.102.201:8032

13/11/13 01:59:56 INFO mapred.FileInputFormat: Total input paths to process : 2

13/11/13 02:00:21 INFO mapreduce.JobSubmitter: number of splits:2

13/11/13 02:00:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1384321503481_0003

13/11/13 02:00:34 INFO impl.YarnClientImpl: Submitted application application_1384321503481_0003 to ResourceManager at cluster1/172.16.102.201:8032

13/11/13 02:00:36 INFO mapreduce.Job: The url to track the job: http://cluster1:8888/proxy/application_1384321503481_0003/

13/11/13 02:00:36 INFO mapreduce.Job: Running job: job_1384321503481_0003

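The console output shows:

(1) By default the files are written to the io_data directory under /benchmarks/TestDFSIO; the default base directory can be changed through the test.build.data system property.

(2) There are 2 map tasks (number of splits:2), which also confirms that each file is written or read in its own map task.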
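Both observations can be checked mechanically against the console output. The sketch below is only an illustration and not part of TestDFSIO: the helper name parse_submission is made up for this example, and the sample lines are copied from the log above.

```python
import re

# Sample lines copied from the job-submission console output above.
SUBMISSION_LOG = """\
13/11/13 01:59:06 INFO fs.TestDFSIO: nrFiles = 2
13/11/13 01:59:06 INFO fs.TestDFSIO: nrBytes (MB) = 10.0
13/11/13 01:59:15 INFO fs.TestDFSIO: creating control file: 10485760 bytes, 2 files
13/11/13 02:00:21 INFO mapreduce.JobSubmitter: number of splits:2
"""


def parse_submission(log):
    """Hypothetical helper: extract file count, size, control size and splits."""
    nr_files = int(re.search(r"nrFiles = (\d+)", log).group(1))
    mb_per_file = float(re.search(r"nrBytes \(MB\) = ([\d.]+)", log).group(1))
    control_bytes = int(re.search(r"creating control file: (\d+) bytes", log).group(1))
    splits = int(re.search(r"number of splits:(\d+)", log).group(1))
    return nr_files, mb_per_file, control_bytes, splits


nr_files, mb_per_file, control_bytes, splits = parse_submission(SUBMISSION_LOG)

# One split (and therefore one map task) per file.
print("one map task per file:", splits == nr_files)
# The size in the control-file line is the requested file size in bytes.
print("size matches 10 MB:", control_bytes == int(mb_per_file * 1024 * 1024))
```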

Console output after the job finishes:

13/11/13 02:08:15 INFO mapreduce.Job:  map 100% reduce 100%

13/11/13 02:08:17 INFO mapreduce.Job: Job job_1384321503481_0003 completed successfully

13/11/13 02:08:21 INFO mapreduce.Job: Counters: 43

    File System Counters

        FILE: Number of bytes read=174

        FILE: Number of bytes written=240262

        FILE: Number of read operations=0

        FILE: Number of large read operations=0

        FILE: Number of write operations=0

        HDFS: Number of bytes read=468

        HDFS: Number of bytes written=20971595

        HDFS: Number of read operations=11

        HDFS: Number of large read operations=0

        HDFS: Number of write operations=4

    Job Counters

        Launched map tasks=2

        Launched reduce tasks=1

        Data-local map tasks=2

        Total time spent by all maps in occupied slots (ms)=63095

        Total time spent by all reduces in occupied slots (ms)=14813

    Map-Reduce Framework

        Map input records=2

        Map output records=10

        Map output bytes=148

        Map output materialized bytes=180

        Input split bytes=244

        Combine input records=0

        Combine output records=0

        Reduce input groups=5

        Reduce shuffle bytes=180

        Reduce input records=10

        Reduce output records=5

        Spilled Records=20

        Shuffled Maps =2

        Failed Shuffles=0

        Merged Map outputs=2

        GC time elapsed (ms)=495

        CPU time spent (ms)=3640

        Physical memory (bytes) snapshot=562757632

        Virtual memory (bytes) snapshot=2523807744

        Total committed heap usage (bytes)=421330944

    Shuffle Errors

        BAD_ID=0

        CONNECTION=0

        IO_ERROR=0

        WRONG_LENGTH=0

        WRONG_MAP=0

        WRONG_REDUCE=0

    File Input Format Counters

        Bytes Read=224

    File Output Format Counters

        Bytes Written=75

13/11/13 02:08:23 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write

13/11/13 02:08:23 INFO fs.TestDFSIO:            Date & time: Wed Nov 13 02:08:22 PST 2013

13/11/13 02:08:23 INFO fs.TestDFSIO:        Number of files: 2

13/11/13 02:08:23 INFO fs.TestDFSIO: Total MBytes processed: 20.0

13/11/13 02:08:23 INFO fs.TestDFSIO:      Throughput mb/sec: 0.5591277606933184

13/11/13 02:08:23 INFO fs.TestDFSIO: Average IO rate mb/sec: 0.5635650753974915

13/11/13 02:08:23 INFO fs.TestDFSIO:  IO rate std deviation: 0.05000733272172887

13/11/13 02:08:23 INFO fs.TestDFSIO:     Test exec time sec: 534.566

13/11/13 02:08:23 INFO fs.TestDFSIO:

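From the output above we can see 2 map tasks and 1 reduce task. The summary reports the average I/O rate, the overall throughput, the job execution time, and the number of files written.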
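The difference between "Throughput" and "Average IO rate" is easy to miss. As far as I understand the analysis step of TestDFSIO, throughput is the total data divided by the summed I/O time of all map tasks, while the average I/O rate is the mean of the per-file rates. The sketch below illustrates this under that assumption; the function summarize and the per-file times (16 s and 20 s) are hypothetical and are not taken from this run.

```python
import math


def summarize(sizes_mb, io_times_sec):
    """Hypothetical re-creation of the three rate figures in the summary."""
    rates = [size / t for size, t in zip(sizes_mb, io_times_sec)]
    # Throughput: total MB over the summed I/O time of all tasks.
    throughput = sum(sizes_mb) / sum(io_times_sec)
    # Average IO rate: arithmetic mean of the per-file rates.
    avg_rate = sum(rates) / len(rates)
    # Standard deviation of the per-file rates.
    mean_sq = sum(r * r for r in rates) / len(rates)
    std_dev = math.sqrt(abs(mean_sq - avg_rate ** 2))
    return throughput, avg_rate, std_dev


# Hypothetical I/O times for two 10 MB files (not measured values).
throughput, avg_rate, std_dev = summarize([10.0, 10.0], [16.0, 20.0])
print(throughput, avg_rate, std_dev)

# Figures taken from the summary above: implied total I/O time in seconds.
total_io_sec = 20.0 / 0.5591277606933184
print(round(total_io_sec, 2))

# Counter cross-check: two 10 MB data files plus the 75 bytes reported
# under "File Output Format Counters".
data_bytes = 2 * 10 * 1024 * 1024
result_bytes = 75
print(data_bytes + result_bytes == 20971595)
```

If that reading of the metrics is right, the implied total I/O time for this run is about 35.8 seconds out of a 534.566-second test, which suggests that most of the elapsed time went to job scheduling and startup rather than to HDFS I/O. The byte count also lines up with the "HDFS: Number of bytes written" counter.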

read

% yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -read -nrFiles 2 -fileSize 10

The read output is not analyzed in detail here; try it yourself.
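For anyone who wants to try the read pass, a small helper that assembles the TestDFSIO command lines may be handy. This is a sketch only: dfsio_command is a made-up name, the jar path is the Hadoop 2.2.0 path used in this article, and it assumes the read pass should use the same file count and size as the write pass. The -clean option removes the benchmark files afterwards.

```python
# Jar path as used in this article (Hadoop 2.2.0); adjust for your installation.
TESTS_JAR = "share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar"


def dfsio_command(mode, nr_files=None, file_size_mb=None):
    """Hypothetical helper: build the argv list for one TestDFSIO pass."""
    if mode not in ("write", "read", "clean"):
        raise ValueError("mode must be 'write', 'read' or 'clean'")
    cmd = ["yarn", "jar", TESTS_JAR, "TestDFSIO", "-" + mode]
    if mode != "clean":
        if nr_files is None or file_size_mb is None:
            raise ValueError("write and read need a file count and a file size")
        cmd += ["-nrFiles", str(nr_files), "-fileSize", str(file_size_mb)]
    return cmd


# Write first, read the same files back, then clean up.
for step in (dfsio_command("write", 2, 10),
             dfsio_command("read", 2, 10),
             dfsio_command("clean")):
    print(" ".join(step))
```

The lists can be passed directly to subprocess.run(cmd, check=True) on a machine where yarn is on the PATH.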

2. MapReduce Test with Sort

Hadoop ships with a MapReduce program that exercises the whole MapReduce system. The benchmark has three steps:

# generate random data

# sort data

# validate results

The steps in detail:

1. generate random data

yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter random-data

RandomWriter generates the random data. Running it on YARN starts a MapReduce job that by default runs 10 map tasks per node, with each map producing 1 GB of random data.

The defaults can be changed with the properties test.randomwriter.maps_per_host and test.randomwrite.bytes_per_map.
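The defaults add up quickly, so it helps to estimate the data volume before starting. The sketch below does the arithmetic and builds a command line with overrides. Its names (total_random_bytes, randomwriter_command) are hypothetical, it assumes 1 GB means 1024^3 bytes, and it assumes the properties can be passed as -D generic options using the names given above; newer Hadoop releases may use different property names, so check your version.

```python
GB = 1024 ** 3  # assumption: 1 GB = 1024^3 bytes

# Jar path as used in this article (Hadoop 2.2.0); adjust for your installation.
EXAMPLES_JAR = "share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar"


def total_random_bytes(nodes, maps_per_host=10, bytes_per_map=GB):
    """Total data RandomWriter would produce: nodes x maps per node x bytes per map."""
    return nodes * maps_per_host * bytes_per_map


def randomwriter_command(out_dir, maps_per_host, bytes_per_map):
    """Hypothetical helper: randomwriter command with both defaults overridden."""
    return [
        "yarn", "jar", EXAMPLES_JAR, "randomwriter",
        "-D", "test.randomwriter.maps_per_host=%d" % maps_per_host,
        "-D", "test.randomwrite.bytes_per_map=%d" % bytes_per_map,
        out_dir,
    ]


# With the defaults, a 4-node cluster would write 40 GB.
print(total_random_bytes(4) // GB, "GB")

# A much smaller run: 2 maps per node, 64 MB per map.
print(" ".join(randomwriter_command("random-data", 2, 64 * 1024 * 1024)))
```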

2. sort data

yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar sort random-data sorted-data

3. validate results

yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar testmapredsort -sortInput random-data -sortOutput sorted-data

This command runs the SortValidator program, which performs a series of checks on the unsorted and sorted data to verify that the sort is accurate.
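The three steps can be chained in a small driver script. This is a sketch under a few assumptions: the function names are made up, the jar paths are the Hadoop 2.2.0 paths used in this article, and testmapredsort is assumed to be provided by the jobclient tests jar rather than the examples jar. By default it only prints the commands (dry run).

```python
import subprocess

# Jar paths as used in this article (Hadoop 2.2.0); adjust for your installation.
EXAMPLES_JAR = "share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar"
TESTS_JAR = "share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar"


def sort_benchmark_steps(random_dir="random-data", sorted_dir="sorted-data"):
    """Hypothetical helper: the generate, sort and validate commands in order."""
    return [
        ["yarn", "jar", EXAMPLES_JAR, "randomwriter", random_dir],
        ["yarn", "jar", EXAMPLES_JAR, "sort", random_dir, sorted_dir],
        ["yarn", "jar", TESTS_JAR, "testmapredsort",
         "-sortInput", random_dir, "-sortOutput", sorted_dir],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(cmd, check=True)


steps = sort_benchmark_steps()
run_steps(steps)  # dry run: prints the three commands without executing them
```

Stopping at the first failure matters here, since sorting or validating the output of a failed earlier step would give meaningless results.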

3. Other tests

MRBench (invoked with mrbench) launches a job and runs it multiple times.

NNBench (invoked with nnbench) load-tests the namenode.

Gridmix: not of interest here.
