很多内容之前的博客已经提过,这里不再赘述,详细内容参照本系列前面的博客:https://www.cnblogs.com/ratels/p/10970905.html

创建并修改配置文件conf/spark.conf

cp conf/spark.conf.template conf/spark.conf

参考:https://github.com/Intel-bigdata/HiBench/blob/master/docs/run-sparkbench.md,设置属性为下列值

   # Spark home
   hibench.spark.home      /opt/cloudera/parcels/CDH--.cdh5./lib/spark

   # Spark master
   #   standalone mode: spark://xxx:7077
   #   YARN mode: yarn-client
   hibench.spark.master    yarn-client

执行脚本

 bin/workloads/micro/wordcount/prepare/prepare.sh

返回信息

[root@node1 prepare]# ./prepare.sh
patching args=
Parsing conf: /home/cf/app/HiBench-master/conf/hadoop.conf
Parsing conf: /home/cf/app/HiBench-master/conf/hibench.conf
Parsing conf: /home/cf/app/HiBench-master/conf/spark.conf
Parsing conf: /home/cf/app/HiBench-master/conf/workloads/micro/wordcount.conf
probe -.cdh5./lib/hadoop/../../jars/hadoop-mapreduce-client-jobclient--cdh5.14.2-tests.jar
start HadoopPrepareWordcount bench
hdfs -.cdh5./bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -rm -r -skipTrash hdfs://node1:8020/HiBench/Wordcount/Input
Deleted hdfs://node1:8020/HiBench/Wordcount/Input
Submit MapReduce Job: /opt/cloudera/parcels/CDH--.cdh5./bin/hadoop --config /etc/hadoop/conf.cloudera.yarn jar /opt/cloudera/parcels/CDH--.cdh5./lib/hadoop/../../jars/hadoop-mapreduce-examples--cdh5. -D mapreduce.randomtextwriter.bytespermap= -D mapreduce.job.maps= -D mapreduce.job.reduces= hdfs://node1:8020/HiBench/Wordcount/Input
The job took  seconds.
finish HadoopPrepareWordcount bench

执行脚本

bin/workloads/micro/wordcount/spark/run.sh

返回信息

[root@node1 spark]# ./run.sh
patching args=
Parsing conf: /home/cf/app/HiBench-master/conf/hadoop.conf
Parsing conf: /home/cf/app/HiBench-master/conf/hibench.conf
Parsing conf: /home/cf/app/HiBench-master/conf/spark.conf
Parsing conf: /home/cf/app/HiBench-master/conf/workloads/micro/wordcount.conf
probe -.cdh5./lib/hadoop/../../jars/hadoop-mapreduce-client-jobclient--cdh5.14.2-tests.jar
start ScalaSparkWordcount bench
hdfs -.cdh5./bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -rm -r -skipTrash hdfs://node1:8020/HiBench/Wordcount/Output
Deleted hdfs://node1:8020/HiBench/Wordcount/Output
hdfs -.cdh5./bin/hadoop --config /etc/hadoop/conf.cloudera.yarn fs -du -s hdfs://node1:8020/HiBench/Wordcount/Input
Export env: SPARKBENCH_PROPERTIES_FILES=/home/cf/app/HiBench-master/report/wordcount/spark/conf/sparkbench/sparkbench.conf
Export env: HADOOP_CONF_DIR=/etc/hadoop/conf.cloudera.yarn
Submit Spark job: /opt/cloudera/parcels/CDH--.cdh5./lib/spark/bin/spark-submit  --properties- --executor-cores  --executor-memory 4g /home/cf/app/HiBench-master/sparkbench/assembly/target/sparkbench-assembly-7.1-SNAPSHOT-dist.jar hdfs://node1:8020/HiBench/Wordcount/Input hdfs://node1:8020/HiBench/Wordcount/Output
// :: INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
finish ScalaSparkWordcount bench

查看report/hibench.report

Type         Date       Time     Input_data_size      Duration(s)          Throughput(bytes/s)  Throughput/node
HadoopWordcount -- ::
ScalaSparkWordcount -- ::                                                   

\report\wordcount\spark下有多个文件:monitor.log是原始日志,bench.log是scheduler.DAGScheduler和scheduler.TaskSetManager信息,monitor.html可视化了系统的性能信息,\conf\wordcount.conf、\conf\sparkbench\spark.conf和\conf\sparkbench\sparkbench.conf是本次任务的环境变量

monitor.html中包含了Memory usage heatmap等统计图:

根据官方文档 https://github.com/Intel-bigdata/HiBench/blob/master/docs/run-sparkbench.md ,还可以修改 hibench.scale.profile 调整测试的数据规模,修改 hibench.default.map.parallelism 和 hibench.default.shuffle.parallelism 调整并行化,修改hibench.yarn.executor.num、hibench.yarn.executor.cores、spark.executor.memory和spark.driver.memory控制Spark executor 的数量、核数、内存和driver的内存。

HiBench成长笔记——(3) HiBench测试Spark的更多相关文章

  1. HiBench成长笔记——(1) HiBench概述

    测试分类 HiBench共计19个测试方向,可大致分为6个测试类别:分别是micro,ml(机器学习),sql,graph,websearch和streaming. 2.1 micro Benchma ...

  2. HiBench成长笔记——(4) HiBench测试Spark SQL

    很多内容之前的博客已经提过,这里不再赘述,详细内容参照本系列前面的博客:https://www.cnblogs.com/ratels/p/10970905.html 和 https://www.cnb ...

  3. HiBench成长笔记——(6) HiBench测试结果分析

    Scan Join Aggregation Scan Join Aggregation Scan Join Aggregation Scan Join Aggregation Scan Join Ag ...

  4. HiBench成长笔记——(5) HiBench-Spark-SQL-Scan源码分析

    run.sh #!/bin/bash # Licensed to the Apache Software Foundation (ASF) under one or more # contributo ...

  5. HiBench成长笔记——(2) CentOS部署安装HiBench

    安装Scala 使用spark-shell命令进入shell模式,查看spark版本和Scala版本: 下载Scala2.10.5 wget https://downloads.lightbend.c ...

  6. HiBench成长笔记——(7) 阅读《The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis》

    <The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis>内容精选 We th ...

  7. HiBench成长笔记——(10) 分析源码execute_with_log.py

    #!/usr/bin/env python2 # Licensed to the Apache Software Foundation (ASF) under one or more # contri ...

  8. HiBench成长笔记——(9) 分析源码monitor.py

    monitor.py 是主监控程序,将监控数据写入日志,并统计监控数据生成HTML统计展示页面: #!/usr/bin/env python2 # Licensed to the Apache Sof ...

  9. HiBench成长笔记——(8) 分析源码workload_functions.sh

    workload_functions.sh 是测试程序的入口,粘连了监控程序 monitor.py 和 主运行程序: #!/bin/bash # Licensed to the Apache Soft ...

随机推荐

  1. 解决PLSQL 查询后显示中文为问号(???)问题

    我的问题已解决,在装oracle的服务器上配置了下面的两个环境变量后,重启服务器,重新录入中文,在查询即可正确显示中文. 原因: 本机(装oracle的服务器)没有配置数据库字符集环境变量,或是与数据 ...

  2. flask exception

    flask exception 1.1.    abort 概念:flask中的异常处理语句,功能类似于python中raise语句,只要触发abort,后面的代码不会执行,abort只能抛出符合ht ...

  3. windows下创建/删除服务

    windows下创建/删除服务 1.      windows下创建/删除服务 1.1.    创建服务 命令格式: sc [servername] create Servicename [Optio ...

  4. 十三 Spring的通知类型,切入表达式写法

    Spring中通知类型: 前置通知:目标方法执行之前进行操作,可以获得切入点信息 后置通知: 目标方法执行之后进行操作,可以获得方法的返回值 环绕通知:在目标方法执行之前和之后进行操作,可以终止目标方 ...

  5. LeetCode 83. Remove Duplicates from Sorted List(从有序链表中删除重复节点)

    题意:从有序链表中删除重复节点. /** * Definition for singly-linked list. * struct ListNode { * int val; * ListNode ...

  6. Locale

    1. Locale 概述 2. Windows 区域设置 3 Linux Locale 3.1 Linux Locale 语言环境名称格式 3.2 常用区域描述(简写)日期习惯 3.3 日期显示格式 ...

  7. DRF项目之序列化器和视图重写方法的区别

    我们,都知道,DRF框架是一款高度封装的框架. 我们可以通过重写一些方法来实现自定义的功能. 今天,就来说说在视图中重写和序列化器中重写方法的区别. 在视图中重写方法: 接收请求,处理数据(业务逻辑) ...

  8. 输出简单图形(StringBuilder代替双重循环)

    在有些题目中打印简单图形必须使用StringBuilder或者StringBuffer,否则会运行超时(用String都会超时). 因为在题目的要求中说到输入的n是小于1000的,用双重循环就会超时, ...

  9. 使用 JvisualVM 监控 spark executor

    使用 JvisualVM,需要先配置 java 的启动参数 jmx 正常情况下,如下配置 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmx ...

  10. Go 开发者平均年薪 46 万?爬数据展示国内 Go 的市场行情到底如何

    随着云原生时代的到来,拥有高并发性.语法易学等特点的 Golang 地位逐渐凸显,在云原生编程中占据了主导地位.在近期出炉的 TIOBE 10 月编程语言排行榜中,Golang 从前一个月的 16 位 ...