Run Test Case on Spark
A friend asked me today how to unit test Spark, so here is how to run Spark's test cases with sbt's test commands:
1. Run all test cases
sbt/sbt test
2. Run a single test case
sbt/sbt "test-only *DriverSuite*"
Here is an example.
This test case lives at $SPARK_HOME/core/src/test/scala/org/apache/spark/DriverSuite.scala.
FunSuite is the test suite base class from ScalaTest; a suite extends it.
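For anyone who has not used ScalaTest before, a minimal FunSuite looks roughly like this (a sketch with made-up names, not part of Spark's code base):
import org.scalatest.FunSuite

class ArithmeticSuite extends FunSuite {
  test("addition works") {
    assert(1 + 1 === 2)  // each test(...) block is one test case, selectable via test-only
  }
}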
DriverSuite itself is a regression test: it checks that after a Spark program finishes normally, the driver process actually exits.
Note: I am only using this example to simulate a passing run and a failing run; the change made below has nothing to do with DriverSuite's real purpose and is purely for demonstration. :)
Below is the version that runs and exits normally:
package org.apache.spark

import java.io.File

import org.apache.log4j.Logger
import org.apache.log4j.Level

import org.scalatest.FunSuite
import org.scalatest.concurrent.Timeouts
import org.scalatest.prop.TableDrivenPropertyChecks._
import org.scalatest.time.SpanSugar._

import org.apache.spark.util.Utils

import scala.language.postfixOps

class DriverSuite extends FunSuite with Timeouts {
  test("driver should exit after finishing") {
    val sparkHome = sys.env.get("SPARK_HOME").orElse(sys.props.get("spark.home")).get
    // Regression test for SPARK-530: "Spark driver process doesn't exit after finishing"
    val masters = Table(("master"), ("local"), ("local-cluster[2,1,512]"))
    forAll(masters) { (master: String) =>
      failAfter(60 seconds) {
        Utils.executeAndGetOutput(
          Seq("./bin/spark-class", "org.apache.spark.DriverWithoutCleanup", master),
          new File(sparkHome),
          Map("SPARK_TESTING" -> "1", "SPARK_HOME" -> sparkHome))
      }
    }
  }
}

/**
 * Program that creates a Spark driver but doesn't call SparkContext.stop() or
 * Sys.exit() after finishing.
 */
object DriverWithoutCleanup {
  def main(args: Array[String]) {
    Logger.getRootLogger().setLevel(Level.WARN)
    val sc = new SparkContext(args(0), "DriverWithoutCleanup")
    sc.parallelize(1 to 100, 4).count()
  }
}
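DriverSuite runs the same check against several master URLs via ScalaTest's TableDrivenPropertyChecks. In isolation, the table-driven pattern looks roughly like this (a sketch with a trivial assertion, not Spark code):
import org.scalatest.FunSuite
import org.scalatest.prop.TableDrivenPropertyChecks._

class MastersSuite extends FunSuite {
  test("every master URL in the table is non-empty") {
    val masters = Table("master", "local", "local-cluster[2,1,512]")
    forAll(masters) { master =>
      assert(master.nonEmpty)  // the real suite launches a driver against each master here
    }
  }
}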
The executeAndGetOutput method takes a command; here it invokes spark-class to run the DriverWithoutCleanup class:
/**
 * Execute a command and get its output, throwing an exception if it yields a code other than 0.
 */
def executeAndGetOutput(command: Seq[String], workingDir: File = new File("."),
    extraEnvironment: Map[String, String] = Map.empty): String = {
  val builder = new ProcessBuilder(command: _*)
    .directory(workingDir)
  val environment = builder.environment()
  for ((key, value) <- extraEnvironment) {
    environment.put(key, value)
  }
  val process = builder.start()  // launch a child process to run the Spark job
  new Thread("read stderr for " + command(0)) {
    override def run() {
      for (line <- Source.fromInputStream(process.getErrorStream).getLines) {
        System.err.println(line)
      }
    }
  }.start()
  val output = new StringBuffer
  val stdoutThread = new Thread("read stdout for " + command(0)) {  // read the Spark job's output
    override def run() {
      for (line <- Source.fromInputStream(process.getInputStream).getLines) {
        output.append(line)
      }
    }
  }
  stdoutThread.start()
  val exitCode = process.waitFor()
  stdoutThread.join()  // Wait for it to finish reading output
  if (exitCode != 0) {
    throw new SparkException("Process " + command + " exited with code " + exitCode)
  }
  output.toString  // return the Spark job's output
}
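Test code that sits inside the org.apache.spark package can call the helper directly (Utils is package-private, as far as I can tell). A rough usage sketch with echo standing in for a real command and a made-up suite name:
package org.apache.spark

import java.io.File
import org.apache.spark.util.Utils
import org.scalatest.FunSuite

class EchoSuite extends FunSuite {
  test("executeAndGetOutput returns the child's stdout") {
    // Runs echo in the current directory and returns its stdout as one string;
    // a non-zero exit code would raise a SparkException instead.
    val out = Utils.executeAndGetOutput(Seq("echo", "hello from a child process"), new File("."))
    assert(out.contains("hello"))
  }
}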
Run the single-suite command to see the result:
sbt/sbt "test-only *DriverSuite*"
Output:
[info] Compiling 1 Scala source to /app/hadoop/spark-1.0.1/core/target/scala-2.10/test-classes...
[info] DriverSuite:  // the DriverSuite test suite is being run
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:20:15 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+. Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
14/08/14 18:20:15 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:20:15 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:20:19 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+. Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
14/08/14 18:20:19 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:20:19 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Spark assembly has been built with Hive, including Datanucleus jars on classpath
[info] - driver should exit after finishing
[info] ScalaTest
[info] Run completed in 12 seconds, 586 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 76 s, completed Aug 14, 2014 6:20:26 PM
The test passes: Total 1, Failed 0, Errors 0, Passed 1.
Now suppose we tweak the test case slightly and make the Spark job throw an exception; the test case will then fail. For example:
object DriverWithoutCleanup {
  def main(args: Array[String]) {
    Logger.getRootLogger().setLevel(Level.WARN)
    val sc = new SparkContext(args(0), "DriverWithoutCleanup")
    sc.parallelize(1 to 100, 4).count()
    throw new RuntimeException("OopsOutOfMemory, haha, not real OOM, don't worry!") // added line
  }
}
Run the test again and the failure shows up:
[info] DriverSuite:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:40:07 WARN spark.SparkConf:
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+. Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
14/08/14 18:40:07 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:40:07 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Exception in thread "main" java.lang.RuntimeException: OopsOutOfMemory, haha, not real OOM, don't worry!  // the exception we threw makes the Spark job fail; the stack trace is printed and the test case fails
at org.apache.spark.DriverWithoutCleanup$.main(DriverSuite.scala:60)
at org.apache.spark.DriverWithoutCleanup.main(DriverSuite.scala)
[info] - driver should exit after finishing *** FAILED ***
[info] SparkException was thrown during property evaluation. (DriverSuite.scala:40)
[info] Message: Process List(./bin/spark-class, org.apache.spark.DriverWithoutCleanup, local) exited with code 1
[info] Occurred at table row 0 (zero based, not counting headings), which had values (
[info] master = local
[info] )
[info] ScalaTest
[info] Run completed in 4 seconds, 765 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] org.apache.spark.DriverSuite
[error] (core/test:testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 14 s, completed Aug 14, 2014 6:40:10 PM
You can see TEST FAILED.
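If you actually wanted a test that expects this failure instead of letting it break the suite, ScalaTest's intercept could wrap the call. A hypothetical variant of the test body, not part of the real DriverSuite (sparkHome is resolved the same way as in the original test):
test("a driver job that throws should make spark-class exit non-zero") {
  intercept[SparkException] {
    Utils.executeAndGetOutput(
      Seq("./bin/spark-class", "org.apache.spark.DriverWithoutCleanup", "local"),
      new File(sparkHome),
      Map("SPARK_TESTING" -> "1", "SPARK_HOME" -> sparkHome))
  }
}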
3. Summary
If you want to become a Spark contributor, this is a step you have to get past.
——EOF——
Original article; please credit the source when reposting: http://blog.csdn.net/oopsoom/article/details/38555173