Basic Information

  • Authors: Jooyong Yi, Shin Hwei Tan, Sergey Mechtaev, Marcel Böhme, Abhik Roychoudhury
  • Publication: EMSE'17
  • Conclusion: In general, with the increase of traditional test suite metrics, the reliability of repairs tend to increase. In particular, such a trend is most strongly observed in statement coverage. Their results imply that the traditional test suite metrics proposed for software testing can also be used for automated program repair to improve the reliability of repairs.

Interesting Points

Correlation between Mutation Testing and Automated Program Repair:

To some extent, automated program repair and mutation testing are very similar. It can be viewed that automated program repair “mutates" the original program, this time in an attempt to find a repair. As in mutation testing, mutants that fail to pass all tests in the provided test-suite are considered buggy (hence, incorrect repairs). This conceptual similarity between mutation testing and automated program repair suggests the plausibility of using the mutation score to measure the quality of a test-suite not only for mutation testing but also for automated program repair. Just as a higher mutation score is associated with a better fault-detection ability in mutation testing, it appears plausible to associate a higher mutation score with a better ability to guide a reliable repair.

There is not only similarity but also duality between mutation testing and automated program repair. As pointed out by Weimer et al (2013), “our confidence in mutant testing increases with the set of non-redundant mutants considered, but our confidence in the quality of a program repair gains increases with the set of non-redundant tests." Note that mutation score measures the non-redundancy of killed mutants, not the non-redundancy of tests capable of killing mutants. We introduce a new metric called capable-tests ratio in the next section that measures the non-redundancy of capable tests.

Measure quality of Test-suite quality and Repair

This paper mainly explore the correlation between quality of automated program repair (APR) and test-suite.

To evaluate quality of APR, traditional metrics (i.e., 1) statement coverage, 2) branch coverage, 3) test-suite size, 4) mutation score) and capable-tests ratio are used.

RQs and Results

RQ1: : Is there a negative correlation between the metrics of a testsuite and the regression ratio of automatically generated repairs? In other words, are generated repairs less likely to cause regressions, as test-suite metrics increase?

As the traditional test-suite metrics (statement coverage, branch coverage, test-suite size, and mutation score) increase, the regression rate of automatically generated repairs generally decreases, showing the promise of using the traditional test-suite metrics to control the regression ratio of automatically generated repairs. Capable-tests ratio does not seem as useful as the traditional metrics in controlling the quality of generated repairs.

RQ2: Which test-suite metric is most strongly correlated with the regression ratio of automatically generated repairs?

In our experiments, statement coverage is, on average, more strongly correlated with regression ratio than other metrics we investigate. Our results suggest that to reduce the regression ratio, increasing statement coverage is more promising than improving the other test-suite metrics.

RQ3: Is there a negative correlation between the metrics of a test-suite and the repairability of automated program repair? In other words, should repairability be sacrificed in an attempt to obtain a higher-quality repair via a higher-quality testsuite?

Our experimental results are inconclusive about the correlation between test-suites and repairability. However, we note that increasing test-suite metric does not always decrease repairability. Im some subjects, positive correlations were observed between test-suite metrics and repairability, indicating that as the test-suite metrics increase, repairability tends to increase.

RQ4: Is there a negative correlation between the metrics of a test-suite and repair time? In other words, should more time be spent in an attempt to obtain a higher-quality repair via a higher-quality test-suite?

Our experimental results are inconlusive about the correlation between test-suites and repair time. However, we note that increasing test-suite metric does not always increase repair time. In some subjects, negative correlations were observed between test-suite metrics and repair time, indicating that as the test-suite metrics increase, repair time tends to decrease.

Different Repair Algorithm: SEMFIX

Our experimental results from SEMFIX generally coincide with our finding from the GENPROG experiment, despite the differences in repair algorithms and fault localization techniques. The traditional test-suite metrics are, overall, negatively correlated with regression ratio, similar to our GENPROG experimental results. In particular, **statement coverage** is again shown to be most strongly correlated with regression ratio.

[EMSE'17] A Correlation Study between Automated Program Repair and Test-Suite Metrics的更多相关文章

  1. Reading List on Automated Program Repair

    Some resources: https://www.monperrus.net/martin/automatic-software-repair 2017 [ ] DeepFix: Fixing ...

  2. [Benchmark] Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools

    Basic Information Publication: ICSE'17 Authors: Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, A ...

  3. One example to understand SemFix: Program Repair via Semantic Analysis

    One example to understand SemFix: Program Repair via Semantic Analysis Basic Information Authors: Ho ...

  4. paho_c_pub 使用方法

    Latest Paho Status (2) 摘自:http://modelbasedtesting.co.uk/ I last wrote about the state of Paho in Oc ...

  5. A Great List of Windows Tools

    Windows is an extremely effective and a an efficient operating system. Like any other operating syst ...

  6. docker入门级详解

    Docker 1 docker安装 yum install docker [root@topcheer ~]# systemctl start docker [root@topcheer ~]# mk ...

  7. C#Lambda表达式演变和Linq的深度解析

    Lambda 一.Lambda的演变 Lambda的演变,从下面的类中可以看出,.Net Framwork1.0时还是用方法实例化委托的,2.0的时候出现了匿名方法,3.0的时候出现了Lambda. ...

  8. hadoop 2.7.3本地环境运行官方wordcount

    hadoop 2.7.3本地环境运行官方wordcount 基本环境: 系统:win7 虚机环境:virtualBox 虚机:centos 7 hadoop版本:2.7.3 本次先以独立模式(本地模式 ...

  9. Manual——Test (翻译1)

    LTE Manual ——Logging(翻译) (本文为个人学习笔记,如有不当的地方,欢迎指正!) 1.17.3 Testing framework(测试框架)   ns-3 包含一个仿真核心引擎. ...

随机推荐

  1. ES6语法(一)let 和 const 命令

    1.let命令 与 var 的区别 用法类似于var,但是所声明的变量,只在 let命令所在的代码块内有效. var 声明的变量可以在声明之前使用,值为 undefined ;let命令改变了语法行为 ...

  2. 【数位dp】Beautiful Numbers @2018acm上海大都会赛J

    目录 Beautiful Numbers PROBLEM 题目描述 输入描述: 输出描述: 输入 输出 MEANING SOLUTION CODE Beautiful Numbers PROBLEM ...

  3. python之流程控制与运算符

    第一:流程控制 一:if条件语句 计算机之所以能做很多自动化的任务,因为它可以自己做条件判断. 单分支语句: 单分支,单个条件 age = 20 if age >= 18: print('you ...

  4. JavaScript字符串API

    String.prototype.anchor() anchor()方法用于创建一个<a>html描元素 const str = '我是html内容'.anchor('我是name属性值' ...

  5. SAP S/4 1610 IDES + HANA 2.0 安装

    前几天安装的都没带演示数据 ,这个版本带DEMO数据,学习比较好 我的机器配置: 内存:128G CPU:E5-2618L V4 硬盘:1T SSD 安装在VMware虚拟机中,安装完后,虚拟机大小只 ...

  6. CMD批处理循环,太强大了(转)

    终极dos批处理循环命令详解格式:FOR [参数] %%变量名 IN (相关文件或命令)   DO 执行的命令 作用:对一个或一组文件,字符串或命令结果中的每一个对象执行特定命令,达到我们想要的结果. ...

  7. python 2 字典的基本使用

    dicVisit={} if not dicVisit.has_key(visitKey):#不存在 diaSet = set() diaSet.add(diagnoseContent) dicVis ...

  8. [Linux] - 利用ping给端口加密,限制访问

    Linux中,想对特定的端口加密访问,可以使用iptables的ping方式. 作用 访问被限制的端口,必需先ping发送对应的字节包(字节包大小可自行设置,此为密钥)才能访问成功! 下边是对SSH的 ...

  9. ChartControl控件0和null的效果

    DevExpress的ChartControl虽然还不能完全代替Office图表(例如它暂时不支持添加数据表),但它算同类产品中相当优秀的了,下面是对0值和空值的处理. DataTable zeroD ...

  10. ionic BUILD FAILED

    BUILD FAILED Total time: 24.572 secs FAILURE: Build failed with an exception. What went wrong: Execu ...