[EMSE'17] A Correlation Study between Automated Program Repair and Test-Suite Metrics
Basic Information
- Authors: Jooyong Yi, Shin Hwei Tan, Sergey Mechtaev, Marcel Böhme, Abhik Roychoudhury
- Publication: EMSE'17
- Conclusion: In general, with the increase of traditional test suite metrics, the reliability of repairs tend to increase. In particular, such a trend is most strongly observed in statement coverage. Their results imply that the traditional test suite metrics proposed for software testing can also be used for automated program repair to improve the reliability of repairs.
Interesting Points
Correlation between Mutation Testing and Automated Program Repair:
To some extent, automated program repair and mutation testing are very similar. It can be viewed that automated program repair “mutates" the original program, this time in an attempt to find a repair. As in mutation testing, mutants that fail to pass all tests in the provided test-suite are considered buggy (hence, incorrect repairs). This conceptual similarity between mutation testing and automated program repair suggests the plausibility of using the mutation score to measure the quality of a test-suite not only for mutation testing but also for automated program repair. Just as a higher mutation score is associated with a better fault-detection ability in mutation testing, it appears plausible to associate a higher mutation score with a better ability to guide a reliable repair.
There is not only similarity but also duality between mutation testing and automated program repair. As pointed out by Weimer et al (2013), “our confidence in mutant testing increases with the set of non-redundant mutants considered, but our confidence in the quality of a program repair gains increases with the set of non-redundant tests." Note that mutation score measures the non-redundancy of killed mutants, not the non-redundancy of tests capable of killing mutants. We introduce a new metric called capable-tests ratio in the next section that measures the non-redundancy of capable tests.
Measure quality of Test-suite quality and Repair
This paper mainly explore the correlation between quality of automated program repair (APR) and test-suite.
To evaluate quality of APR, traditional metrics (i.e., 1) statement coverage, 2) branch coverage, 3) test-suite size, 4) mutation score) and capable-tests ratio are used.
RQs and Results
RQ1: : Is there a negative correlation between the metrics of a testsuite and the regression ratio of automatically generated repairs? In other words, are generated repairs less likely to cause regressions, as test-suite metrics increase?
As the traditional test-suite metrics (statement coverage, branch coverage, test-suite size, and mutation score) increase, the regression rate of automatically generated repairs generally decreases, showing the promise of using the traditional test-suite metrics to control the regression ratio of automatically generated repairs. Capable-tests ratio does not seem as useful as the traditional metrics in controlling the quality of generated repairs.
RQ2: Which test-suite metric is most strongly correlated with the regression ratio of automatically generated repairs?
In our experiments, statement coverage is, on average, more strongly correlated with regression ratio than other metrics we investigate. Our results suggest that to reduce the regression ratio, increasing statement coverage is more promising than improving the other test-suite metrics.
RQ3: Is there a negative correlation between the metrics of a test-suite and the repairability of automated program repair? In other words, should repairability be sacrificed in an attempt to obtain a higher-quality repair via a higher-quality testsuite?
Our experimental results are inconclusive about the correlation between test-suites and repairability. However, we note that increasing test-suite metric does not always decrease repairability. Im some subjects, positive correlations were observed between test-suite metrics and repairability, indicating that as the test-suite metrics increase, repairability tends to increase.
RQ4: Is there a negative correlation between the metrics of a test-suite and repair time? In other words, should more time be spent in an attempt to obtain a higher-quality repair via a higher-quality test-suite?
Our experimental results are inconlusive about the correlation between test-suites and repair time. However, we note that increasing test-suite metric does not always increase repair time. In some subjects, negative correlations were observed between test-suite metrics and repair time, indicating that as the test-suite metrics increase, repair time tends to decrease.
Different Repair Algorithm: SEMFIX
Our experimental results from SEMFIX generally coincide with our finding from the GENPROG experiment, despite the differences in repair algorithms and fault localization techniques. The traditional test-suite metrics are, overall, negatively correlated with regression ratio, similar to our GENPROG experimental results. In particular, **statement coverage** is again shown to be most strongly correlated with regression ratio.
[EMSE'17] A Correlation Study between Automated Program Repair and Test-Suite Metrics的更多相关文章
- Reading List on Automated Program Repair
Some resources: https://www.monperrus.net/martin/automatic-software-repair 2017 [ ] DeepFix: Fixing ...
- [Benchmark] Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools
Basic Information Publication: ICSE'17 Authors: Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, A ...
- One example to understand SemFix: Program Repair via Semantic Analysis
One example to understand SemFix: Program Repair via Semantic Analysis Basic Information Authors: Ho ...
- paho_c_pub 使用方法
Latest Paho Status (2) 摘自:http://modelbasedtesting.co.uk/ I last wrote about the state of Paho in Oc ...
- A Great List of Windows Tools
Windows is an extremely effective and a an efficient operating system. Like any other operating syst ...
- docker入门级详解
Docker 1 docker安装 yum install docker [root@topcheer ~]# systemctl start docker [root@topcheer ~]# mk ...
- C#Lambda表达式演变和Linq的深度解析
Lambda 一.Lambda的演变 Lambda的演变,从下面的类中可以看出,.Net Framwork1.0时还是用方法实例化委托的,2.0的时候出现了匿名方法,3.0的时候出现了Lambda. ...
- hadoop 2.7.3本地环境运行官方wordcount
hadoop 2.7.3本地环境运行官方wordcount 基本环境: 系统:win7 虚机环境:virtualBox 虚机:centos 7 hadoop版本:2.7.3 本次先以独立模式(本地模式 ...
- Manual——Test (翻译1)
LTE Manual ——Logging(翻译) (本文为个人学习笔记,如有不当的地方,欢迎指正!) 1.17.3 Testing framework(测试框架) ns-3 包含一个仿真核心引擎. ...
随机推荐
- list-列表练习
#list列表取值更方便灵活 列表.数组说的都是1个东西#列表中每个字符都有一个编号,就是我们说的下标,从0开始#如果你输入的下标在列表中不存在,会报下标越界的错误 1.查询user表中下标为0的记录 ...
- Linux安装gcc时碰到的有关问题解决(解决gcc依赖有关问题)
Linux安装gcc时碰到的有关问题解决(解决gcc依赖有关问题) rpm安装gcc时碰到的有关问题解决(解决gcc依赖有关问题) 提示:error: Failed dependencies: clo ...
- lettcode21. Merge Two Sorted Lists
lettcode21. Merge Two Sorted Lists Merge two sorted linked lists and return it as a new list. The ne ...
- Hibernate(9)_双向n对n
1.概述 ①双向 n-n 关联需要两端都使用集合属性 ②双向n-n关联必须使用连接表 ③集合属性应增加 key 子元素用以映射外键列, 集合元素里还应增加many-to-many子元素关联实体类 ④在 ...
- 7、js对象
在python中我们学习了面向对象,javascript也是一门面向对象语言,在JavaScript中除了null和undefined以外其他的数据类型都被定义成了对象. 本篇导航: String对象 ...
- D. Cutting Out 二分
题意是给你n个数字的序列,让你从中找含k个数字的序列,要求这k个数字要尽可能多次的从n个数字的序列中减去. 解法就是从1到n,二分查找可以删除的最大次数. http://codeforces.com/ ...
- insert /*+append*/为什么会提高性能
在上一篇的blog中 做了下使用,在归档和非归档下,做数据插入http://blog.csdn.net/guogang83/article/details/9219479.结论是在非归档模式下表设置为 ...
- Java全栈程序员之09:IDEA+GitHub
GitHub是源码托管站点,其依赖于Git这个源码管理工具来进行代码的托管.所以将我们的代码托管到GitHub之前,我们需要安装Git. 1.Git安装 可以通过输入git命令来确定是否在本机已经安装 ...
- c++中transform()函数和find()函数的使用方法。
1.transform函数的使用 transform在指定的范围内应用于给定的操作,并将结果存储在指定的另一个范围内.transform函数包含在<algorithm>头文件中. 以下是s ...
- SOFABolt 源码分析
SOFABolt 是一个轻量级.高性能.易用的远程通信框架,基于netty4.1,由蚂蚁金服开源. 本系列博客会分析 SOFABolt 的使用姿势,设计方案及详细的源码解析.后续还会分析 SOFABo ...