[Benchmark] Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools
Basic Information
- Publication: ICSE'17
- Authors: Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, Abhik Roychoudhury
- Language: C
- Source: Codeforces programming contests (rejected/accepted submission pairs)
- Description: a set of 3902 defects from 7436 programs, automatically classified across 39 defect classes
- Dataset Homepage
Summary
Existing benchmarks for automated program repair, such as ManyBugs and IntroClass, do not allow a thorough investigation of the relationship between fault types and the effectiveness of repair tools.
Five criteria for a benchmark that allows extensive evaluation of repair tools:
- C1: Diverse types of real defects.
- C2: Large number of defects.
- C3: Large number of programs.
- C4: Programs that are algorithmically complex.
- C5: Large held-out test suite for patch correctness verification.
Overall, the authors crawled over 10000 web pages from Codeforces programming contests. For each rejected submission r, they find an accepted submission a by the same user for the same programming problem in the crawled data. Each fault is represented by the submission pair (r, a). In total, they obtain 5544 defects. They then exclude 924 defects due to inadequate held-out tests, 677 defects due to non-reproducible bugs, and 41 defects due to a known CIL bug in handling variable-sized multidimensional arrays, leaving 3902 defects.
All defects are divided into 39 classes based on the AST-level syntactic differences between the buggy program and the patched program, computed with GumTree.
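The pairing and filtering steps can be illustrated with a minimal sketch. This is not the authors' crawler: the Submission record, its field names, the user names, and the tie-break (keep the first accepted submission seen) are all assumptions made for illustration. Only the two submission ids and the defect counts come from the text.

```python
from typing import NamedTuple


class Submission(NamedTuple):
    # Hypothetical record layout; the real crawled data may differ.
    submission_id: int
    user: str
    contest_id: int
    problem: str
    accepted: bool


def pair_submissions(subs):
    """Pair each rejected submission r with an accepted submission a
    by the same user for the same problem, giving (r, a) pairs."""
    accepted = {}
    for s in subs:
        if s.accepted:
            # Assumption: keep the first accepted submission per key.
            accepted.setdefault((s.user, s.contest_id, s.problem), s)

    pairs = []
    for s in subs:
        if not s.accepted:
            a = accepted.get((s.user, s.contest_id, s.problem))
            if a is not None:  # rejected submissions without a fix are dropped
                pairs.append((s, a))
    return pairs


subs = [
    Submission(18353198, "alice", 1, "A", False),  # rejected
    Submission(18353306, "alice", 1, "A", True),   # accepted, same user/problem
    Submission(18353400, "bob", 1, "A", False),    # rejected, never fixed
]
pairs = pair_submissions(subs)

# Defect counts reported in the summary: initial pairs minus the three exclusions.
remaining = 5544 - 924 - 677 - 41
print(len(pairs), remaining)
```

The subtraction reproduces the 3902 defects quoted in the description.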
Structure
codeflaws
|> 1-A-bug-18353198-18353306 (<contestid>-<problem>-bug-<buggy-submissionid>-<accepted-submissionid>)
|===> 1-A-18353198.c (<contestid>-<problem>-<buggy-submissionid>.c)
|===> 1-A-18353306.c (<contestid>-<problem>-<accepted-submissionid>.c)
|===> input-neg1 (Test input files: input[0-9]+ file used by Test suite (i))
|===> output-neg1 (Test output files: output[0-9]+ file used by Test suite (i))
|===> heldout-input-pos1 (heldout-input[0-9]+ file used by Test suite (ii))
|===> heldout-output-pos1 (heldout-output[0-9]+ file used by Test suite (ii))
|===> 1-A-18353198.c.revlog (Test configuration for SPR that specifies the names of the passing/failing tests: <contestid>-<problem>-<buggy-submissionid>.c.revlog)
|===> test-genprog.sh (Repair test script, i.e. the test suite given to repair tools for generating a repair; used by the search-based repair tools GenProg, SPR, and Prophet)
|===> test-angelix.sh (Repair test script, i.e. the test suite given to repair tools for generating a repair; used by Angelix, which requires inserting special instrumentation)
|===> test-valid.sh (Test script for patch validation; runs the held-out test suite to check the correctness of patches)
|===> Makefile (Makefile for compiling the buggy submission. This contains the CFLAGS options recommended by Codeforces. To compile the accepted submission, use the command make FILENAME=10-A-13543524)
|===> Makefile.genprog (Makefile.genprog for compiling the buggy submission using cilly. This is for GenProg experiments as GenProg works on CIL representation.)
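Because every defect directory follows the <contestid>-<problem>-bug-<buggy-submissionid>-<accepted-submissionid> pattern, the two source file names can be derived from the directory name alone. The sketch below is only an illustration: DefectName and parse_defect_dir are hypothetical helpers, not part of the benchmark, and the regular expression assumes that the problem label starts with a letter.

```python
import re
from typing import NamedTuple, Optional


class DefectName(NamedTuple):
    contest_id: str
    problem: str
    buggy_id: str
    accepted_id: str

    @property
    def buggy_source(self) -> str:
        # <contestid>-<problem>-<buggy-submissionid>.c
        return f"{self.contest_id}-{self.problem}-{self.buggy_id}.c"

    @property
    def accepted_source(self) -> str:
        # <contestid>-<problem>-<accepted-submissionid>.c
        return f"{self.contest_id}-{self.problem}-{self.accepted_id}.c"


# Assumption: contest and submission ids are digits and the problem
# label is a letter optionally followed by word characters.
_DEFECT_DIR = re.compile(r"(\d+)-([A-Za-z]\w*)-bug-(\d+)-(\d+)")


def parse_defect_dir(name: str) -> Optional[DefectName]:
    """Split a defect directory name into its components, or return None."""
    m = _DEFECT_DIR.fullmatch(name)
    return DefectName(*m.groups()) if m else None


d = parse_defect_dir("1-A-bug-18353198-18353306")
print(d.buggy_source, d.accepted_source)
```

For the directory shown in the listing, this yields the file names 1-A-18353198.c and 1-A-18353306.c.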