[Benchmark] Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools
Basic Information
- Publication: ICSE'17
- Authors: Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, Abhik Roychoudhury
- Language: C Program
- Source: Codeforces Programming Contest (Reject/Accept)
- Description: a set of 3902 defects from 7436 programs automatically classified across 39 defect classes
- Dataset Homepage
Summary
Existing benchmarks (like ManyBugs and IntroClass) on automated program repairs do not allow thorough investigation of the relationship between fault types and the effectiveness of repair tools.
Four criterias for a benchmark that allows extensive evaluation of repair tools:
- C1: Diverse types of real defects.
- C2: Large number of defects.
- C3: Large number of programs.
- C4: Programs that are algorithmically complex
- C5: Large held-out test suite for patch correctness verification
Overall, author crawled over 10000 webpages from Codeforces programming contest. For each rejected submission r, they find another accepted submission a by the same user for the same programming problem in the crawled data. Each fault is represented by the submission pair (r, a). In total, they obtain 5544 defects. Then they further exclude 924 defects due to inadequate held-out tests, 677 defects due to non-reproducible bugs, and 41 defects due to a known CIL bugs2 in handling variable sized multidimensional array.
All defects are divided into 39 classes by using Gumtree on AST-level syntactic differences between buggy program and patched program.
Structure
codeflaws
|> 1-A-bug-18353198-18353306 (<contestid>-<problem>-bug-<buggy-submisionid>-<accepted-submissionid>)
|===> 1-A-18353198.c (<contestid>-<problem>-<buggy-submisionid>.c)
|===> 1-A-18353306.c (<contestid>-<problem>-<accepted-submissionid>.c)
|===> input-neg1 (Test input files: input[0-9]+ file used by Test suite (i))
|===> output-neg1 (Test output files: output[0-9]+ file used by Test suite (i))
|===> heldout-input-pos1 (heldout-input[0-9]+ file used by Test suite (ii))
|===> heldout-output-pos1 (heldout-output[0-9]+ file used by Test suite (ii))
|===> 1-A-18353198.c.revlog(Test configuration for SPR that specify the name for pass/fail test: --.c.revlog)
|===> test-genprog.sh (Repair Test script (test suite given to repair tools for generating repair), test-genprog.sh is for search-based repair tools (GenProg, SPR, Prophet))
|===> test-angelix.sh (Repair Test script (test suite given to repair tools for generating repair), test-angelix.sh is for Angelix as it requires inserting special instrumentation)
|===> test-valid.sh(Test script for patch validation (held-out test suite): test-valid.sh is for validating the correctness of patches)
|===> Makefile (Makefile for compiling the buggy submission. This contains the CFLAGS options recommended by Codeforces. To compile the accepted submission, use the command make FILENAME=10-A-13543524)
|===> Makefile.genprog (Makefile.genprog for compiling the buggy submission using cilly. This is for GenProg experiments as GenProg works on CIL representation.)
[Benchmark] Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools的更多相关文章
- Benchmark result without MONITOR running: Benchmark result with MONITOR running (redis-cli monitor > /dev/null): 吞吐量 下降约1半 Redis监控工具,命令和调优
https://redis.io/commands/monitor In this particular case, running a single MONITOR client can reduc ...
- 2050 Programming Competition
http://2050.acmclub.cn/contests/contest_show.php?cid=3 开场白 Time Limit: 2000/1000 MS (Java/Others) ...
- 2050 Programming Competition (CCPC)
Pro&Sol 链接: https://pan.baidu.com/s/17Tt3EPKEQivP2-3OHkYD2A 提取码: wbnu 复制这段内容后打开百度网盘手机App,操作更方便哦 ...
- 2019 China Collegiate Programming Contest Qinhuangdao Onsite F. Forest Program(DFS计算图中所有环的长度)
题目链接:https://codeforces.com/gym/102361/problem/F 题意 有 \(n\) 个点和 \(m\) 条边,每条边属于 \(0\) 或 \(1\) 个环,问去掉一 ...
- Reading List on Automated Program Repair
Some resources: https://www.monperrus.net/martin/automatic-software-repair 2017 [ ] DeepFix: Fixing ...
- Azure Redis Cache (3) 在Windows 环境下使用Redis Benchmark
<Windows Azure Platform 系列文章目录> 熟悉Redis环境的读者都知道,我们可以在Linux环境里,使用Redis Benchmark,测试Redis的性能. ht ...
- MYSQL BENCHMARK函数的使用
MYSQL BENCHMARK函数是最重要的函数之一,下文对该函数的使用进行了详尽的分析,如果您对此感兴趣的话,不妨一看. 下文为您介绍的是MYSQL BENCHMARK函数的语法,及一些MYSQL ...
- Benchmark与Profiler---性能调优得力助手
转载请注明出处:http://blog.csdn.net/gaoyanjie55/article/details/34981077 性能优化.它是一种诊断性能瓶颈,能问题点进行优化的过程.前两天听完s ...
- c++性能测试工具:google benchmark入门(一)
如果你正在寻找一款c++性能测试工具,那么这篇文章是不容错过的. 市面上的benchmark工具或多或少存在一些使用上的不便,那么是否存在一个使用简便又功能强大的性能测试工具呢?答案是google/b ...
随机推荐
- list-循环小练习(作业已交未交)
报错 list index out of range : 超出下标 这个错误是因为在写stus列表的时候写成了如下stus=['小花,未交'] ,但是取下标的时候取的是stus[1]:实际该列表中 ...
- Idea-每次修改JS文件都需要重启Idea才能生效解决方法
最近开始使用Idea,有些地方的确比eclipse方便.但是我发现工程每次修改JS或者是JSP页面后,并没有生效,每次修改都需要重启一次Tomcat这样的确不方便.我想Idea肯定有设置的方法,不可能 ...
- 关于实现udev/mdev自动挂载与卸载
在网上有很多关于讲mdev的自动挂载基本上都是一个版本,经过测试自动挂载确实可行,但是关于自动卸载mdev似乎不能很好的支持,经过修改已经可以做到与udev的效果相似.不能在挂载的目录中进行热插拔,否 ...
- JAVA的基本数据类型和类型转换
一.数据类型 java是一种强类型语言,第一次申明变量必须说明数据类型,第一次变量赋值称为变量的初始化. java数据类型分为基本数据类型和引用数据类型 基本数据类型有4类8种 第一类(有4种)整型: ...
- python测试开发django-50.jquery发送ajax请求(get)
前言 有时候,我们希望点击页面上的某个按钮后,不刷新整个页面,给后台发送一个请求过去,请求到数据后填充到html上,这意味着可以在不重新加载整个网页的情况下,对网页的某部分进行更新.Ajax可以完美的 ...
- 打开KVM Console的一些注意事项
今天早上跟思科CIMC里的KVM console较劲了很久,终于成功的打开了KVM console. 总结了下面的一些注意事项.如果你也遇到了KVM console打不开,那么可以尝试一下. 我不想花 ...
- java-Freemarker TemplateLoader实现模版
TemplateLoader的实现 作为一个模板文件加载的抽象,自然不能限制模板来自何方,在FreeMarker中由几个主要的实现类来体现,这些TemplateLoader是可以独立使用的,Webap ...
- grid - 网格项目对齐方式(Box Alignment)
CSS的Box Alignment Module补充了网格项目沿着网格行或列轴对齐方式. <view class="grid"> <view class='ite ...
- jquery核心
1.找到所有 p 元素,并且这些元素都必须是 div 元素的子元素 $("div > p"); 2.设置页面背景色 $(document.body).css("ba ...
- phpBB3导入用户的Python脚本
关联的数据表 在phpBB3中导入用户时, 需要处理的有两张表, 一个是 users, 一个是 user_group. 如果是新安装的论坛, 在每次导入之前, 用以下语句初始化: DELETE FRO ...