Basic Information

  • Publication: ICSE'17
  • Authors: Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, Abhik Roychoudhury
  • Language: C Program
  • Source: Codeforces Programming Contest (Reject/Accept)
  • Description: a set of 3902 defects from 7436 programs automatically classified across 39 defect classes
  • Dataset Homepage

Summary

Existing benchmarks (like ManyBugs and IntroClass) on automated program repairs do not allow thorough investigation of the relationship between fault types and the effectiveness of repair tools.
Four criterias for a benchmark that allows extensive evaluation of repair tools:

  • C1: Diverse types of real defects.
  • C2: Large number of defects.
  • C3: Large number of programs.
  • C4: Programs that are algorithmically complex
  • C5: Large held-out test suite for patch correctness verification

Overall, author crawled over 10000 webpages from Codeforces programming contest. For each rejected submission r, they find another accepted submission a by the same user for the same programming problem in the crawled data. Each fault is represented by the submission pair (r, a). In total, they obtain 5544 defects. Then they further exclude 924 defects due to inadequate held-out tests, 677 defects due to non-reproducible bugs, and 41 defects due to a known CIL bugs2 in handling variable sized multidimensional array.

All defects are divided into 39 classes by using Gumtree on AST-level syntactic differences between buggy program and patched program.

Structure

codeflaws
|> 1-A-bug-18353198-18353306 (<contestid>-<problem>-bug-<buggy-submisionid>-<accepted-submissionid>)
|===> 1-A-18353198.c (<contestid>-<problem>-<buggy-submisionid>.c)
|===> 1-A-18353306.c (<contestid>-<problem>-<accepted-submissionid>.c)
|===> input-neg1 (Test input files: input[0-9]+ file used by Test suite (i))
|===> output-neg1 (Test output files: output[0-9]+ file used by Test suite (i))
|===> heldout-input-pos1 (heldout-input[0-9]+ file used by Test suite (ii))
|===> heldout-output-pos1 (heldout-output[0-9]+ file used by Test suite (ii))
|===> 1-A-18353198.c.revlog(Test configuration for SPR that specify the name for pass/fail test: --.c.revlog)
|===> test-genprog.sh (Repair Test script (test suite given to repair tools for generating repair), test-genprog.sh is for search-based repair tools (GenProg, SPR, Prophet))
|===> test-angelix.sh (Repair Test script (test suite given to repair tools for generating repair), test-angelix.sh is for Angelix as it requires inserting special instrumentation)
|===> test-valid.sh(Test script for patch validation (held-out test suite): test-valid.sh is for validating the correctness of patches)
|===> Makefile (Makefile for compiling the buggy submission. This contains the CFLAGS options recommended by Codeforces. To compile the accepted submission, use the command make FILENAME=10-A-13543524)
|===> Makefile.genprog (Makefile.genprog for compiling the buggy submission using cilly. This is for GenProg experiments as GenProg works on CIL representation.)

[Benchmark] Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools的更多相关文章

  1. Benchmark result without MONITOR running: Benchmark result with MONITOR running (redis-cli monitor > /dev/null): 吞吐量 下降约1半 Redis监控工具,命令和调优

    https://redis.io/commands/monitor In this particular case, running a single MONITOR client can reduc ...

  2. 2050 Programming Competition

    http://2050.acmclub.cn/contests/contest_show.php?cid=3 开场白 Time Limit: 2000/1000 MS (Java/Others)    ...

  3. 2050 Programming Competition (CCPC)

    Pro&Sol 链接: https://pan.baidu.com/s/17Tt3EPKEQivP2-3OHkYD2A 提取码: wbnu 复制这段内容后打开百度网盘手机App,操作更方便哦 ...

  4. 2019 China Collegiate Programming Contest Qinhuangdao Onsite F. Forest Program(DFS计算图中所有环的长度)

    题目链接:https://codeforces.com/gym/102361/problem/F 题意 有 \(n\) 个点和 \(m\) 条边,每条边属于 \(0\) 或 \(1\) 个环,问去掉一 ...

  5. Reading List on Automated Program Repair

    Some resources: https://www.monperrus.net/martin/automatic-software-repair 2017 [ ] DeepFix: Fixing ...

  6. Azure Redis Cache (3) 在Windows 环境下使用Redis Benchmark

    <Windows Azure Platform 系列文章目录> 熟悉Redis环境的读者都知道,我们可以在Linux环境里,使用Redis Benchmark,测试Redis的性能. ht ...

  7. MYSQL BENCHMARK函数的使用

    MYSQL BENCHMARK函数是最重要的函数之一,下文对该函数的使用进行了详尽的分析,如果您对此感兴趣的话,不妨一看. 下文为您介绍的是MYSQL BENCHMARK函数的语法,及一些MYSQL  ...

  8. Benchmark与Profiler---性能调优得力助手

    转载请注明出处:http://blog.csdn.net/gaoyanjie55/article/details/34981077 性能优化.它是一种诊断性能瓶颈,能问题点进行优化的过程.前两天听完s ...

  9. c++性能测试工具:google benchmark入门(一)

    如果你正在寻找一款c++性能测试工具,那么这篇文章是不容错过的. 市面上的benchmark工具或多或少存在一些使用上的不便,那么是否存在一个使用简便又功能强大的性能测试工具呢?答案是google/b ...

随机推荐

  1. Spring MVC4 + Spring Security4 + Hibernate实例

    http://www.yiibai.com/spring-security/spring-mvc-4-and-spring-security-4-integration-example.html 在这 ...

  2. Linux之通配符实验

    作业五:通配符实验 反引号与()在此时都是表死获取结果 但是一般使用()的方式,因为反引号在多个反引号的时候无法正确指代 获取当前bash 的变量 echo $变量名 echo $? 表示上一次命令的 ...

  3. flask之信号和mateclass元类

    本篇导航: flask实例化参数 信号 metaclass元类解析 一.flask实例化参数 instance_path和instance_relative_config是配合来用的:这两个参数是用来 ...

  4. 通过System.CommandLine快速生成支持命令行的应用

    一直以来,当我们想让我们的控制台程序支持命令行启动时,往往需要编写大量代码来实现这一看起来很简单的功能.虽然有一些库可以简化一些操作,但整个过程仍然是一个相当枯燥而乏味的过程.我之前也写过一些文章简单 ...

  5. 修改编辑器为Markdown编辑器

    一直都在使用cnblogs的TinyMCE,不过感觉好久不更新,还是用Markdown吧,写多了Markdown 还真是受感染呢. 学习下吧,边学便用. 参考链接: 序列图 [简明版]有道云笔记Mar ...

  6. 2012年NOIP普及组 摆花

    题目描述 Description 小明的花店新开张,为了吸引顾客,他想在花店的门口摆上一排花,共m盆.通过调查顾客的喜好,小明列出了顾客最喜欢的n种花,从1到n标号.为了在门口展出更多种花,规定第i种 ...

  7. 【MySQL】MySQL中查询出数据表中存在重复的值list

    1.目的:查询MySQL数据表中,重复记录的值 2.示例: 3.代码: select serial_num,count(*) as count FROM card_ticket GROUP BY se ...

  8. 关于Jenkins日志爆满的解决方法

    最近发现公司的jenkins因为日志量太大把磁盘占满,查看日志文件“/var/log/jenkins/jenkins.log”几分钟产生了几十G的日志 而且日志还在一直增长,内容如下 120: 313 ...

  9. SNF框架及机器人2018年1-9月份升级内容

    1月 增加评星控件.年月选择控件 完善表格弹框的封装,增加多选弹框 的封装 增加表格 单元格合并.列头必填与可填写的标识 4月 关于分页查询和排序的各种修改(扶额) 导入excel优化 bs计算合计的 ...

  10. JSON.stringify转化报错

    两种方式会导致该错误:1.json格式数据存在循环调用.   举个例子: var obj = { title: '标题'}obj.content = obj;JSON.stringify(obj); ...