DEEPCODER: LEARNING TO WRITE PROGRAMS

Basic Information

  • Authors: Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow
  • Publication: ICLR'17
  • Description: Generate programs from input-output examples, using a neural network to predict program properties that guide the search

INDUCTIVE PROGRAM SYNTHESIS (IPS)

The Inductive Program Synthesis (IPS) problem is the following: given input-output examples, produce a program that has behavior consistent with the examples.

Building an IPS system requires solving two problems:

  • Search problem: to find consistent programs we need to search over a suitable set of possible programs. We need to define that set (i.e., the program space) and a search procedure.
  • Ranking problem: if multiple programs are consistent with the input-output examples, which one do we return? A toy sketch illustrating both problems follows this list.
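
As a toy illustration of both problems (not the paper's system), the sketch below brute-forces a hypothetical two-primitive program space: search enumerates programs by increasing length, and ranking resolves ties by returning the shortest consistent program first.

```python
from itertools import product

# Hypothetical toy program space: pipelines built from two primitives.
PRIMITIVES = {
    "SORT": sorted,
    "REVERSE": lambda xs: list(reversed(xs)),
}

def run(program, xs):
    # Apply each named primitive in sequence to the input list.
    for name in program:
        xs = PRIMITIVES[name](xs)
    return list(xs)

def synthesize(examples, max_len=3):
    # Search: enumerate programs in order of increasing length.
    # Ranking: the first hit is the shortest consistent program.
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None

# A single input-output example: sort descending.
print(synthesize([([3, 1, 2], [3, 2, 1])]))  # ('SORT', 'REVERSE')
```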

Domain Specific Languages (DSLs)

  • DSLs are programming languages that are suitable for a
    specialized domain but are more restrictive than full-featured programming languages.
  • Restricted DSLs can also enable more efficient special-purpose search algorithms.
  • The choice of DSL also affects the difficulty of the ranking problem.

Search Techniques

Techniques for searching for programs consistent with input-output examples:

  • Special-purpose search algorithms
  • Satisfiability Modulo Theories (SMT) solving

Ranking

When several programs are consistent with the given input-output examples, a ranking criterion (e.g., preferring shorter programs) determines which one to return.

LEARNING INDUCTIVE PROGRAM SYNTHESIS (LIPS)

The components of LIPS are:

  1. a DSL specification,

    An attribute function A that maps programs P of the DSL to finite attribute vectors a = A(P). (Attribute vectors of different programs need not have equal length.) Attributes serve as the link between the machine learning and the search component of LIPS: the machine learning model predicts a distribution q(a | E), where E is the set of input-output examples, and the search procedure aims to search over programs P as ordered by q(A(P) | E). Thus an attribute is useful if it is both predictable from input-output examples, and if conditioning on its value significantly reduces the effective size of the search space.

    Possible attributes include the (perhaps position-dependent) presence or absence of high-level functions (e.g., does the program contain, or end in, a call to SORT?) and control-flow templates (e.g., the number of loops and conditionals). A sketch of the attribute interface follows this list.

  2. a data-generation procedure,

    Generate a dataset ((P^(n), a^(n), E^(n)))_{n=1}^{N} of programs P^(n) in the chosen DSL, their attributes a^(n) = A(P^(n)), and accompanying input-output examples E^(n).

  3. a machine learning model that maps from input-output examples to program attributes,

    Learn a distribution of attributes given input-output examples, q(a | E).

  4. a search procedure that searches program space in an order guided by the model from (3).

    Interface with an existing solver, using the predicted q(a | E) to guide the search.
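
A minimal sketch of the attribute interface (hypothetical helper names, not the authors' code): A(P) is a binary vector marking which DSL functions a program uses, and the search consumes the model's predicted marginals q(a | E) to decide which functions to try first.

```python
DSL_FUNCTIONS = ["HEAD", "LAST", "TAKE", "DROP", "SORT", "REVERSE",
                 "SUM", "MAP", "FILTER", "COUNT", "ZIPWITH", "SCANL1"]

def attribute_vector(function_calls):
    # A(P): one bit per DSL function, set if the program calls it.
    # `function_calls` is the list of function names appearing in P.
    present = set(function_calls)
    return [1 if f in present else 0 for f in DSL_FUNCTIONS]

# Hypothetical model output q(a | E): per-function marginal probabilities.
q = {"SORT": 0.92, "REVERSE": 0.85, "MAP": 0.30, "HEAD": 0.05}

# The search explores the functions most likely to occur first.
search_order = sorted(q, key=q.get, reverse=True)
print(attribute_vector(["FILTER", "MAP", "SORT", "REVERSE"]))
print(search_order)  # ['SORT', 'REVERSE', 'MAP', 'HEAD']
```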

DEEPCODER: Instantiation of LIPS

  1. DSL AND ATTRIBUTES
    A program in our DSL is a sequence of function calls, where the result of each call initializes a fresh variable that is either a singleton integer or an integer array. Functions can be applied to any of the inputs or previously computed (intermediate) variables. The output of the program is the return value of the last function call, i.e., the last variable. See Fig. 1 for an example program of length T = 4 in our DSL; a small interpreter sketch follows this list.
    Overall, our DSL contains the first-order functions HEAD, LAST, TAKE, DROP, ACCESS, MINIMUM, MAXIMUM, REVERSE, SORT, SUM, and the higher-order functions MAP, FILTER, COUNT, ZIPWITH, SCANL1.

  2. DATA GENERATION
    Training data is generated by enumerating programs in the DSL (pruning obviously redundant or equivalent ones) and executing them on inputs chosen so that all intermediate values stay within a bounded integer range, yielding valid input-output examples.
  3. MACHINE LEARNING MODEL
    1. an encoder: a differentiable mapping from a set of M input-output examples generated by a single program to a latent real-valued vector, and
    2. a decoder: a differentiable mapping from the latent vector representing a set of M input-output examples to predictions of the ground truth program’s attributes.

  4. SEARCH

    1. Depth-first search (DFS)
    2. “Sort and add” enumeration (a simplified sketch follows this list)
    3. Sketch
  5. TRAINING LOSS FUNCTION
    Cross-entropy loss (i.e., the negative log-likelihood of the ground-truth attributes under the predicted distribution)
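
To make the DSL concrete, here is a small interpreter sketch for a few of the listed primitives, with semantics assumed from their names (not the authors' implementation), followed by a length-4 program in the style of Fig. 1.

```python
# Assumed semantics for a subset of the DSL primitives.
FIRST_ORDER = {
    "HEAD": lambda xs: xs[0],
    "LAST": lambda xs: xs[-1],
    "MINIMUM": min,
    "MAXIMUM": max,
    "SUM": sum,
    "SORT": sorted,
    "REVERSE": lambda xs: list(reversed(xs)),
}

HIGHER_ORDER = {
    "MAP": lambda f, xs: [f(x) for x in xs],
    "FILTER": lambda p, xs: [x for x in xs if p(x)],
    "COUNT": lambda p, xs: sum(1 for x in xs if p(x)),
    "ZIPWITH": lambda f, xs, ys: [f(x, y) for x, y in zip(xs, ys)],
}

# A length-4 program: each call initializes a fresh variable, and the
# last variable is the program's output.
a = [-17, -3, 4, 11, 0, -5, -9, 13, 6, 6, -8, 11]   # input array
b = HIGHER_ORDER["FILTER"](lambda x: x < 0, a)      # keep negatives
c = HIGHER_ORDER["MAP"](lambda x: 4 * x, b)         # multiply by 4
d = FIRST_ORDER["SORT"](c)                          # sort ascending
e = FIRST_ORDER["REVERSE"](d)                       # output, descending
print(e)  # [-12, -20, -32, -36, -68]
```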
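
The “sort and add” strategy can be sketched as follows (a simplified reconstruction using unary primitives only): start the enumeration with the functions the model rates most probable, and whenever the search fails, add the next-most-probable function to the active set and retry.

```python
from itertools import product

def run(program, xs, primitives):
    # Apply each named primitive in sequence to the input list.
    for name in program:
        xs = primitives[name](xs)
    return xs

def sort_and_add(examples, q, primitives, max_len=3, initial=2):
    # Rank functions by predicted probability q(a | E), most likely first.
    ranked = sorted(q, key=q.get, reverse=True)
    for k in range(initial, len(ranked) + 1):
        active = ranked[:k]  # grow the active set on failure
        for length in range(1, max_len + 1):
            for program in product(active, repeat=length):
                if all(run(program, i, primitives) == o
                       for i, o in examples):
                    return program
    return None

# Usage with hypothetical primitives and model predictions.
prims = {"SORT": sorted,
         "REVERSE": lambda xs: list(reversed(xs)),
         "DROP1": lambda xs: xs[1:]}
q = {"SORT": 0.9, "REVERSE": 0.8, "DROP1": 0.2}
print(sort_and_add([([3, 1, 2], [3, 2, 1])], q, prims))
```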

Implementation

  1. Pure Python 3 implementation of DeepCoder
  2. Re-implementation of DeepCoder
  3. DeepCoder-tensorflow
