Paper Reading - Convolutional Sequence to Sequence Learning ( CoRR 2017 ) ★
Link of the Paper: https://arxiv.org/abs/1705.03122
Motivation:
- Compared to recurrent layers, convolutions create representations for fixed-size contexts; however, the effective context size of the network can easily be made larger by stacking several layers on top of each other. This allows the maximum length of the dependencies to be modeled to be controlled precisely. Convolutional networks do not depend on the computations of the previous time step and therefore allow parallelization over every element in a sequence. This contrasts with RNNs, which maintain a hidden state of the entire past that prevents parallel computation within a sequence.
- Multi-layer convolutional neural networks create hierarchical representations over the input sequence in which nearby input elements interact at lower layers while distant elements interact at higher layers. This hierarchical structure provides a shorter path for capturing long-range dependencies than the chain structure modeled by recurrent networks. Inputs to a convolutional network are fed through a constant number of kernels and non-linearities, whereas recurrent networks apply up to n operations and non-linearities to the first word ( where n is the sequence length ) and only a single set of operations to the last word. Fixing the number of non-linearities applied to the inputs also eases learning.
 
Innovation:
- An architecture for Seq2Seq modeling based entirely on convolutional neural networks. Both encoder and decoder networks share a simple block structure that computes intermediate states based on a fixed number of input elements. Each block contains a one-dimensional convolution followed by a non-linearity. For a decoder network with a single block and kernel width k, each resulting state h_i^1 contains information over k input elements. Stacking several blocks on top of each other increases the number of input elements represented in a state. ( Stacking is similar to the pooling process; see the stacked-block sketch after this list. )
- Position Embeddings: Input elements x = (x_1, ..., x_m) are embedded in distributional space as w = (w_1, ..., w_m), where w_j ∈ R^f is a column in an embedding matrix D ∈ R^{V×f}. The authors also equip the model with a sense of order by embedding the absolute positions of the input elements, p = (p_1, ..., p_m) with p_j ∈ R^f. Both are combined to obtain input element representations e = (w_1 + p_1, ..., w_m + p_m). Position embeddings are useful in this architecture because they give the model a sense of which portion of the input or output sequence it is currently dealing with. ( See the embedding sketch after this list. )
- The authors introduce a separate attention mechanism for each decoder layer ( multi-step attention; see the attention sketch after this list ).
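To make the stacking point concrete, here is a minimal sketch ( not the authors' code; all sizes are illustrative, and ReLU stands in for the non-linearity, whereas the paper's GLU appears in the Improvement section below ) of stacked one-dimensional convolution blocks: with kernel width k, l stacked blocks cover 1 + l·(k − 1) input elements.

```python
import torch
import torch.nn as nn

k, d, num_blocks = 3, 8, 4    # kernel width, channels, number of stacked blocks (illustrative)
blocks = nn.ModuleList(
    [nn.Conv1d(d, d, kernel_size=k, padding=k // 2) for _ in range(num_blocks)]
)

x = torch.randn(1, d, 20)     # (batch, channels, sequence length)
for conv in blocks:
    x = torch.relu(conv(x))   # one block: 1D convolution followed by a non-linearity

# Each block widens the context by k - 1, so l blocks cover 1 + l * (k - 1) inputs.
print("context covered after", num_blocks, "blocks:", 1 + num_blocks * (k - 1))
```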
 
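A small sketch of the word-plus-position embedding, assuming made-up sizes and a PyTorch-style nn.Embedding for both tables.

```python
import torch
import torch.nn as nn

V, f, max_positions = 1000, 16, 50   # vocabulary size, embedding dim, max length (illustrative)
word_emb = nn.Embedding(V, f)        # lookup into the embedding matrix D ∈ R^{V×f}
pos_emb = nn.Embedding(max_positions, f)

tokens = torch.tensor([[5, 42, 7, 9]])                  # (batch, m) token ids
positions = torch.arange(tokens.size(1)).unsqueeze(0)   # absolute positions 0..m-1

e = word_emb(tokens) + pos_emb(positions)   # e_j = w_j + p_j, shape (batch, m, f)
```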

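A rough sketch of one decoder layer's attention, following the paper's recipe of combining the current decoder state with the embedding of the previous target element, scoring against the encoder outputs, and summing encoder outputs plus input embeddings; it is a simplified reading, and all tensor names and sizes here are illustrative.

```python
import torch

def layer_attention(h, g, z, e, W_d, b_d):
    """Attention for one decoder layer.
    h: decoder states (batch, n, dim), g: previous target embeddings (batch, n, dim),
    z: encoder outputs (batch, m, dim), e: encoder input embeddings (batch, m, dim)."""
    d = h @ W_d.T + b_d + g                                  # combine state and previous target
    scores = torch.softmax(d @ z.transpose(1, 2), dim=-1)    # (batch, n, m) attention weights
    return scores @ (z + e)                                  # conditional input for this layer

batch, n, m, dim = 2, 5, 7, 16
h, g = torch.randn(batch, n, dim), torch.randn(batch, n, dim)
z, e = torch.randn(batch, m, dim), torch.randn(batch, m, dim)
c = layer_attention(h, g, z, e, torch.randn(dim, dim), torch.zeros(dim))
```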
Improvement:
- The model is equipped with gated linear units ( Language Modeling with Gated Convolutional Networks - Dauphin et al., arXiv 2016 ) and residual connections ( Deep Residual Learning for Image Recognition - He et al., CVPR 2016 ).
- The authors choose gated linear units ( GLUs ) as the non-linearity, which implement a simple gating mechanism over the output of the convolution Y = [A; B] ∈ R^{2d}: v([A; B]) = A ⊗ σ(B), where A, B ∈ R^d are the inputs to the non-linearity, ⊗ is point-wise multiplication, and the output v([A; B]) ∈ R^d is half the size of Y. The gates σ(B) control which inputs A of the current context are relevant. GLUs have been shown to perform better than tanh in the context of language modeling.
- To enable deep convolutional networks, the authors add residual connections from the input of each convolution to the output of the block: h_i^l = v( W^l [h_{i−k/2}^{l−1}, ..., h_{i+k/2}^{l−1}] + b_w^l ) + h_i^{l−1}. ( A block sketch combining the GLU and the residual connection follows this list. )
 
- For encoder networks, the authors ensure that the output of the convolutional layers matches the input length by padding the input at each layer. For decoder networks, however, they have to take care that no future information is available to the decoder: they pad the input by k − 1 zero vectors on both the left and the right side, and then remove k elements from the end of the convolution output. ( See the padding sketch below. )
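A minimal sketch, under illustrative sizes, of one convolutional block as described above: a 1D convolution mapping d channels to 2d, a GLU non-linearity ( PyTorch's torch.nn.functional.glu computes A ⊗ σ(B) ), and a residual connection from the block input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLUBlock(nn.Module):
    """One block: convolution to 2d channels, GLU gate, residual from the input."""
    def __init__(self, d, k):
        super().__init__()
        self.conv = nn.Conv1d(d, 2 * d, kernel_size=k, padding=k // 2)

    def forward(self, h):             # h: (batch, d, seq_len), i.e. h^{l-1}
        y = self.conv(h)              # Y = [A; B], shape (batch, 2d, seq_len)
        return F.glu(y, dim=1) + h    # h^l = A ⊗ σ(B) + h^{l-1}

block = ConvGLUBlock(d=8, k=3)
out = block(torch.randn(1, 8, 20))    # same shape as the input, (1, 8, 20)
```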
 
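A sketch of the decoder-side padding with made-up sizes: pad the target-side input with zeros, convolve without built-in padding, and trim the tail of the output ( keeping the first m positions here ) so position i only depends on inputs at or before i.

```python
import torch
import torch.nn.functional as F

k, d, m = 3, 4, 6
x = torch.randn(1, d, m)             # (batch, channels, target length)
weight = torch.randn(d, d, k)        # one decoder convolution kernel (illustrative)

padded = F.pad(x, (k - 1, k - 1))    # zero-pad k - 1 elements on the left and right
y = F.conv1d(padded, weight)         # output length m + k - 1
y = y[:, :, :m]                      # drop the tail so no future inputs leak into position i
```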
General Points:
- Sequence to sequence modeling has been synonymous with recurrent neural network based encoder-decoder architectures. The encoder RNN processes an input sequence x = (x_1, ..., x_m) of m elements and returns state representations z = (z_1, ..., z_m). The decoder RNN takes z and generates the output sequence y = (y_1, ..., y_n) left to right, one element at a time. To generate output y_{i+1}, the decoder computes a new hidden state h_{i+1} based on the previous state h_i, an embedding g_i of the previous target language word y_i, as well as a conditional input c_i derived from the encoder output z. Models without attention consider only the final encoder state z_m, either by setting c_i = z_m for all i or by simply initializing the first decoder state with z_m, in which case c_i is not used. Architectures with attention compute c_i as a weighted sum of (z_1, ..., z_m) at each time step. The weights of the sum are referred to as attention scores and allow the network to focus on different parts of the input sequence as it generates the output sequence. Attention scores are computed by essentially comparing each encoder state z_j to a combination of the previous decoder state h_i and the last prediction y_i; the result is normalized to be a distribution over input elements. ( A small sketch of this attention step follows. )
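As a concrete illustration of the attention step described above, a tiny sketch with invented sizes: combine the previous decoder state and the last prediction's embedding, compare against each encoder state, normalize, and take the weighted sum. The simple addition and dot-product scoring are assumptions for illustration, not the specific scoring function of any one RNN model.

```python
import torch

m, dim = 7, 16
z = torch.randn(m, dim)       # encoder states z_1 .. z_m
h_i = torch.randn(dim)        # previous decoder state h_i
g_i = torch.randn(dim)        # embedding of the last prediction y_i

query = h_i + g_i                          # combine previous state and last prediction (assumed form)
scores = torch.softmax(z @ query, dim=0)   # distribution over the m input elements
c = scores @ z                             # conditional input c_{i+1}: weighted sum of z
```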
 