A Detailed, Annotated Walkthrough of a PyTorch-Based Seq2Seq Translation Model (Part 1)

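Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/qysh123/article/details/91245246
Seq2Seq is currently a mainstream deep learning model for translation. It performs well on natural language translation and even on cross-modal knowledge mapping. In recent years it has also been widely applied in software engineering, for example: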

Jiang, Siyuan, Ameer Armaly, and Collin McMillan. "Automatically generating commit messages from diffs using neural machine translation." In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 135-146. IEEE Press, 2017.

Hu, Xing, Ge Li, Xin Xia, David Lo, and Zhi Jin. "Deep code comment generation." In Proceedings of the 26th Conference on Program Comprehension, pp. 200-210. ACM, 2018.

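Here I use the Seq2Seq example code provided by PyTorch to briefly summarize the implementation details of this model and the corresponding PyTorch APIs. PyTorch has a tutorial on its official site: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html, and the corresponding GitHub link is: https://github.com/pytorch/tutorials/blob/master/intermediate_source/seq2seq_translation_tutorial.py. I use that code as the running example for this summary.

The official page above gives the download link for the data (https://download.pytorch.org/tutorial/data.zip). Many tutorials online are in fact translations of this official tutorial, and I consulted several of them, mainly: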

https://www.cnblogs.com/HolyShine/p/9850822.html

https://www.cnblogs.com/www-caiyin-com/p/10123346.html

http://www.pianshen.com/article/5376154542/

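You can use those tutorials as a base. I only add some supplements and explanations on top of them, so I will not give a complete walkthrough as they do; I only summarize what I consider important. I skip the initialization and encoding of the data, since the existing tutorials cover it well enough. Let's start with the Encoder: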

import torch
import torch.nn as nn

# Device selection, as in the official tutorial.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()  # initialize the attributes inherited from the parent class
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)  # embedding layer for the input word indices
        self.gru = nn.GRU(hidden_size, hidden_size)  # applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)  # view reshapes the existing tensor, here to [1, 1, hidden_size]
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)  # all-zero initial hidden state, shape (1, 1, 256) when hidden_size is 256
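Short as it is, a few points are worth discussing. nn.Embedding creates the initial embedding. At initialization this embedding is completely random and carries no real meaning, because it is not pretrained. The weight matrix is, however, a learnable parameter, so it is updated together with the rest of the model during training. I think some articles online do not get even this right (for example, the explanation here is incorrect: https://my.oschina.net/earnp/blog/1113896); for details, see the discussion here: https://blog.csdn.net/qq_36097393/article/details/88567942. As for the parameters, take nn.Embedding(2, 5): the 2 means there are 2 words and the 5 means each word has 5 dimensions, so the layer is simply a 2x5 matrix. If you have 1000 words and want 100 dimensions per word, you create the word embedding with nn.Embedding(1000, 100). You can also run the example code I put together below: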

import torch
import torch.nn as nn

word_to_ix={'hello':0, 'world':1}
embeds=nn.Embedding(2,5)
hello_idx=torch.LongTensor([word_to_ix['hello']])
world_idx=torch.LongTensor([word_to_ix['world']])
hello_embed=embeds(hello_idx)
print(hello_embed)
world_embed=embeds(world_idx)
print(world_embed)
The meaning should be clear at a glance. Try running it (the printed values differ on every run and have no real meaning).
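
To make the "it is just a matrix" point concrete, here is a minimal sketch of my own (not from the tutorial). It checks that calling the layer on an index returns the matching row of the weight matrix, and that the weight is a learnable parameter. The seed and the variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the random initial weights are repeatable

embeds = nn.Embedding(2, 5)   # 2 words, 5 dimensions each
idx = torch.tensor([1])       # index of 'world' in the example above
vector = embeds(idx)          # shape [1, 5]

# The layer is a lookup table: the result is row 1 of the weight matrix.
same_as_row = torch.equal(vector[0], embeds.weight[1])

# The weight matrix is a learnable parameter, so an optimizer can update it.
is_trainable = embeds.weight.requires_grad

print(embeds.weight.shape, same_as_row, is_trainable)
```

So the values are random at first, but nothing stops training from moving them.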

Next is the meaning of .view(1, 1, -1). To be honest, I was never entirely clear on it myself, but it has already been discussed on Stack Overflow:

https://stackoverflow.com/questions/42479902/how-does-the-view-method-work-in-pytorch

A quick look there explains it. Here is the example given in that discussion:

import torch
a = torch.arange(1, 17)  # values 1..16; torch.range is deprecated in favor of torch.arange
print(a)
a = a.view(4, 4)
print(a)
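
The -1 argument is the part that tends to confuse people, so here is a small sketch of my own, under the assumption that a 16-element tensor is a fair stand-in for the embedding output. It shows that -1 lets PyTorch infer the remaining dimension, and that view returns a tensor sharing memory with the original instead of copying it.

```python
import torch

x = torch.arange(1, 17)   # 16 elements: 1..16
a = x.view(4, 4)          # explicit shape
b = x.view(2, -1)         # -1 is inferred as 16 / 2 = 8
c = x.view(1, 1, -1)      # same pattern as in the encoder: [1, 1, 16]

print(a.shape, b.shape, c.shape)

# view does not copy: the reshaped tensor starts at the same memory address.
shares_memory = c.data_ptr() == x.data_ptr()
print(shares_memory)
```

In the encoder, the same call turns the embedding of a single word into a [1, 1, hidden_size] tensor, which is the (sequence length, batch, features) layout that nn.GRU expects by default.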
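That is all for the Encoder. Next comes the decoder with attention. To make it easier to follow, I added comments giving the dimensions of the tensor at every step, which I personally find helpful: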

import torch.nn.functional as F

# torch, nn and device are assumed to be set up as in the official tutorial.
MAX_LENGTH = 10

class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):  # MAX_LENGTH is defined as 10 in the translation task
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size  # output_size here is output_lang.n_words
        self.dropout_p = dropout_p  # dropout probability
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)  # linear transformation to the required dimension
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):

        print(input)
        print('size of input: '+str(input.size()))
        print('size of self.embedding(input): '+str(self.embedding(input).size()))

        embedded = self.embedding(input).view(1, 1, -1)
        print('size of embedded: '+str(embedded.size()))

        embedded = self.dropout(embedded)
        print('size of embedded[0]: '+str(embedded[0].size()))
        print('size of torch.cat((embedded[0], hidden[0]), 1): '+str(torch.cat((embedded[0], hidden[0]), 1).size()))
        print('size of self.attn(torch.cat((embedded[0], hidden[0]), 1)): '+str(self.attn(torch.cat((embedded[0], hidden[0]), 1)).size()))

        # Size of embedded: [1,1,256]
        # Size of embedded[0]: [1,256]
        # Size of torch.cat((embedded[0], hidden[0]), 1): [1,512]

        # This step effectively learns the attention weights.
        # Note that torch's concatenate function is torch.cat. It joins along an existing dimension;
        # as written here, that is the second dimension (dim=1).
        # torch.stack, by contrast, creates a new dimension and joins along that one.
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)  # F.softmax is torch.nn.functional.softmax

        # Size of attn_weights: [1,10]
        # Size of attn_weights.unsqueeze(0): [1,1,10]
        # Size of encoder_outputs: [10,256]
        # Size of encoder_outputs.unsqueeze(0): [1,10,256]

        # unsqueeze: Returns a new tensor with a dimension of size one inserted at the specified position.
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))  # bmm is essentially a batched matrix multiplication

        # Size of attn_applied: [1,1,256]
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        # Size of output here is: [1,512]
        print('size of output (at this location): '+str(output.size()))
        output = self.attn_combine(output).unsqueeze(0)
        # Size of output here is: [1,1,256]
        #print(output)
        output = F.relu(output)  # applies the rectified linear unit function element-wise
        #print(output)
        output, hidden = self.gru(output, hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)
        print('')
        print('------------')
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
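
The shape comments in the decoder can be checked in isolation. The sketch below is my own and not part of the tutorial: it replaces the real embedding, hidden state and encoder outputs with random tensors (an assumption made purely for illustration) and uses hidden_size 256 and max_length 10, the values from the comments above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size, max_length = 256, 10  # values taken from the shape comments above

attn = nn.Linear(hidden_size * 2, max_length)
embedded = torch.randn(1, 1, hidden_size)               # stand-in for the embedded input token
hidden = torch.randn(1, 1, hidden_size)                 # stand-in for the previous hidden state
encoder_outputs = torch.randn(max_length, hidden_size)  # stand-in for the encoder outputs

# Score every encoder position from [embedded ; hidden], then normalize to weights.
attn_weights = F.softmax(attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)

# Weighted sum of the encoder outputs: [1, 1, 10] x [1, 10, 256] -> [1, 1, 256]
attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))

print(attn_weights.shape)
print(attn_applied.shape)
print(attn_weights.sum().item())  # softmax weights sum to 1 (up to rounding)
```

Note that the attention scores here depend only on the current input and hidden state, which is why the linear layer can output exactly max_length scores.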
First, dropout. For dropout, start with the official PyTorch explanation:

https://pytorch.org/docs/stable/nn.html?highlight=nn%20dropout#torch.nn.Dropout

In short: During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Someone has written a very detailed discussion and explanation:

https://blog.csdn.net/stdcoutzyx/article/details/49022443
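
The "during training" part of that definition is easy to miss, so here is a small sketch of my own. The tensor of ones, p=0.5 and the seed are assumptions chosen so the effect is easy to see; the decoder above uses p=0.1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                    # training mode: elements are zeroed at random
y_train = drop(x)
kept = y_train[y_train != 0]    # the elements that survived
frac_zeroed = (y_train == 0).float().mean().item()

drop.eval()                     # evaluation mode: dropout does nothing
y_eval = drop(x)

print(frac_zeroed)              # fraction of zeroed elements, close to p
print(kept.unique())            # survivors are scaled by 1 / (1 - p) = 2
print(torch.equal(y_eval, x))
```

The rescaling of the surviving elements keeps the expected value of the output the same in both modes, which is why nothing needs to change at evaluation time.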

Next, note what nn.Linear means and does. Again from the official documentation: Applies a linear transformation to the incoming data. As before, you can refer to my example code below:

import torch
import torch.nn as nn
m = nn.Linear(2, 3)
input = torch.randn(2, 2)
print(input)
output = m(input)
print(output)
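
To spell out what "linear transformation" means here, the following sketch (my own, with an arbitrary batch of 4 inputs) recomputes the layer output by hand as x times the transposed weight plus the bias and compares it with the layer's result.

```python
import torch
import torch.nn as nn

m = nn.Linear(2, 3)     # weight has shape [3, 2], bias has shape [3]
x = torch.randn(4, 2)   # a batch of 4 two-dimensional inputs

y = m(x)                              # shape [4, 3]
y_manual = x @ m.weight.t() + m.bias  # the same computation written out

print(m.weight.shape, m.bias.shape, y.shape)
print(torch.allclose(y, y_manual, atol=1e-6))
```

This is why self.attn_combine in the decoder, an nn.Linear(512, 256) when hidden_size is 256, maps a [1, 512] tensor to [1, 256].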
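Next, torch.bmm. According to the official PyTorch documentation, https://pytorch.org/docs/stable/torch.html?highlight=torch%20bmm#torch.bmm

torch.bmm does the following: Performs a batch matrix-matrix product of matrices stored in batch1 and batch2. That explanation is still rather abstract, but an example makes it easy to understand. It is simply a batched matrix multiplication: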

import torch
batch1=torch.randn(2,3,4)
print(batch1)
batch2=torch.randn(2,4,5)
print(batch2)
res=torch.bmm(batch1,batch2)
print(res)
The exact multiplication rule is: If batch1 is a (b×n×m) tensor, batch2 is a (b×m×p) tensor, out will be a (b×n×p) tensor.
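
One way to convince yourself of this rule is to do the b separate matrix products by hand. The sketch below is my own; the shapes are the ones from the example above, and the tolerance is an arbitrary choice to absorb floating-point rounding.

```python
import torch

batch1 = torch.randn(2, 3, 4)
batch2 = torch.randn(2, 4, 5)
res = torch.bmm(batch1, batch2)  # shape [2, 3, 5]

# Equivalent: multiply each pair of matrices separately and stack the results.
manual = torch.stack([batch1[i] @ batch2[i] for i in range(2)])

print(res.shape)
print(torch.allclose(res, manual, atol=1e-5))
```

In the decoder the batch size b is 1, so the call reduces to a single [1, 10] by [10, 256] product.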

As for torch.cat, here is a brief illustration using the example from the official PyTorch documentation:

Concatenates the given sequence of seq tensors in the given dimension. The example is as follows:

import torch
x=torch.randn(2,3)
print(x)
print(torch.cat((x, x, x), 0))
print(torch.cat((x, x, x), 1))
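
The decoder comments contrast torch.cat with torch.stack, so here is a short shape comparison of my own to go with the example above. The tensor sizes are the same arbitrary ones used there.

```python
import torch

x = torch.randn(2, 3)

cat0 = torch.cat((x, x, x), 0)       # grows existing dimension 0: [6, 3]
cat1 = torch.cat((x, x, x), 1)       # grows existing dimension 1: [2, 9]
stacked = torch.stack((x, x, x), 0)  # inserts a new dimension 0: [3, 2, 3]

print(cat0.shape, cat1.shape, stacked.shape)
```

cat keeps the number of dimensions unchanged, while stack adds one.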
I will stop here for now and continue in the next post.