Sequence models are central to NLP: they are models in which there is some form of dependence through time between the inputs. The classical example of a sequence model is the Hidden Markov Model for part-of-speech tagging. Another example is the conditional random field.

LSTMs in PyTorch

PyTorch’s LSTM expects all of its inputs to be 3D tensors. The semantics of the axes of these tensors are important. The first axis is the sequence itself, the second indexes instances in the mini-batch, and the third indexes elements of the input. We haven’t discussed mini-batching, so let’s just ignore that and assume the second axis will always have size 1. If we want to run the sequence model over the sentence “The cow jumped”, our input is the three word vectors stacked along the first axis, with a singleton mini-batch axis in the middle.
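For instance, a minimal sketch of that layout (the vector names and the input dimension of 3 are just illustrative assumptions):

import torch

# "The cow jumped": three word vectors of dimension 3, stacked into a
# tensor of shape (seq_len=3, batch=1, input_dim=3).
q_the, q_cow, q_jumped = torch.randn(3), torch.randn(3), torch.randn(3)
sentence = torch.stack([q_the, q_cow, q_jumped]).view(3, 1, 3)
print(sentence.shape)  # torch.Size([3, 1, 3])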

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in inputs:
    # step through the sequence one element at a time
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# alternatively, process the entire sequence at once
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out the hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
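A quick shape check (a sketch using the variables from the snippet above): out collects the hidden state at every time step, while hidden holds only the final hidden and cell states.

print(out.shape)        # torch.Size([5, 1, 3]): one hidden state per time step
print(hidden[0].shape)  # torch.Size([1, 1, 3]): the final hidden state
print(hidden[1].shape)  # torch.Size([1, 1, 3]): the final cell state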

Example: An LSTM for Part-of-Speech Tagging

Prepare data:

def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)

training_data = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}

# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6
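As a quick sanity check (a small sketch, not part of the original): the vocabulary above is built in order of first appearance, so the first training sentence maps to indices 0 through 4.

print(prepare_sequence(training_data[0][0], word_to_ix))  # tensor([0, 1, 2, 3, 4])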

Create the model:

class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # Before we've done anything, we don't have any hidden state.
        # Refer to the PyTorch documentation to see exactly
        # why they have this dimensionality.
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim),
                torch.zeros(1, 1, self.hidden_dim))

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, self.hidden = self.lstm(
            embeds.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
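To see how the shapes flow through forward, here is a throwaway sketch (not part of the tutorial; the sizes 9 and 3 stand in for len(word_to_ix) and len(tag_to_ix) on the toy data above):

tagger = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, vocab_size=9, tagset_size=3)
with torch.no_grad():
    scores = tagger(torch.tensor([0, 1, 2, 3, 4], dtype=torch.long))
# Inside forward: embeds is (5, 6), the view is (5, 1, 6), lstm_out is (5, 1, 6),
# and tag_space/tag_scores are (5, 3): one row of tag scores per word.
print(scores.shape)  # torch.Size([5, 3])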

Train the model:

model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# See what the scores are before training.
# Note that element i,j of the output is the score for tag j for word i.
# Here we don't need to train, so the code is wrapped in torch.no_grad()
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    print(tag_scores)

for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that PyTorch accumulates gradients.
        # We need to clear them out before each instance.
        model.zero_grad()

        # Also, we need to clear out the hidden state of the LSTM,
        # detaching it from its history on the last instance.
        model.hidden = model.init_hidden()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()

# See what the scores are after training
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)

    # The sentence is "the dog ate the apple". i,j corresponds to the score for tag j
    # for word i. The predicted tag is the maximum scoring tag.
    # Here, we can see the predicted sequence below is 0 1 2 0 1,
    # since 0 is the index of the maximum value of row 1,
    # 1 is the index of the maximum value of row 2, etc.
    # Which is DET NOUN VERB DET NOUN, the correct sequence!
    print(tag_scores)
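The tutorial reads the predicted tags off the printed scores by eye; as a small optional extra (a sketch not in the original, using a hypothetical ix_to_tag inverse mapping), the argmax of each row can be mapped back to tag strings:

ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}
predicted = [ix_to_tag[ix.item()] for ix in tag_scores.argmax(dim=1)]
print(predicted)  # expected after training: ['DET', 'NN', 'V', 'DET', 'NN']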
