Deep Learning Summary: skip-gram PyTorch Implementation
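Table of contents
skip-gram in PyTorch: naive implementation
Network structure
Training with nn.NLLLoss()
Preparing batches: training is unsupervised, so build (center, context) pairs from the text
Sampling optimization: subsampling lowers the probability of keeping high-frequency words
Advanced skip-gram: negative sampling
Common improvements target computational efficiency: negative sampling and hierarchical softmax
Negative sampling implementation
Negative sampling: the principle
Negative sampling: drawing the noise samples
Negative sampling: forward pass
Negative sampling: training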
skip-gram in PyTorch: naive implementation
Network structure
import torch
from torch import nn

class SkipGram(nn.Module):
    def __init__(self, n_vocab, n_embed):
        super().__init__()
        # lookup table: word index -> embedding vector
        self.embed = nn.Embedding(n_vocab, n_embed)
        # project the embedding back to one score per vocabulary word
        self.output = nn.Linear(n_embed, n_vocab)
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.embed(x)
        scores = self.output(x)
        log_ps = self.log_softmax(scores)
        return log_ps
Training with nn.NLLLoss()
import torch
from torch import nn, optim

# vocab_to_int, train_words and get_batches are defined elsewhere in this post

# check if GPU is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

embedding_dim = 300  # you can change, if you want
model = SkipGram(len(vocab_to_int), embedding_dim).to(device)
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

print_every = 500
steps = 0
epochs = 5

# train for some number of epochs
for e in range(epochs):
    # get input and target batches
    for inputs, targets in get_batches(train_words, 512):
        steps += 1
        inputs, targets = torch.LongTensor(inputs), torch.LongTensor(targets)
        inputs, targets = inputs.to(device), targets.to(device)

        log_ps = model(inputs)
        loss = criterion(log_ps, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # report progress every print_every steps
        if steps % print_every == 0:
            print(f"epoch {e + 1}/{epochs}, step {steps}, loss {loss.item():.4f}")
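To make the loss concrete: nn.LogSoftmax followed by nn.NLLLoss amounts to averaging the negative log-probability that the model assigns to each true context word. Below is a framework-free sketch in plain Python. The helper names log_softmax and nll_loss are illustrative (they are not PyTorch's API), the scores are made up, and the sketch assumes the default mean reduction.

```python
import math

def log_softmax(scores):
    """Log-probabilities for one row of raw scores (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def nll_loss(log_ps, targets):
    """Mean negative log-likelihood.

    log_ps is a list of rows of log-probabilities, targets a list of class indices.
    """
    return -sum(row[t] for row, t in zip(log_ps, targets)) / len(targets)

# Toy batch: 2 center words, vocabulary of 4 words (values are made up).
scores = [[2.0, 0.5, 0.1, -1.0],
          [0.0, 0.0, 0.0, 0.0]]
targets = [0, 3]  # index of the true context word for each row

log_ps = [log_softmax(row) for row in scores]
loss = nll_loss(log_ps, targets)
print(loss)
```

Note that every row is normalized over the whole vocabulary, so each training pair costs one score per vocabulary word. That cost is what the negative sampling variant later in the post avoids.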
Preparing batches: training is unsupervised, so build (center, context) pairs from the text:
import numpy as np

def get_target(words, idx, window_size=5):
    ''' Get a list of words in a window around an index. '''
    # draw a random radius R in [1, window_size] so that nearer words are sampled more often
    R = np.random.randint(1, window_size+1)
    start = idx - R if (idx - R) > 0 else 0
    stop = idx + R
    target_words = words[start:idx] + words[idx+1:stop+1]
    return list(target_words)

def get_batches(words, batch_size, window_size=5):
    ''' Create a generator of word batches as a tuple (inputs, targets) '''
    n_batches = len(words)//batch_size
    # only full batches
    words = words[:n_batches*batch_size]
    for idx in range(0, len(words), batch_size):
        x, y = [], []
        batch = words[idx:idx+batch_size]
        for ii in range(len(batch)):
            batch_x = batch[ii]
            batch_y = get_target(batch, ii, window_size)
            y.extend(batch_y)
            # repeat the center word once per context word so x and y line up
            x.extend([batch_x]*len(batch_y))
        yield x, y
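To see what kind of pairs this produces, here is a small deterministic sketch. It uses a hypothetical helper, context_pairs, with a fixed radius instead of the random R drawn in get_target, so the output is reproducible. It is an illustration of the pairing idea, not a replacement for the batching code above.

```python
def context_pairs(words, radius):
    """Return (center, context) pairs using a fixed window radius."""
    pairs = []
    for i, center in enumerate(words):
        start = max(0, i - radius)
        stop = min(len(words), i + radius + 1)
        for j in range(start, stop):
            if j != i:  # the center word is not its own context
                pairs.append((center, words[j]))
    return pairs

sentence = ["the", "quick", "brown", "fox"]
pairs = context_pairs(sentence, radius=1)
for center, context in pairs:
    print(center, "->", context)
```

As in get_batches, each center word appears once per context word, so the input and target lists always have the same length. In get_batches the window is taken inside each batch, so context windows do not cross batch boundaries.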
Sampling optimization: subsampling lowers the probability of keeping high-frequency words
Words that show up often such as “the”, “of”, and “for” don’t provide much context to the nearby words. If we discard some of them, we can remove some of the noise from our data and in return get faster training and better representations. Mikolov et al. call this process subsampling. For each word $w_i$ in the training set, we’ll discard it with probability given by

$$P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}$$

where $t$ is a threshold parameter and $f(w_i)$ is the frequency of word $w_i$ in the total dataset.
from collections import Counter
import random
import numpy as np

threshold = 1e-5
word_counts = Counter(int_words)
#print(list(word_counts.items())[0]) # dictionary of int_words, how many times they appear

total_count = len(int_words)
freqs = {word: count/total_count for word, count in word_counts.items()}
p_drop = {word: 1 - np.sqrt(threshold/freqs[word]) for word in word_counts}

# discard some frequent words, according to the subsampling equation
# create a new list of words for training
train_words = [word for word in int_words if random.random() < (1 - p_drop[word])]
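As a quick sanity check of the formula, the sketch below evaluates the drop probability for a few made-up relative frequencies with t = 1e-5. The function name drop_probability is hypothetical, and clamping at 0 is a choice made here for readability. The code above gets the same effect without clamping: a negative p_drop makes 1 - p_drop exceed 1, so such a word is always kept.

```python
import math

def drop_probability(freq, threshold=1e-5):
    """P(w) = 1 - sqrt(t / f(w)), clamped at 0 for words rarer than the threshold."""
    return max(0.0, 1.0 - math.sqrt(threshold / freq))

# Made-up relative frequencies: a very common, a moderately common and a rare word.
for freq in (1e-2, 1e-4, 1e-6):
    print(f"f = {freq:g}: drop with probability {drop_probability(freq):.3f}")
```

The more frequent the word, the closer its drop probability gets to 1, while words rarer than the threshold are never dropped.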
Advanced skip-gram: negative sampling
Common improvements target computational efficiency: negative sampling and hierarchical softmax
Negative sampling implementation:
Negative sampling: the principle:
import torch
from torch import nn

class NegativeSamplingLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, input_vectors, output_vectors, noise_vectors):
        batch_size, embed_size = input_vectors.shape

        # Input vectors should be a batch of column vectors
        input_vectors = input_vectors.view(batch_size, embed_size, 1)

        # Output vectors should be a batch of row vectors
        output_vectors = output_vectors.view(batch_size, 1, embed_size)

        # bmm = batch matrix multiplication
        # correct log-sigmoid loss
        out_loss = torch.bmm(output_vectors, input_vectors).sigmoid().log()
        out_loss = out_loss.squeeze()

        # incorrect log-sigmoid loss
        noise_loss = torch.bmm(noise_vectors.neg(), input_vectors).sigmoid().log()
        noise_loss = noise_loss.squeeze().sum(1)  # sum the losses over the sample of noise vectors

        # negate and sum correct and noisy log-sigmoid losses
        # return average batch loss
        return -(out_loss + noise_loss).mean()
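For a single (center, context) pair with input vector $v_c$, output vector $u_o$ and $K$ noise vectors $u_k$, the quantity computed above is $-\log\sigma(u_o^\top v_c) - \sum_{k=1}^{K}\log\sigma(-u_k^\top v_c)$. The following plain-Python sketch evaluates it for one pair. All names and numbers are illustrative assumptions, and it uses a numerically stable log-sigmoid rather than the sigmoid().log() chain in the class above.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_loss(v_center, u_context, u_noise):
    """Negative sampling loss for one (center, context) pair and a list of noise vectors."""
    positive = log_sigmoid(dot(u_context, v_center))                      # pull the true pair together
    negative = sum(log_sigmoid(-dot(u_k, v_center)) for u_k in u_noise)   # push noise words away
    return -(positive + negative)

# Made-up 3-dimensional vectors with K = 2 noise words.
v_center = [0.5, -0.2, 0.1]
u_context = [0.4, -0.1, 0.3]
u_noise = [[-0.3, 0.2, 0.0], [0.1, 0.1, -0.4]]
print(neg_sampling_loss(v_center, u_context, u_noise))
```

Each pair now costs K + 1 dot products instead of one score per vocabulary word, which is where the speed-up over the full softmax comes from.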
Negative sampling: drawing the noise samples:
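import numpy as np
import torch

# Get our noise distribution
# Using word frequencies calculated earlier in the notebook
# NOTE: sorting in descending order only lines up with the word indices if
# vocab_to_int assigns index 0 to the most frequent word, 1 to the next, and so on
word_freqs = np.array(sorted(freqs.values(), reverse=True))
unigram_dist = word_freqs/word_freqs.sum()
# raise the unigram distribution to the 3/4 power and renormalize
noise_dist = torch.from_numpy(unigram_dist**(0.75)/np.sum(unigram_dist**(0.75)))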
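The effect of the 0.75 exponent is easiest to see on a tiny example. The sketch below uses plain Python and made-up unigram probabilities; noise_distribution is a hypothetical helper that mirrors the last line above.

```python
def noise_distribution(unigram, power=0.75):
    """Raise unigram probabilities to `power` and renormalize so they sum to 1."""
    powered = [p ** power for p in unigram]
    total = sum(powered)
    return [p / total for p in powered]

unigram = [0.90, 0.09, 0.01]  # made-up probabilities, most frequent word first
noise = noise_distribution(unigram)
for p, q in zip(unigram, noise):
    print(f"unigram {p:.2f} -> noise {q:.3f}")
```

The ranking of the words is unchanged, but probability mass moves from the most frequent word toward the rarer ones, so rare words are drawn as negatives more often than their raw frequency would suggest.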
Negative sampling: forward pass:
import torch
from torch import nn

class SkipGramNeg(nn.Module):
    def __init__(self, n_vocab, n_embed, noise_dist=None):
        super().__init__()

        self.n_vocab = n_vocab
        self.n_embed = n_embed
        self.noise_dist = noise_dist

        # define embedding layers for input and output words
        self.in_embed = nn.Embedding(n_vocab, n_embed)
        self.out_embed = nn.Embedding(n_vocab, n_embed)

        # Initialize embedding tables with uniform distribution
        # I believe this helps with convergence
        self.in_embed.weight.data.uniform_(-1, 1)
        self.out_embed.weight.data.uniform_(-1, 1)

    def forward_input(self, input_words):
        input_vectors = self.in_embed(input_words)
        return input_vectors

    def forward_output(self, output_words):
        output_vectors = self.out_embed(output_words)
        return output_vectors

    def forward_noise(self, batch_size, n_samples):
        """ Generate noise vectors with shape (batch_size, n_samples, n_embed)"""
        if self.noise_dist is None:
            # Sample words uniformly
            noise_dist = torch.ones(self.n_vocab)
        else:
            noise_dist = self.noise_dist

        # Sample words from our noise distribution
        noise_words = torch.multinomial(noise_dist,
                                        batch_size * n_samples,
                                        replacement=True)

        # use this module's own device; the original line read it from a global `model`
        device = self.out_embed.weight.device
        noise_words = noise_words.to(device)

        noise_vectors = self.out_embed(noise_words).view(batch_size, n_samples, self.n_embed)
        return noise_vectors
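In forward_noise, torch.multinomial with replacement=True draws batch_size * n_samples word indices in one call, and the view then groups them into n_samples negatives per center word. A plain-Python analogue of that sampling and grouping step, using random.choices (which also samples with replacement), might look like the sketch below. The helper name, the seed and the toy distribution are assumptions made for illustration.

```python
import random

def sample_noise_words(noise_dist, batch_size, n_samples, seed=0):
    """Draw batch_size * n_samples word indices and group them per center word."""
    rng = random.Random(seed)
    flat = rng.choices(range(len(noise_dist)), weights=noise_dist,
                       k=batch_size * n_samples)
    # reshape the flat list into batch_size rows of n_samples indices each
    return [flat[i * n_samples:(i + 1) * n_samples] for i in range(batch_size)]

noise_dist = [0.5, 0.3, 0.15, 0.05]  # made-up distribution over a 4-word vocabulary
noise_words = sample_noise_words(noise_dist, batch_size=3, n_samples=5)
print(noise_words)
```

As in the class above, the negatives are drawn independently of the actual context word, so a sampled index can occasionally coincide with the true target.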
Negative sampling: training:
import numpy as np
import torch
from torch import optim

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Get our noise distribution
# Using word frequencies calculated earlier in the notebook
word_freqs = np.array(sorted(freqs.values(), reverse=True))
unigram_dist = word_freqs/word_freqs.sum()
noise_dist = torch.from_numpy(unigram_dist**(0.75)/np.sum(unigram_dist**(0.75)))

# instantiating the model
embedding_dim = 300
model = SkipGramNeg(len(vocab_to_int), embedding_dim, noise_dist=noise_dist).to(device)

# using the loss that we defined
criterion = NegativeSamplingLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

print_every = 1500
steps = 0
epochs = 5

# train for some number of epochs
for e in range(epochs):
    # get our input, target batches
    for input_words, target_words in get_batches(train_words, 512):
        steps += 1
        inputs, targets = torch.LongTensor(input_words), torch.LongTensor(target_words)
        inputs, targets = inputs.to(device), targets.to(device)

        # input, output, and noise vectors
        input_vectors = model.forward_input(inputs)
        output_vectors = model.forward_output(targets)
        noise_vectors = model.forward_noise(inputs.shape[0], 5)

        # negative sampling loss
        loss = criterion(input_vectors, output_vectors, noise_vectors)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # report progress every print_every steps
        if steps % print_every == 0:
            print(f"epoch {e + 1}/{epochs}, step {steps}, loss {loss.item():.4f}")