Word Embeddings: Encoding Lexical Semantics
- Getting Dense Word Embeddings
- Word Embeddings in PyTorch
- An Example: N-Gram Language Modeling
- Exercise: Computing Word Embeddings: Continuous Bag-of-Words
Word Embeddings in PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

word_to_ix = {"hello": 0, "world": 1}
embeds = nn.Embedding(2, 5) # 2 words in vocab, 5 dimensional embeddings
lookup_tensor = torch.tensor([word_to_ix["hello"]], dtype=torch.long)
hello_embed = embeds(lookup_tensor)
print(hello_embed)
Out:
tensor([[ 0.6614, 0.2669, 0.0617, 0.6213, -0.4519]],
grad_fn=<EmbeddingBackward>)
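As a small follow-up sketch (not part of the original snippet, reusing the `embeds` and `word_to_ix` defined above), the same embedding table can look up several indices in one call and returns one row per index:

# Assumed follow-up example: look up both vocabulary words at once.
# The result has shape (2, 5), one 5-dimensional row per index.
lookup_both = torch.tensor([word_to_ix["hello"], word_to_ix["world"]], dtype=torch.long)
print(embeds(lookup_both).shape)  # torch.Size([2, 5])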
An Example: N-Gram Language Modeling
CONTEXT_SIZE = 2
EMBEDDING_DIM = 10
# We will use Shakespeare Sonnet 2
test_sentence = """When forty winters shall besiege thy brow,
And dig deep trenches in thy beauty's field,
Thy youth's proud livery so gazed on now,
Will be a totter'd weed of small worth held:
Then being asked, where all thy beauty lies,
Where all the treasure of thy lusty days;
To say, within thine own deep sunken eyes,
Were an all-eating shame, and thriftless praise.
How much more praise deserv'd thy beauty's use,
If thou couldst answer 'This fair child of mine
Shall sum my count, and make my old excuse,'
Proving his beauty by succession thine!
This were to be new made when thou art old,
And see thy blood warm when thou feel'st it cold.""".split()
# we should tokenize the input, but we will ignore that for now
# build a list of tuples. Each tuple is ([ word_i-2, word_i-1 ], target word)
trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2])
            for i in range(len(test_sentence) - 2)]
vocab = set(test_sentence)  # the elements of a set are distinct
word_to_ix = {word: i for i, word in enumerate(vocab)}


class NGramLanguageModeler(nn.Module):

    def __init__(self, vocab_size, embedding_dim, context_size):
        super(NGramLanguageModeler, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        embeds = self.embeddings(inputs).view((1, -1))
        out = F.relu(self.linear1(embeds))
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs


losses = []
loss_function = nn.NLLLoss()
model = NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)
optimizer = optim.SGD(model.parameters(), lr=0.001)

for epoch in range(10):
    total_loss = 0
    for context, target in trigrams:
        # turn the two context words into a tensor of integer indices
        context_idxs = torch.tensor([word_to_ix[w] for w in context], dtype=torch.long)
        # gradients accumulate in PyTorch, so zero them before each new instance
        model.zero_grad()
        log_probs = model(context_idxs)
        loss = loss_function(log_probs, torch.tensor([word_to_ix[target]], dtype=torch.long))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    losses.append(total_loss)
print(losses)  # the total loss should decrease with every epoch
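After training, the learned vector for any word is simply the corresponding row of the embedding layer's weight matrix. A minimal check, added here as a sketch that assumes the `model` and `word_to_ix` built above:

# Sketch: read off the learned 10-dimensional embedding for "beauty"
# directly from the embedding layer's weight matrix.
print(model.embeddings.weight[word_to_ix["beauty"]])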
Exercise: Computing Word Embeddings: Continuous Bag-of-Words
CONTEXT_SIZE = 2
raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

# By deriving a set from `raw_text`, we deduplicate the array
vocab = set(raw_text)
vocab_size = len(vocab)
word_to_ix = {word: i for i, word in enumerate(vocab)}

data = []
for i in range(2, len(raw_text) - 2):
    context = [raw_text[i - 2], raw_text[i - 1],
               raw_text[i + 1], raw_text[i + 2]]
    target = raw_text[i]
    data.append((context, target))
print(data[:5])


class CBOW(nn.Module):

    def __init__(self):
        pass

    def forward(self, inputs):
        pass


def make_context_vector(context, word_to_ix):
    idxs = [word_to_ix[w] for w in context]
    return torch.tensor(idxs, dtype=torch.long)


make_context_vector(data[0][0], word_to_ix)
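The `CBOW` class above is deliberately left as a stub for the exercise. As one possible completion, here is a minimal sketch (my own solution, not the tutorial's reference answer; `EMBEDDING_DIM` is an assumed hyperparameter): sum the context word embeddings, project the sum to log-probabilities over the vocabulary, and train exactly as in the n-gram example.

EMBEDDING_DIM = 10  # assumed for this sketch


class CBOWSketch(nn.Module):
    # One possible CBOW completion (a sketch, not the official solution).

    def __init__(self, vocab_size, embedding_dim):
        super(CBOWSketch, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(embedding_dim, vocab_size)

    def forward(self, inputs):
        # inputs: tensor of context word indices, shape (2 * CONTEXT_SIZE,)
        embeds = self.embeddings(inputs).sum(dim=0, keepdim=True)  # (1, embedding_dim)
        return F.log_softmax(self.linear(embeds), dim=1)


loss_function = nn.NLLLoss()
model = CBOWSketch(vocab_size, EMBEDDING_DIM)
optimizer = optim.SGD(model.parameters(), lr=0.001)

for epoch in range(10):
    total_loss = 0
    for context, target in data:
        context_vector = make_context_vector(context, word_to_ix)
        model.zero_grad()
        log_probs = model(context_vector)
        loss = loss_function(log_probs, torch.tensor([word_to_ix[target]], dtype=torch.long))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(total_loss)  # the loss should fall over the epochs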