【NLP】Recurrent Neural Network and Language Models

0. Overview

What is a language model?

Language modelling can be viewed as a time series prediction problem.

A language model assigns a probability to a sequence of words, and the probabilities of all possible sequences sum to one.

Many Natural Language Processing tasks can be structured as (conditional) language modelling.

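For example, translation:

P(certain Chinese text | given English text)

Note that the probability of a sequence decomposes via the chain rule of probability, that is, by repeatedly applying the definition of conditional probability.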
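For reference, the decomposition can be written out as follows. The symbols w_1, ..., w_N (the words of a sequence of length N) are notation assumed here, not taken from the original post:

```latex
p(w_1, w_2, \dots, w_N) = \prod_{n=1}^{N} p(w_n \mid w_1, \dots, w_{n-1})
```

Each factor is a next-word prediction given the observed history, which is why language modelling reduces to predicting one word at a time.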

How do we evaluate a language model?

It is measured with cross entropy on held-out text (often reported as perplexity).
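A minimal sketch of the evaluation, assuming we already have the probability the model assigned to each word of a held-out sentence. The function names and the example probabilities are made up for illustration:

```python
import math

def cross_entropy(token_probs):
    """Average negative log2-probability (bits per word) of the observed words."""
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """Perplexity is 2 raised to the cross entropy measured in bits."""
    return 2.0 ** cross_entropy(token_probs)

# Hypothetical probabilities a model assigned to the four words of a test sentence.
probs = [0.25, 0.5, 0.125, 0.25]
print(cross_entropy(probs))  # 2.0 bits per word
print(perplexity(probs))     # 4.0
```

A lower cross entropy means the model found the held-out text less surprising.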

Three data sets:

1 Penn Treebank: www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz

2 Billion Word Corpus: code.google.com/p/1-billion-word-language-modeling-benchmark/

3 WikiText datasets: Pointer Sentinel Mixture Models. Merity et al., arXiv 2016

Overview: three approaches to building language models:

Count-based n-gram models: approximate the history of observed words with just the previous n words.

Neural n-gram models: embed the same fixed n-gram history in a continuous space and thus better capture correlations between histories.

Recurrent Neural Networks: we drop the fixed n-gram history and compress the entire history into a fixed-length vector, enabling long-range correlations to be captured.

1. N-Gram models:

Assumption:

Only the previous history matters.

Only the previous k-1 words are included in the history.

Kth order Markov model

2-gram language model:

The conditioning context, w_{i-1}, is called the history.

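Estimating probabilities:

For example, for a 3-gram model, count how often the sequences (w1, w2, w3) and (w1, w2) appear in the corpus and take their ratio:

p(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2)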
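The counting estimate can be sketched in a few lines of Python. This is only an illustration on a toy corpus; the function names and the corpus are assumptions, and the context count is taken over trigram contexts so that the estimates for each context sum to one:

```python
from collections import Counter

def trigram_counts(tokens):
    """Count every trigram and how often each two-word context starts a trigram."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    ctx = Counter()
    for (w1, w2, _), n in tri.items():
        ctx[(w1, w2)] += n
    return tri, ctx

def p_mle(w3, w1, w2, tri, ctx):
    """Maximum likelihood estimate of p(w3 | w1, w2); 0.0 for an unseen context."""
    if ctx[(w1, w2)] == 0:
        return 0.0
    return tri[(w1, w2, w3)] / ctx[(w1, w2)]

corpus = "the cat sat on the mat the cat ate".split()
tri, ctx = trigram_counts(corpus)
print(p_mle("sat", "the", "cat", tri, ctx))    # 0.5
print(p_mle("slept", "the", "cat", tri, ctx))  # 0.0, never seen in the corpus
```

The second call shows the weakness of raw counts: any trigram missing from the corpus gets probability zero.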

Interpolated back-off:

Some phrases never appear in the corpus, so their estimated probability is zero. To avoid this, we use interpolated back-off: the lower-order k-gram models (k = n-1, n-2, …, 1) are interpolated with the n-gram model.

A simple approach:
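One common simple choice is linear interpolation: a weighted sum of the trigram, bigram and unigram estimates with weights that sum to one. The sketch below assumes this form; the weights (0.6, 0.3, 0.1), the function names and the toy corpus are illustrative, not values from the original post:

```python
from collections import Counter

def build_counts(tokens):
    """Unigram, bigram and trigram counts for a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri

def ratio(num, den):
    """Safe division that returns 0.0 for an unseen context."""
    return num / den if den else 0.0

def p_interp(w3, w1, w2, counts, lambdas=(0.6, 0.3, 0.1)):
    """Linear interpolation of trigram, bigram and unigram estimates.

    Simplification: contexts are counted as plain bigrams/unigrams, so the
    last tokens of the corpus are slightly over-counted as contexts.
    """
    uni, bi, tri = counts
    l3, l2, l1 = lambdas
    p3 = ratio(tri[(w1, w2, w3)], bi[(w1, w2)])
    p2 = ratio(bi[(w2, w3)], uni[w2])
    p1 = ratio(uni[w3], sum(uni.values()))
    return l3 * p3 + l2 * p2 + l1 * p1

corpus = "the cat sat on the mat the cat ate".split()
counts = build_counts(corpus)
# "the cat mat" never occurs, yet the unigram term keeps its probability above zero.
print(p_interp("mat", "the", "cat", counts))
```

In practice the weights are tuned on held-out data rather than fixed by hand.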

Summary of n-gram models:

Good: easy to train. Fast.

Bad: Large n-grams are sparse. It is hard to capture long dependencies. They cannot capture correlations between words with similar distributions, and they cannot handle morphological regularities between words (e.g. running and jumping).

2. Neural N-Gram Language Models

Use a feed-forward network such as:

For example, a trigram (3-gram) neural network language model:

The w_i are one-hot vectors and the p_i are probability distributions; both have dimension |V| (the number of words in the vocabulary).

(An example: the detailed computation graph.)

Define the loss as the cross entropy:
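The figures for this section are missing, so the following is only a sketch of a trigram feed-forward model under assumed details: one tanh hidden layer over the two concatenated one-hot context words, a softmax output, and hypothetical parameter names V, c, W, b with toy sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 8  # toy sizes chosen for illustration

# Assumed parameterisation: h = tanh(V [w_{n-1}; w_{n-2}] + c), p = softmax(W h + b)
V = rng.normal(scale=0.1, size=(hidden_size, 2 * vocab_size))
c = np.zeros(hidden_size)
W = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
b = np.zeros(vocab_size)

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def forward(prev1, prev2):
    """Distribution over the next word given the two previous word ids."""
    x = np.concatenate([one_hot(prev1, vocab_size), one_hot(prev2, vocab_size)])
    h = np.tanh(V @ x + c)
    return softmax(W @ h + b)

def cross_entropy(target, p):
    """Negative log-probability of the observed next word."""
    return -np.log(p[target])

p = forward(prev1=2, prev2=4)
loss = cross_entropy(target=1, p=p)
print(p.sum(), loss)
```

With one-hot inputs the product V x simply selects and adds two columns of V, which is why the first layer is usually described as an embedding lookup.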

Training: use gradient descent.

An example of training:
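A small training sketch, assuming the same kind of one-hidden-layer trigram network, plain stochastic gradient descent, and hand-written gradients. The toy token sequence, learning rate and parameter names are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 4, 8
params = {
    "V": rng.normal(scale=0.1, size=(hidden_size, 2 * vocab_size)),
    "c": np.zeros(hidden_size),
    "W": rng.normal(scale=0.1, size=(vocab_size, hidden_size)),
    "b": np.zeros(vocab_size),
}

def sgd_step(prev1, prev2, target, lr=0.2):
    """One forward pass, one backward pass and one parameter update."""
    x = np.zeros(2 * vocab_size)
    x[prev1] = 1.0
    x[vocab_size + prev2] = 1.0
    h = np.tanh(params["V"] @ x + params["c"])
    z = params["W"] @ h + params["b"]
    p = np.exp(z - z.max())
    p /= p.sum()
    loss = -np.log(p[target])

    dz = p.copy()
    dz[target] -= 1.0                 # gradient of the loss w.r.t. the logits
    dh = params["W"].T @ dz
    da = dh * (1.0 - h ** 2)          # back through tanh
    grads = {"W": np.outer(dz, h), "b": dz, "V": np.outer(da, x), "c": da}
    for name in params:
        params[name] -= lr * grads[name]
    return loss

# Toy corpus of word ids with a repeating pattern the model can learn.
tokens = [0, 1, 2, 3] * 5
examples = [(tokens[i - 1], tokens[i - 2], tokens[i]) for i in range(2, len(tokens))]

epoch_losses = []
for epoch in range(200):
    epoch_losses.append(np.mean([sgd_step(*ex) for ex in examples]))
print(epoch_losses[0], epoch_losses[-1])
```

The average loss per epoch should fall as the network learns the pattern; real models use mini-batches and an automatic differentiation library instead of manual gradients.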

Comparison with count-based n-gram LMs:

Good: Better generalisation on unseen n-grams, but poorer performance on seen n-grams (solution: add direct (linear) n-gram features). Uses less memory than count-based n-gram models.

Bad: The number of parameters in the model scales with the n-gram size. There is a limit on the longest dependencies that can be captured.

3. Recurrent Neural Network LM

That is, we use a recurrent neural network to build the LM.

Model and training:
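The model figure is missing here, so this is a sketch under assumed details: the hidden state is computed from the current one-hot word concatenated with the previous hidden state, followed by a softmax over the vocabulary. The parameter names V, c, W, b and the toy sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 6

# Assumed form: h_n = tanh(V [x_n; h_{n-1}] + c), p_n = softmax(W h_n + b)
V = rng.normal(scale=0.1, size=(hidden_size, vocab_size + hidden_size))
c = np.zeros(hidden_size)
W = rng.normal(scale=0.1, size=(vocab_size, hidden_size))
b = np.zeros(vocab_size)

def rnn_lm_loss(tokens):
    """Average cross entropy of predicting each next word id in the sequence."""
    h = np.zeros(hidden_size)  # initial hidden state
    total = 0.0
    for current, nxt in zip(tokens, tokens[1:]):
        x = np.zeros(vocab_size)
        x[current] = 1.0
        h = np.tanh(V @ np.concatenate([x, h]) + c)  # history compressed into h
        z = W @ h + b
        p = np.exp(z - z.max())
        p /= p.sum()
        total += -np.log(p[nxt])
    return total / (len(tokens) - 1)

loss = rnn_lm_loss([0, 3, 1, 4, 2])
print(loss)  # close to log(5) for an untrained model
```

The same weights are reused at every time step, and the only thing passed forward in time is the fixed-size vector h.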

Algorithm: Back Propagation Through Time (BPTT)

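Note that in BPTT the gradient at each time step depends on all earlier time steps, so full back-propagation over a long sequence is expensive. An improved algorithm breaks these dependencies after a fixed number of time steps:

Algorithm: Truncated Back Propagation Through Time (TBPTT)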
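A minimal sketch of the truncation idea, assuming a simple tanh RNN with hand-written gradients: the sequence is processed in chunks of k steps, the hidden state value is carried from one chunk to the next, but gradients are never propagated back across a chunk boundary. The parameter names, toy sequence and hyperparameters are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 4, 8
P = {
    "Wxh": rng.normal(scale=0.1, size=(hidden_size, vocab_size)),
    "Whh": rng.normal(scale=0.1, size=(hidden_size, hidden_size)),
    "bh": np.zeros(hidden_size),
    "Why": rng.normal(scale=0.1, size=(vocab_size, hidden_size)),
    "by": np.zeros(vocab_size),
}

def chunk_step(inputs, targets, h_prev, lr=0.1):
    """Forward and backward over one chunk; h_prev is treated as a constant."""
    hs, ps, loss = {-1: h_prev}, {}, 0.0
    for t, (i, j) in enumerate(zip(inputs, targets)):
        x = np.zeros(vocab_size)
        x[i] = 1.0
        hs[t] = np.tanh(P["Wxh"] @ x + P["Whh"] @ hs[t - 1] + P["bh"])
        z = P["Why"] @ hs[t] + P["by"]
        ps[t] = np.exp(z - z.max())
        ps[t] /= ps[t].sum()
        loss += -np.log(ps[t][j])

    grads = {k: np.zeros_like(v) for k, v in P.items()}
    dh_next = np.zeros(hidden_size)  # no gradient arrives from later chunks
    for t in reversed(range(len(inputs))):
        dz = ps[t].copy()
        dz[targets[t]] -= 1.0
        grads["Why"] += np.outer(dz, hs[t])
        grads["by"] += dz
        da = (P["Why"].T @ dz + dh_next) * (1.0 - hs[t] ** 2)
        grads["bh"] += da
        grads["Wxh"][:, inputs[t]] += da  # one-hot input selects a single column
        grads["Whh"] += np.outer(da, hs[t - 1])
        dh_next = P["Whh"].T @ da         # stops at the start of the chunk
    for k in P:
        P[k] -= lr * grads[k]
    return loss / len(inputs), hs[len(inputs) - 1]

tokens = [0, 1, 2, 3] * 10
k = 4  # truncation length
epoch_losses = []
for epoch in range(100):
    h = np.zeros(hidden_size)
    chunk_losses = []
    for start in range(0, len(tokens) - 1, k):
        targets = tokens[start + 1:start + k + 1]
        inputs = tokens[start:start + len(targets)]
        loss, h = chunk_step(inputs, targets, h)  # carry the state value forward
        chunk_losses.append(loss)
    epoch_losses.append(np.mean(chunk_losses))
print(epoch_losses[0], epoch_losses[-1])
```

The cost of each update is bounded by k, at the price of ignoring dependencies longer than the chunk when computing gradients.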

The computation graph then looks like this:

The training process and gradient descent:

Summary of recurrent NN LMs:

Good:

RNNs can represent unbounded dependencies, unlike models with a fixed n-gram order.

RNNs compress histories of words into a fixed-size hidden vector.

The number of parameters does not grow with the length of the dependencies captured, but it does grow with the amount of information stored in the hidden layer.

Bad:

RNNs are hard to train and often will not discover long-range dependencies present in the data (which is why gated units such as the LSTM are studied next).

Increasing the size of the hidden layer, and thus memory, increases the computation and memory quadratically.

Mostly trained with Maximum Likelihood based objectives which do not encode the expected frequencies of words a priori.
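The quadratic growth mentioned above comes from the hidden-to-hidden weight matrix, which has one weight per pair of hidden units. A small arithmetic sketch (the function name and the sizes are chosen only for illustration):

```python
def recurrent_weight_count(hidden_size):
    """Number of weights in the hidden-to-hidden matrix of a simple RNN."""
    return hidden_size * hidden_size

for size in (128, 256, 512):
    print(size, recurrent_weight_count(size))
# 128 16384
# 256 65536
# 512 262144
```

Doubling the hidden layer therefore quadruples both the recurrent weights to store and the multiplications per time step.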

Recommended blog posts:

Andrej Karpathy: The Unreasonable Effectiveness of Recurrent Neural Networks karpathy.github.io/2015/05/21/rnn-effectiveness/

Yoav Goldberg: The unreasonable effectiveness of Character-level Language Models nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139

Stephen Merity: Explaining and illustrating orthogonal initialization for recurrent neural networks. smerity.com/articles/2016/orthogonal_init.html
