【NLP】Conditional Language Models
Language Model estimates the probs that the sequences of words can be a sentence said by a human. Training it, we can get the embeddings of the whole vocabulary.
UnConditional Language Model just assigns probs to sequences of words. That’s to say, given the first n-1 words and to predict the probs of the next word.(learn the prob distribution of next word).
Beacuse of the probs chain rule, we only train this:
Conditional LMs
A conditional language model assigns probabilities to sequences of words, W =(w1,w2,…,wt) , given some conditioning context x.
For example, in the translation task, we must given the orininal sentence and its translation. The orininal sentence is the conditioning context, and by using it, we predict the objection sentence.
Data for training conditional LMs:
To train conditional language models, we need paired samples.E.X.
Such task like:Translation, summarisation, caption generation, speech recognition
How to evaluate the conditional LMs?
- Traditional methods: use the cross-entropy or perplexity.(hard to interpret,easy to implement)
- Task-specific evaluation: Compare the model’s most likely output to human-generated expected output . Such as 【BLEU】、METEOR、ROUGE…(okay to interpret,easy to implement)
- Human evaluation: Hard to implement.
Algorithmic challenges:
Given the condition context x, to find the max-probs of the the predict sequence of words, we cannot use the gready search, which might cann’t generate a real sentence.
We use the 【Beam Search】.
We draw attention to the “encoder-decoder” models that learn a function that maps x into a fixed-size vector and then uses a language model to “decode” that vector into a sequence of words,
Model: K&B2013
A simpal of Encoder – just cumsum(very easy)
A simpal of Encoder – CSM Encoder:use CNN to encode
The Decoder – RNN Decoder
The cal graph is.
Sutskever et al. Model (2014):
- Important.Classic Model
Cal Graph:
Some Tricks to Sutskever et al. Model :
- Read the Input Sequence ‘backwards’: +4BLEU
- Use an ensemble of m 【independently trained】 models (at the decode period) :
- Ensemble of 2 models: +3 BLEU
- Ensemble of 5 models: +4.5 BLEU
For example:
- we want to find the most probable (MAP) output given the input,i,e.
We use the beam search : +1BLEU
For example,the beam size is 2:
Example of A Application: Image caption generation
Encoder:CNN
Decoder:RNN or
conditional n-gram LM(different to the RNN but it is useful)
We must have some datasets already.
Kiros et al. Model has done this.
.
【NLP】Conditional Language Models的更多相关文章
- 【NLP】Conditional Language Modeling with Attention
Review: Conditional LMs Note that, in the Encoder part, we reverse the input to the ‘RNN’ and it per ...
- [转]【NLP】干货!Python NLTK结合stanford NLP工具包进行文本处理 阅读目录
[NLP]干货!Python NLTK结合stanford NLP工具包进行文本处理 原贴: https://www.cnblogs.com/baiboy/p/nltk1.html 阅读目录 目 ...
- 【NLP】Tika 文本预处理:抽取各种格式文件内容
Tika常见格式文件抽取内容并做预处理 作者 白宁超 2016年3月30日18:57:08 摘要:本文主要针对自然语言处理(NLP)过程中,重要基础部分抽取文本内容的预处理.首先我们要意识到预处理的重 ...
- 【NLP】前戏:一起走进条件随机场(一)
前戏:一起走进条件随机场 作者:白宁超 2016年8月2日13:59:46 [摘要]:条件随机场用于序列标注,数据分割等自然语言处理中,表现出很好的效果.在中文分词.中文人名识别和歧义消解等任务中都有 ...
- 【NLP】基于自然语言处理角度谈谈CRF(二)
基于自然语言处理角度谈谈CRF 作者:白宁超 2016年8月2日21:25:35 [摘要]:条件随机场用于序列标注,数据分割等自然语言处理中,表现出很好的效果.在中文分词.中文人名识别和歧义消解等任务 ...
- 【NLP】基于机器学习角度谈谈CRF(三)
基于机器学习角度谈谈CRF 作者:白宁超 2016年8月3日08:39:14 [摘要]:条件随机场用于序列标注,数据分割等自然语言处理中,表现出很好的效果.在中文分词.中文人名识别和歧义消解等任务中都 ...
- 【NLP】基于统计学习方法角度谈谈CRF(四)
基于统计学习方法角度谈谈CRF 作者:白宁超 2016年8月2日13:59:46 [摘要]:条件随机场用于序列标注,数据分割等自然语言处理中,表现出很好的效果.在中文分词.中文人名识别和歧义消解等任务 ...
- 【NLP】条件随机场知识扩展延伸(五)
条件随机场知识扩展延伸 作者:白宁超 2016年8月3日19:47:55 [摘要]:条件随机场用于序列标注,数据分割等自然语言处理中,表现出很好的效果.在中文分词.中文人名识别和歧义消解等任务中都有应 ...
- 【NLP】Recurrent Neural Network and Language Models
0. Overview What is language models? A time series prediction problem. It assigns a probility to a s ...
随机推荐
- es6 常用方法
来自 https://www.cnblogs.com/lhl66/p/9555903.html 侵删 来自 https://www.cnblogs.com/lhl66/p/8862106.html 侵 ...
- ArcGIS API for JavaScript 4.2学习笔记[20] 使用缓冲区结合Query对象进行地震点查询【重温异步操作思想】
这个例子相当复杂.我先简单说说这个例子是干啥的. 在UI上,提供了一个下拉框.两个滑动杆,以确定三个参数,使用这三个参数进行空间查询.这个例子就颇带空间查询的意思了. 第一个参数是油井类型,第二个参数 ...
- QQ路径
~/Library/Containers/com.tencent.qq/Data/Library/Application Support/QQ/46*******/Image osx10.10.3在这 ...
- Android Studio集成Flutter
首先Flutter中文网教程地址:https://flutterchina.club/get-started/install/ 1.新建环境变量 变量名:PUB_HOSTED_URL 变量值:http ...
- Checkpoint 和Breakpoint
参考:http://www.cnblogs.com/qiangshu/p/5241699.htmlhttp://www.cnblogs.com/biwork/p/3366724.html 1. Che ...
- 在java web项目中实现随项目启动的额外操作
前言 在web项目中经常会遇到在项目启动初始,会要求做一些逻辑的实现,比如实现一个消息推送服务,实现不同类型数据同步的回调操作初始化,或则通知其他客户服务器本项目即将启动,等等.对于这种要求,目前个人 ...
- bootstrapt 使用遇到问题
1.布局的时候什么时候用xs,sm,md,lg? small grid (≥ 768px) = .col-sm-*, medium grid (≥ 992px) = .col-md-*, large ...
- @FeignClient
@FeignClient("APP-PROVIDER")public interface MyFeignClient { @RequestMapping(value = " ...
- Oracle优化器
本文参照:https://www.cnblogs.com/Dreamer-1/p/6076440.html 读优化器之前建议先读: https://www.cnblogs.com/zhougongji ...
- Re:Exgcd解二元不定方程
模拟又炸了,我死亡 $exgcd$(扩展欧几里德算法)用于求$ax+by=gcd(a,b)$中$x,y$的一组解,它有很多应用,比如解二元不定方程.求逆元等等,这里详细讲解一下$exgcd$的原理. ...















