On Using Very Large Target Vocabulary for Neural Machine Translation: Candidate Sampling and Sampled Softmax
【A way to speed up training a softmax classifier】
https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss
This is a faster way to train a softmax classifier over a huge number of classes.
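As a concrete illustration, here is a minimal sketch of how tf.nn.sampled_softmax_loss is typically wired into a model (assuming TF 2.x; the sizes and variable names are made up for the example, not taken from any particular model):

```python
import tensorflow as tf

# Hypothetical sizes for illustration only.
vocab_size = 50000   # |L|: number of output classes
hidden_dim = 512     # dimensionality of the context vector
num_sampled = 64     # negative candidates sampled per batch
batch_size = 32

# Output projection: one weight row and one bias per class.
softmax_w = tf.Variable(tf.random.normal([vocab_size, hidden_dim]))
softmax_b = tf.Variable(tf.zeros([vocab_size]))

# inputs: context vectors produced by the model (e.g. an RNN state);
# labels: the id of the true class for each example.
inputs = tf.random.normal([batch_size, hidden_dim])
labels = tf.random.uniform([batch_size, 1], maxval=vocab_size, dtype=tf.int64)

# Training loss: softmax over the true class plus num_sampled sampled classes,
# instead of over all vocab_size classes.
train_loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=softmax_w,
        biases=softmax_b,
        labels=labels,
        inputs=inputs,
        num_sampled=num_sampled,
        num_classes=vocab_size))

# At evaluation / inference time the full softmax is used.
full_logits = tf.matmul(inputs, softmax_w, transpose_b=True) + softmax_b
eval_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.squeeze(labels, axis=1), logits=full_logits))
```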
【When the set of output classes is too large, train on a sampled subset】
https://www.tensorflow.org/api_guides/python/nn#Candidate_Sampling
Do you want to train a multiclass or multilabel model with thousands or millions of output classes (for example, a language model with a large vocabulary)? Training with a full Softmax is slow in this case, since all of the classes are evaluated for every training example. Candidate Sampling training algorithms can speed up your step times by only considering a small randomly-chosen subset of contrastive classes (called candidates) for each batch of training examples.
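By default, tf.nn.sampled_softmax_loss draws its candidates from a log-uniform sampler; the same sampler can also be called directly. A small sketch of that sampling step (sizes are hypothetical):

```python
import tensorflow as tf

vocab_size = 50000
batch_size = 32
num_sampled = 64

# Ids of the true class for each training example, shape [batch_size, 1].
true_classes = tf.random.uniform([batch_size, 1], maxval=vocab_size, dtype=tf.int64)

# Draw a shared set of num_sampled contrastive classes ("candidates") from a
# log-uniform (Zipfian) distribution over the class ids.
sampled, true_expected, sampled_expected = tf.random.log_uniform_candidate_sampler(
    true_classes=true_classes,
    num_true=1,
    num_sampled=num_sampled,
    unique=True,
    range_max=vocab_size)

# sampled: the candidate class ids; the two expected-count tensors give the
# sampling probabilities used to correct the logits, as described in
# candidate_sampling.pdf.
```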
https://www.tensorflow.org/extras/candidate_sampling.pdf
【Computing F(x, y) for every class y ∈ L for every training example is the expensive step; this is the problem to be solved】
What is Candidate Sampling? Say we have a multiclass or multilabel problem where each training example (x_i, T_i) consists of a context x_i and a small (multi)set of target classes T_i out of a large universe L of possible classes. For example, the problem might be to predict the next word (or the set of future words) in a sentence given the previous words.
We wish to learn a compatibility function F(x, y) which says something about the compatibility of a class y with a context x, for example the probability of the class given the context.
“Exhaustive” training methods such as softmax and logistic regression require us to compute F(x, y) for every class y ∈ L for every training example. When |L| is very large, this can be prohibitively expensive.
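To make the cost concrete, here is a toy numpy sketch of the exhaustive softmax; F(x, y) is modelled as a dot product with a per-class weight row, which is one common choice, while the note itself keeps F abstract:

```python
import numpy as np

# Toy "exhaustive" softmax over the full label universe L.
vocab_size = 50000   # |L|
hidden_dim = 512

W = np.random.randn(vocab_size, hidden_dim)   # one weight row per class y
x = np.random.randn(hidden_dim)               # context representation

scores = W @ x                                # F(x, y) for every y in L: O(|L| * d) per example
probs = np.exp(scores - scores.max())
probs /= probs.sum()                          # P(y | x), normalised over all |L| classes

# The normalisation (and its gradient) touches every class for every training
# example; this is exactly the cost that candidate sampling avoids.
```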
【Even with a very large target vocabulary, decoding stays efficient by selecting only a small subset of the whole target vocabulary】
https://arxiv.org/pdf/1412.2007.pdf
Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by BLEU) on the English->German translation and almost as high performance as state-of-the-art English->French translation system.
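A rough numpy sketch of the decoding-time idea from the abstract: scores and the softmax are computed only over a small candidate subset of the target vocabulary. The candidate list below is a random placeholder, not the paper's actual selection procedure (the paper builds it from the most frequent target words plus per-sentence candidate translations):

```python
import numpy as np

vocab_size = 500000   # full target vocabulary
hidden_dim = 512

W = np.random.randn(vocab_size, hidden_dim)   # full output projection
decoder_state = np.random.randn(hidden_dim)   # current decoder hidden state

# Placeholder candidate list; the paper builds this from frequent words and
# per-sentence candidate translations rather than at random.
candidate_ids = np.unique(np.random.randint(0, vocab_size, size=3000))

W_sub = W[candidate_ids]                      # only |candidates| rows are touched
scores = W_sub @ decoder_state                # target-word scores, candidates only
probs = np.exp(scores - scores.max())
probs /= probs.sum()                          # softmax restricted to the candidate set

next_word = candidate_ids[np.argmax(probs)]   # greedy pick among the candidates
```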