On Using Very Large Target Vocabulary for Neural Machine Translation | Candidate Sampling | Sampled Softmax
【An accelerator for softmax classifiers】
https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss
This is a faster way to train a softmax classifier over a huge number of classes.
【The set of output classes is too large, so select a subset】
https://www.tensorflow.org/api_guides/python/nn#Candidate_Sampling
Do you want to train a multiclass or multilabel model with thousands or millions of output classes (for example, a language model with a large vocabulary)? Training with a full Softmax is slow in this case, since all of the classes are evaluated for every training example. Candidate Sampling training algorithms can speed up your step times by only considering a small randomly-chosen subset of contrastive classes (called candidates) for each batch of training examples.
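A minimal pure-Python sketch of the candidate-sampling idea, for one training example with a single target class. This is not TensorFlow's implementation (`tf.nn.sampled_softmax_loss` is the real API). The sketch draws a few candidate classes from a proposal distribution Q, subtracts log Q(y) from each logit to correct for the non-uniform sampling, and computes softmax cross-entropy over the true class plus the candidates only. Several details are assumptions chosen for illustration: the Zipf-like "log-uniform" proposal, sampling with replacement, and the helper names. `score` is a hypothetical stand-in for F(x, y) at the current context x.

```python
import math
import random


def log_uniform_q(y, num_classes):
    """Proposal probability Q(y) of an assumed Zipf-like "log-uniform" sampler."""
    return (math.log(y + 2) - math.log(y + 1)) / math.log(num_classes + 1)


def sample_log_uniform(num_classes, rng):
    """Draw one class id in [0, num_classes) with probability log_uniform_q(y).

    Inverse-transform sampling: O(1) per draw, no table over all classes.
    """
    u = rng.random()
    y = int(math.exp(u * math.log(num_classes + 1))) - 1
    return min(y, num_classes - 1)  # guard against floating-point rounding as u -> 1


def sampled_softmax_loss(score, true_class, num_classes, num_sampled, rng):
    """Softmax cross-entropy over the true class plus a few sampled candidates.

    `score(y)` plays the role of F(x, y); it is evaluated only for the true
    class and the sampled candidates, never for all `num_classes` classes.
    """
    # Sample candidates with replacement from the proposal Q.
    candidates = [sample_log_uniform(num_classes, rng) for _ in range(num_sampled)]
    # Remove "accidental hits": sampled candidates equal to the true class.
    candidates = [y for y in candidates if y != true_class]
    # Subtract log Q(y) from every logit to correct for the sampling distribution.
    true_logit = score(true_class) - math.log(log_uniform_q(true_class, num_classes))
    logits = [true_logit] + [
        score(y) - math.log(log_uniform_q(y, num_classes)) for y in candidates
    ]
    # Numerically stable log-sum-exp over the small candidate set.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - true_logit
```

For example, `sampled_softmax_loss(lambda y: 0.0, 3, 50_000, 64, random.Random(0))` evaluates `score` at most 65 times instead of 50,000. In real training the loss would be averaged over a batch and differentiated with respect to the parameters inside `score`; that part is omitted here.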
https://www.tensorflow.org/extras/candidate_sampling.pdf
【compute F(x, y) for every class y ∈ L for every training example: this is the time-consuming step, and the problem to be solved】
What is Candidate Sampling?
Say we have a multiclass or multilabel problem where each training example (x_i, T_i) consists of a context x_i and a small (multi)set of target classes T_i out of a large universe L of possible classes. For example, the problem might be to predict the next word (or the set of future words) in a sentence given the previous words.
We wish to learn a compatibility function F(x, y) which says something about the compatibility of a class y with a context x, for example, the probability of the class given the context.
“Exhaustive” training methods such as softmax and logistic regression require us to compute F(x, y) for every class y ∈ L for every training example. When |L| is very large, this can be prohibitively expensive.
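To make that cost concrete, here is a minimal pure-Python sketch of exhaustive softmax cross-entropy for one training example. `score` is a hypothetical stand-in for F(x, y) at a fixed context x; the point is only that it is called once for every class in L.

```python
import math


def full_softmax_loss(score, true_class, num_classes):
    """Exhaustive softmax cross-entropy: F(x, y) is computed for every y in L."""
    logits = [score(y) for y in range(num_classes)]  # |L| evaluations per example
    # Numerically stable log-sum-exp over the whole class universe.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[true_class]
```

With all scores equal, the loss is log |L|, and `score` runs |L| times per training example. That per-example cost is exactly what candidate sampling tries to avoid.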
【the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary: subset】
https://arxiv.org/pdf/1412.2007.pdf
Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by BLEU) on the English->German translation and almost as high performance as state-of-the-art English->French translation system.
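The abstract says decoding stays cheap because the softmax is restricted to a small subset of the target vocabulary. The sketch below shows one plausible way to do that, in the spirit of the paper: take the union of the most frequent target words and the top few candidate translations of each source word, then normalize only over that union. Treat the details as assumptions rather than the paper's exact procedure. `frequent_words`, `translation_dict`, `k_per_word`, and `score` (a stand-in for the decoder's unnormalized score of a target word) are hypothetical names.

```python
import math


def build_candidate_list(source_words, frequent_words, translation_dict, k_per_word):
    """Union of frequent target words and the top translations of each source word.

    `translation_dict` maps a source word to target words, best first
    (assumed to come from something like a word-alignment dictionary).
    """
    candidates = set(frequent_words)
    for w in source_words:
        candidates.update(translation_dict.get(w, [])[:k_per_word])
    return sorted(candidates)


def restricted_softmax(score, candidates):
    """Normalize only over the candidate subset, not the full target vocabulary."""
    logits = {y: score(y) for y in candidates}
    m = max(logits.values())
    z = sum(math.exp(v - m) for v in logits.values())
    return {y: math.exp(v - m) / z for y, v in logits.items()}
```

The cost of each decoding step then scales with the size of the candidate list rather than with the full vocabulary. Words outside the list simply receive no probability mass for that sentence.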

