【softmax分类器的加速器】

https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss

This is a faster way to train a softmax classifier over a huge number of classes.

【分类的结果集过大,选取子集】

https://www.tensorflow.org/api_guides/python/nn#Candidate_Sampling

Do you want to train a multiclass or multilabel model with thousands or millions of output classes (for example, a language model with a large vocabulary)? Training with a full Softmax is slow in this case, since all of the classes are evaluated for every training example. Candidate Sampling training algorithms can speed up your step times by only considering a small randomly-chosen subset of contrastive classes (called candidates) for each batch of training examples.

https://www.tensorflow.org/extras/candidate_sampling.pdf

【 compute F(x, y) for every class y ∈ L for every training example----耗时点,这是要解决的问题】

What is Candidate Sampling Say we have a multiclass or multi­label problem where each training example (x , ) consists of i Ti a context xi a small (multi)set of target classes Ti out of a large universe L of possible classes. For example, the problem might be to predicting the next word (or the set of future words) in a sentence given the previous words.

We wish to learn a compatibility function F(x, y) which says something about the compatibility of a class y with a context x . For example ­ the probability of the class given the context.

“Exhaustive” training methods such as softmax and logistic regression require us to compute F(x, y) for every class y ∈ L for every training example. When |L| is very large, this can be prohibitively expensive.

【the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary:子集】

https://arxiv.org/pdf/1412.2007.pdf

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by BLEU) on the English->German translation and almost as high performance as state-of-the-art English->French translation system.

On Using Very Large Target Vocabulary for Neural Machine Translation Candidate Sampling Sampled Softmax的更多相关文章

  1. 课程五(Sequence Models),第三周(Sequence models & Attention mechanism) —— 1.Programming assignments:Neural Machine Translation with Attention

    Neural Machine Translation Welcome to your first programming assignment for this week! You will buil ...

  2. Sequence Models Week 3 Neural Machine Translation

    Neural Machine Translation Welcome to your first programming assignment for this week! You will buil ...

  3. 神经机器翻译 - NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE

    论文:NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE 综述 背景及问题 背景: 翻译: 翻译模型学习条件分布 ...

  4. 对Neural Machine Translation by Jointly Learning to Align and Translate论文的详解

    读论文 Neural Machine Translation by Jointly Learning to Align and Translate 这个论文是在NLP中第一个使用attention机制 ...

  5. Effective Approaches to Attention-based Neural Machine Translation(Global和Local attention)

    这篇论文主要是提出了Global attention 和 Local attention 这个论文有一个译文,不过我没细看 Effective Approaches to Attention-base ...

  6. 【转载 | 翻译】Visualizing A Neural Machine Translation Model(神经机器翻译模型NMT的可视化)

    转载并翻译Jay Alammar的一篇博文:Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models Wi ...

  7. [笔记] encoder-decoder NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE

    原文地址 :[1409.0473] Neural Machine Translation by Jointly Learning to Align and Translate (arxiv.org) ...

  8. Introduction to Neural Machine Translation - part 1

    The Noise Channel Model \(p(e)\): the language Model \(p(f|e)\): the translation model where, \(e\): ...

  9. 论文阅读 | Robust Neural Machine Translation with Doubly Adversarial Inputs

    (1)用对抗性的源实例攻击翻译模型; (2)使用对抗性目标输入来保护翻译模型,提高其对对抗性源输入的鲁棒性. 生成对抗输入:基于梯度 (平均损失)  ->  AdvGen 我们的工作处理由白盒N ...

随机推荐

  1. XSY 1749 tree

    题目大意 给定一棵基环树, 问你有多少条路径的长度\(\ge K\). 点数\(\le 10^5\) Solution 基环树分治模板题. 我是这样做的: 加边的时候用并查集维护点的连通性, 少加入环 ...

  2. IntelliJ IDEA设置鼠标移动到方法上提示API注释

    参考: https://www.cnblogs.com/guazi/p/6474426.html(图片转自此篇文章)

  3. centos6.7下安装配置vnc

    vnc是一款使用广泛的服务器管理软件,可以实现图形化管理,下面简单介绍一下如何在centos6.7下安装vnc. 1.安装vncserver yum install tigervnc tigervnc ...

  4. ie8 不能加载dll的问题解决

    请问是在打开IE的时候提示无法加载DLL文件吗? 请尝试重置IE: 1. 关闭所有Internet Explorer窗口. 2. 单击开始,点击运行,输入inetcpl.cpl,按回车. 3. 点击高 ...

  5. 使用ssh从外网访问内网

    一.场景如下: 各个角色的对应关系如下: 角色 描述 APP 个人笔记本,属于内网IP sshd server 公网 VPS ( 映射端口: port 2222 ),拥有公网IP ssh client ...

  6. 使用Shiro

    一.架构 要学习如何使用Shiro必须先从它的架构谈起,作为一款安全框架Shiro的设计相当精妙.Shiro的应用不依赖任何容器,它也可以在JavaSE下使用.但是最常用的环境还是JavaEE.下面以 ...

  7. maven module和project的区别

    Maven Project可以理解为父工程.Maven Module可以理解为子工程.创建Maven Module工程必须有存在的父工程,maven就是通过父子工程进行工程管理的.

  8. MySQL触发器 trigger学习

    触发器:一类特殊的事物.可监视某种数据操作,并触发相关操作(insert/update/delete).表中的某些数据改变,希望同一时候能够引起其他相关数据改变的需求. 作用:变化自己主动完毕某些语句 ...

  9. winform程序公布后,client下载报错“您的 Web 浏览器设置不同意执行未签名的应用程序”

    如题 在winserver2008服务器上操作会报错.解决的方法: IE→Internet选项→安全→可信网站,加入信任公布的IP地址

  10. F - 概率(经典问题)

    Description Sometimes some mathematical results are hard to believe. One of the common problems is t ...