Dynamic attention in tensorflow

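The new code lives in contrib\seq2seq\python\ops\attention_decoder_fn.py

Compared with the earlier code, it no longer uses conv to compute the products; it uses plain multiplication and linear directly.

It provides two attention implementations: the classic "bahdanau": additive (Bahdanau et al., ICLR'2015), Neural Machine Translation by Jointly Learning to Align and Translate,

and "luong": multiplicative (Luong et al., EMNLP'2015), Effective Approaches to Attention-based Neural Machine Translation.

Here we take bahdanau as the example.

As before, we follow the formulas from Grammar as a Foreign Language.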
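For reference, the three formulas from that paper can be written as follows. This is a transcription using the symbols that appear in the code comments of this post (W_1, W_2, v, h_i for the encoder states, d_t for the decoder state); T_A denotes the number of encoder positions.

```latex
\begin{aligned}
u^t_i &= v^\top \tanh(W_1 h_i + W_2 d_t) \\
a^t_i &= \operatorname{softmax}(u^t_i) \\
d'_t  &= \sum_{i=1}^{T_A} a^t_i h_i
\end{aligned}
```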

In the code, the input encoder outputs, i.e. the attention states passed in, serve as the attention values.

That is, in prepare_attention:

attention_values = attention_states

The attention keys then correspond to the W_1h_i part, which is implemented with linear:

attention_keys = layers.linear(
    attention_states, num_units, biases_initializer=None, scope=scope)
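As a rough NumPy sketch of what this bias-free linear layer computes: the weight `W_1` below is a randomly initialized stand-in for the learned parameter, and the shapes are illustrative assumptions, not values from the TensorFlow source.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, attention_length, num_units = 2, 5, 4

# Encoder outputs h_i, one row per source position.
attention_states = rng.standard_normal((batch_size, attention_length, num_units))

# Hypothetical stand-in for the weight that layers.linear would learn (no bias).
W_1 = rng.standard_normal((num_units, num_units))

# Values are the raw encoder states; keys are their projection W_1 h_i.
attention_values = attention_states
attention_keys = attention_states @ W_1

print(attention_keys.shape)  # (2, 5, 4)
```

Because the keys depend only on the encoder outputs, they can be computed once before decoding starts and reused at every decoder step.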

The full computation is defined in _create_attention_score_fn, which creates the score function.

Here the luong part is removed so that only the bahdanau part is shown:

with variable_scope.variable_scope(name, reuse=reuse):
  if attention_option == "bahdanau":
    # Rightmost term of the first formula: query_w corresponds to W_2, query to d_t
    query_w = variable_scope.get_variable(
        "attnW", [num_units, num_units], dtype=dtype)
    # Corresponds to v, at the far left of the first formula's right-hand side
    score_v = variable_scope.get_variable("attnV", [num_units], dtype=dtype)

  def attention_score_fn(query, keys, values):
    """Put attention masks on attention_values using attention_keys and query.

    Args:
      query: A Tensor of shape [batch_size, num_units].
      keys: A Tensor of shape [batch_size, attention_length, num_units].
      values: A Tensor of shape [batch_size, attention_length, num_units].

    Returns:
      context_vector: A Tensor of shape [batch_size, num_units].

    Raises:
      ValueError: if attention_option is neither "luong" or "bahdanau".
    """
    if attention_option == "bahdanau":
      # transform query W_2*d_t
      query = math_ops.matmul(query, query_w)

      # reshape query: [batch_size, 1, num_units]
      query = array_ops.reshape(query, [-1, 1, num_units])

      # attn_fun: yields the left-hand side of the first formula, i.e.
      # math_ops.reduce_sum(v * math_ops.tanh(keys + query), [2]);
      # the elementwise * followed by reduce_sum is exactly a dot product
      scores = _attn_add_fun(score_v, keys, query)

    # Compute alignment weights
    #   scores: [batch_size, length]
    #   alignments: [batch_size, length]
    # TODO(thangluong): not normalize over padding positions.
    # Second formula: compute the softmax
    alignments = nn_ops.softmax(scores)

    # Now calculate the attention-weighted vector.
    alignments = array_ops.expand_dims(alignments, 2)
    # Weighted sum of the attention values, using the softmax weights
    context_vector = math_ops.reduce_sum(alignments * values, [1])
    context_vector.set_shape([None, num_units])

    # context_vector is the left-hand side of the third formula
    return context_vector
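The same three steps (additive score, softmax, weighted sum) can be reproduced outside TensorFlow. The following is a minimal NumPy sketch, not the library implementation: `bahdanau_context` and `softmax` are illustrative helpers, and the random weights stand in for the learned `attnW` and `attnV` variables.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bahdanau_context(query, keys, values, query_w, score_v):
    """query: [batch, units]; keys, values: [batch, length, units]."""
    # W_2 d_t, reshaped to [batch, 1, units] so it broadcasts over length.
    q = (query @ query_w)[:, None, :]
    # u_i = v^T tanh(W_1 h_i + W_2 d_t): elementwise product, then sum over units.
    scores = np.sum(score_v * np.tanh(keys + q), axis=2)        # [batch, length]
    alignments = softmax(scores, axis=1)                         # [batch, length]
    # Weighted sum of the values over the length axis.
    context = np.sum(alignments[:, :, None] * values, axis=1)   # [batch, units]
    return context, alignments

rng = np.random.default_rng(0)
batch, length, units = 2, 5, 4
query = rng.standard_normal((batch, units))
keys = rng.standard_normal((batch, length, units))
values = rng.standard_normal((batch, length, units))
query_w = rng.standard_normal((units, units))   # stand-in for attnW
score_v = rng.standard_normal(units)            # stand-in for attnV

context, alignments = bahdanau_context(query, keys, values, query_w, score_v)
print(context.shape)            # (2, 4)
print(alignments.sum(axis=1))   # each row sums to 1
```

Like the excerpt above, this sketch normalizes over every position, padding included, which is the limitation the TODO in the source points out.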

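Next, look at how context_vector is used once it has been computed. As the paper describes, this step is also essentially the same as in the old code.

That is, context and query are concatenated and passed through a linear projection, again yielding a vector of length num_units, which serves as the attention.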

def _create_attention_construct_fn(name, num_units, attention_score_fn, reuse):
  """Function to compute attention vectors.

  Args:
    name: to label variables.
    num_units: hidden state dimension.
    attention_score_fn: to compute similarity between key and target states.
    reuse: whether to reuse variable scope.

  Returns:
    attention_construct_fn: to build attention states.
  """
  with variable_scope.variable_scope(name, reuse=reuse) as scope:

    def construct_fn(attention_query, attention_keys, attention_values):
      context = attention_score_fn(attention_query, attention_keys,
                                   attention_values)
      concat_input = array_ops.concat([attention_query, context], 1)
      attention = layers.linear(
          concat_input, num_units, biases_initializer=None, scope=scope)
      return attention

    return construct_fn
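The concat-then-project step is small enough to sketch in NumPy. Here `construct_attention` and `w_out` are hypothetical names; `w_out` plays the role of the weight that layers.linear would learn, with no bias term.

```python
import numpy as np

def construct_attention(query, context, w_out):
    """query, context: [batch, units]; w_out: [2 * units, units]."""
    # Concatenate along the feature axis, then project back to units (no bias).
    concat_input = np.concatenate([query, context], axis=1)   # [batch, 2 * units]
    return concat_input @ w_out                                # [batch, units]

rng = np.random.default_rng(0)
batch, units = 2, 4
query = rng.standard_normal((batch, units))
context = rng.standard_normal((batch, units))
w_out = rng.standard_normal((2 * units, units))  # hypothetical learned weight

attention = construct_attention(query, context, w_out)
print(attention.shape)  # (2, 4)
```

Projecting the concatenation is equivalent to summing two separate projections, one of the query and one of the context, so the output keeps the decoder's hidden size.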

Finally, in use: cell_output is simply the attention, and next_input is the concat of cell_input and attention.

# construct attention
attention = attention_construct_fn(cell_output, attention_keys,
                                   attention_values)
cell_output = attention

# argmax decoder
cell_output = output_fn(cell_output)  # logits
next_input_id = math_ops.cast(
    math_ops.argmax(cell_output, 1), dtype=dtype)
done = math_ops.equal(next_input_id, end_of_sequence_id)
cell_input = array_ops.gather(embeddings, next_input_id)

# combine cell_input and attention
next_input = array_ops.concat([cell_input, attention], 1)
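To make the shapes in this greedy decoding step concrete, here is a NumPy sketch of one step. All sizes are made up for illustration, and `w_logits` is a hypothetical linear stand-in for output_fn.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, units, vocab_size, embed_dim = 3, 4, 6, 5
end_of_sequence_id = 1

attention = rng.standard_normal((batch, units))
w_logits = rng.standard_normal((units, vocab_size))     # stand-in for output_fn
embeddings = rng.standard_normal((vocab_size, embed_dim))

logits = attention @ w_logits                 # [batch, vocab_size]
next_input_id = np.argmax(logits, axis=1)     # greedy choice per batch row
done = next_input_id == end_of_sequence_id    # True where decoding has finished
cell_input = embeddings[next_input_id]        # embedding lookup (gather)

# Next decoder input: embedding of the chosen token plus the attention vector.
next_input = np.concatenate([cell_input, attention], axis=1)
print(next_input.shape)  # (3, 9)
```

The width of next_input is embed_dim + units, so the decoder cell has to accept inputs of that combined size.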
