tf.keras.layers.Attention: Dot-product attention layer, a.k.a. Luong-style attention.
Inherits From: Layer, Module
tf.keras.layers.Attention(
use_scale=False, score_mode='dot', **kwargs
)
Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim], and a key tensor of shape [batch_size, Tv, dim]. The calculation follows these steps (traced in the sketch after the list):
1. Calculate attention scores with shape [batch_size, Tq, Tv] as a query-key dot product: scores = tf.matmul(query, key, transpose_b=True).
2. Use scores to calculate a softmax distribution with shape [batch_size, Tq, Tv]: distribution = tf.nn.softmax(scores).
3. Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf.matmul(distribution, value).
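For concreteness, here is a minimal sketch (not part of the original docs; the shapes are arbitrary) that traces the three steps by hand and checks the result against the layer itself:

import tensorflow as tf

batch_size, Tq, Tv, dim = 2, 3, 4, 8
query = tf.random.normal([batch_size, Tq, dim])
value = tf.random.normal([batch_size, Tv, dim])
key = value  # the common case: key defaults to value

# Step 1: query-key dot product -> [batch_size, Tq, Tv].
scores = tf.matmul(query, key, transpose_b=True)
# Step 2: softmax over the last (Tv) axis -> attention distribution.
distribution = tf.nn.softmax(scores)
# Step 3: distribution-weighted combination of value -> [batch_size, Tq, dim].
manual_output = tf.matmul(distribution, value)

# The layer (with default use_scale=False, score_mode='dot') should agree.
layer_output = tf.keras.layers.Attention()([query, value])
tf.debugging.assert_near(manual_output, layer_output)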
**Args**
`use_scale` If True, will create a scalar variable to scale the attention scores.
`dropout` Float between 0 and 1. Fraction of the units to drop for the attention scores. Defaults to 0.0.
`score_mode` Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors.
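As a usage sketch (the 0.1 dropout rate is an arbitrary choice for illustration), these constructor arguments are passed at layer creation; with use_scale=True the layer learns a single scalar that multiplies the attention scores before the softmax:

attention_layer = tf.keras.layers.Attention(
    use_scale=True, score_mode='concat', dropout=0.1)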
**Call arguments**
`inputs` List of the following tensors:
`query`: Query Tensor of shape [batch_size, Tq, dim].
`value`: Value Tensor of shape [batch_size, Tv, dim].
`key`: Optional key Tensor of shape [batch_size, Tv, dim]. If not given, will use value for both key and value, which is the most common case.
`mask` List of the following tensors:
`query_mask`: A boolean mask Tensor of shape [batch_size, Tq]. If given, the output will be zero at the positions where mask==False.
`value_mask`: A boolean mask Tensor of shape [batch_size, Tv]. If given, will apply the mask such that values at positions where mask==False do not contribute to the result.
`return_attention_scores` bool, if True, returns the attention scores (after masking and softmax) as an additional output argument.
`training` Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (no dropout).
`use_causal_mask` Boolean. Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past. Defaults to False.
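A minimal sketch (tensor shapes invented for illustration) exercising these call arguments:

import tensorflow as tf

query = tf.random.normal([2, 3, 8])   # [batch_size, Tq, dim]
value = tf.random.normal([2, 4, 8])   # [batch_size, Tv, dim]
# True marks valid positions; False positions are masked out.
query_mask = tf.constant([[True, True, False],
                          [True, True, True]])
value_mask = tf.constant([[True, True, True, False],
                          [True, True, False, False]])

layer = tf.keras.layers.Attention()
output, scores = layer(
    [query, value],
    mask=[query_mask, value_mask],
    return_attention_scores=True)
print(output.shape)  # (2, 3, 8)  -> [batch_size, Tq, dim]
print(scores.shape)  # (2, 3, 4)  -> [batch_size, Tq, Tv]

# Decoder-style self-attention: position i cannot attend to j > i.
causal_output = tf.keras.layers.Attention()([query, query], use_causal_mask=True)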
**Output**
Attention outputs of shape [batch_size, Tq, dim]. [Optional] Attention scores after masking and softmax with shape [batch_size, Tq, Tv].
The meaning of query, value and key depends on the application. In the case of text similarity, for example, query is the sequence embeddings of the first piece of text and value is the sequence embeddings of the second piece of text. key is usually the same tensor as value.
Here is a code example for using Attention in a CNN+Attention network:
import tensorflow as tf

# Variable-length int sequences.
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')
# Embedding lookup.
token_embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=64)
# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = token_embedding(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = token_embedding(value_input)
# CNN layer.
cnn_layer = tf.keras.layers.Conv1D(
    filters=100,
    kernel_size=4,
    # Use 'same' padding so outputs have the same shape as inputs.
    padding='same')
# Query encoding of shape [batch_size, Tq, filters].
query_seq_encoding = cnn_layer(query_embeddings)
# Value encoding of shape [batch_size, Tv, filters].
value_seq_encoding = cnn_layer(value_embeddings)
# Query-value attention of shape [batch_size, Tq, filters].
query_value_attention_seq = tf.keras.layers.Attention()(
    [query_seq_encoding, value_seq_encoding])
# Reduce over the sequence axis to produce encodings of shape
# [batch_size, filters].
query_encoding = tf.keras.layers.GlobalAveragePooling1D()(
    query_seq_encoding)
query_value_attention = tf.keras.layers.GlobalAveragePooling1D()(
    query_value_attention_seq)
# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])
# Add DNN layers, and create Model.
# ...
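One hedged way to finish the example (the hidden width and the binary-classification head are arbitrary choices, not from the original docs):

hidden = tf.keras.layers.Dense(64, activation='relu')(input_layer)
output = tf.keras.layers.Dense(1, activation='sigmoid')(hidden)
model = tf.keras.Model(inputs=[query_input, value_input], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')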