Transformer Summary
Contents
Attention
- Recurrent Models of Visual Attention [NIPS 2014, DeepMind]
- Neural Machine Translation by Jointly Learning to Align and Translate [ICLR 2015]
Overall Survey
- Efficient Transformers: A Survey [paper]
- A Survey on Visual Transformer [paper]
- Transformers in Vision: A Survey [paper]
NLP
Language
- Sequence to Sequence Learning with Neural Networks [NIPS 2014] [paper] [code]
- End-To-End Memory Networks [NIPS 2015] [paper] [code]
- Attention is all you need [NIPS 2017] [paper] [code]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [paper] [code] [pretrained-models]
- Reformer: The Efficient Transformer [ICLR2020] [paper] [code]
- Linformer: Self-Attention with Linear Complexity [arxiv 2020] [paper] [code]
- GPT-3: Language Models are Few-Shot Learners [NIPS 2020] [paper] [code]
Speech
- Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation [INTERSPEECH 2020] [paper] [code]
CV
Backbone_Classification
Papers and Codes
- CoaT: Co-Scale Conv-Attentional Image Transformers [arxiv 2021] [paper] [code]
- SiT: Self-supervised vIsion Transformer [arxiv 2021] [paper] [code]
- ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ICLR 2021] [paper] [code]
  - Trained with extra private data; does not generalize well when trained on insufficient amounts of data
- DeiT: Data-efficient Image Transformers [arxiv2021] [paper] [code]
  - Token-based distillation strategy; builds upon ViT and convolutional models
- Transformer in Transformer [arxiv 2021] [paper] [code1] [code-official]
- OmniNet: Omnidirectional Representations from Transformers [arxiv2021] [paper]
- Gaussian Context Transformer [CVPR 2021] [paper]
- General Multi-Label Image Classification With Transformers [CVPR 2021] [paper] [code]
- Scaling Local Self-Attention for Parameter Efficient Visual Backbones [CVPR 2021] [paper]
- T2T-ViT: Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [ICCV 2021] [paper] [code]
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [ICCV 2021] [paper] [code]
- Bias Loss for Mobile Neural Networks [ICCV 2021] [paper] [code]
- Vision Transformer with Progressive Sampling [ICCV 2021] [paper] [code: https://github.com/yuexy/PS-ViT]
- Rethinking Spatial Dimensions of Vision Transformers [ICCV 2021] [paper] [code]
- Rethinking and Improving Relative Position Encoding for Vision Transformer [ICCV 2021] [paper] [code]
Interesting Repos
- Convolutional Cifar10
- vision-transformers-cifar10
  - Performance was found to be worse than a simple ResNet-18
  - Hyper-parameters have an influence: the dim of ViT, etc.
- ViT-pytorch
  - Using pretrained weights gives better results
Self-Supervised
- Emerging Properties in Self-Supervised Vision Transformers [ICCV 2021] [paper] [code]
- An Empirical Study of Training Self-Supervised Vision Transformers [ICCV 2021] [paper] [code]
Interpretability and Robustness
- Transformer Interpretability Beyond Attention Visualization [CVPR 2021] [paper] [code]
- On the Adversarial Robustness of Visual Transformers [arxiv 2021] [paper]
- Robustness Verification for Transformers [ICLR 2020] [paper] [code]
- Pretrained Transformers Improve Out-of-Distribution Robustness [ACL 2020] [paper] [code]
Detection
- DETR: End-to-End Object Detection with Transformers [ECCV2020] [paper] [code]
- Deformable DETR: Deformable Transformers for End-to-End Object Detection [ICLR2021] [paper] [code]
- End-to-End Object Detection with Adaptive Clustering Transformer [arxiv2020] [paper]
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [arxiv2020] [paper]
- Rethinking Transformer-based Set Prediction for Object Detection [arxiv2020] [paper] [zhihu]
- End-to-end Lane Shape Prediction with Transformers [WACV 2021] [paper] [code]
- ViT-FRCNN: Toward Transformer-Based Object Detection [arxiv2020] [paper]
- Line Segment Detection Using Transformers [CVPR 2021] [paper] [code]
- Facial Action Unit Detection With Transformers [CVPR 2021] [paper] [code]
- Adaptive Image Transformer for One-Shot Object Detection [CVPR 2021] [paper] [code]
- Self-attention based Text Knowledge Mining for Text Detection [CVPR 2021] [paper] [code]
- Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions [ICCV 2021] [paper] [code]
- Group-Free 3D Object Detection via Transformers [ICCV 2021] [paper] [code]
- Fast Convergence of DETR with Spatially Modulated Co-Attention [ICCV 2021] [paper] [code]
HOI
- End-to-End Human Object Interaction Detection with HOI Transformer [CVPR 2021] [paper] [code]
- HOTR: End-to-End Human-Object Interaction Detection with Transformers [CVPR 2021] [paper] [code]
Tracking
- Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking [CVPR 2021] [paper] [code]
- TransTrack: Multiple-Object Tracking with Transformer [CVPR 2021] [paper] [code]
- Transformer Tracking [CVPR 2021] [paper] [code]
- Learning Spatio-Temporal Transformer for Visual Tracking [ICCV 2021] [paper] [code]
Segmentation
- SETR: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [CVPR 2021] [paper] [code]
- Trans2Seg: Transparent Object Segmentation with Transformer [arxiv2021] [paper] [code]
- End-to-End Video Instance Segmentation with Transformers [arxiv2020] [paper] [zhihu]
- MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers [CVPR 2021] [paper] [official-code] [unofficial-code]
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [arxiv 2020] [paper] [code]
- SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation [CVPR 2021] [paper] [code]
ReID
- Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer [CVPR 2021] [paper] [code]
Localization
- LoFTR: Detector-Free Local Feature Matching with Transformers [CVPR 2021] [paper] [code]
- MIST: Multiple Instance Spatial Transformer [CVPR 2021] [paper] [code]
Generation
- Variational Transformer Networks for Layout Generation [CVPR 2021] [paper] [code]
- TransGAN: Two Transformers Can Make One Strong GAN [paper] [code]
- Taming Transformers for High-Resolution Image Synthesis [CVPR 2021] [paper] [code]
- iGPT: Generative Pretraining from Pixels [ICML 2020] [paper] [code]
- Generative Adversarial Transformers [arxiv 2021] [paper] [code]
- LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity [CVPR2021] [paper: https://openaccess.thecvf.com/content/CVPR2021/html/Yang_LayoutTransformer_Scene_Layout_Generation_With_Conceptual_and_Spatial_Diversity_CVPR_2021_paper.html] [code]
- Spatial-Temporal Transformer for Dynamic Scene Graph Generation [ICCV 2021] [paper]
Inpainting
- STTN: Learning Joint Spatial-Temporal Transformations for Video Inpainting [ECCV 2020] [paper] [code]
Image enhancement
- Pre-Trained Image Processing Transformer [CVPR 2021] [paper]
- TTSR: Learning Texture Transformer Network for Image Super-Resolution [CVPR2020] [paper] [code]
Pose Estimation
- Pose Recognition with Cascade Transformers [CVPR 2021] [paper] [code]
- TransPose: Towards Explainable Human Pose Estimation by Transformer [arxiv 2020] [paper] [code]
- Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [ECCV 2020] [paper]
- HOT-Net: Non-Autoregressive Transformer for 3D Hand-Object Pose Estimation [ACMMM 2020] [paper]
- End-to-End Human Pose and Mesh Reconstruction with Transformers [CVPR 2021] [paper] [code]
- 3D Human Pose Estimation with Spatial and Temporal Transformers [arxiv 2020] [paper] [code]
- End-to-End Trainable Multi-Instance Pose Estimation with Transformers [arxiv 2020] [paper]
Face
- Robust Facial Expression Recognition with Convolutional Visual Transformers [arxiv 2020] [paper]
- Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition [CVPR 2021] [paper] [code]
Video Understanding
- Is Space-Time Attention All You Need for Video Understanding? [arxiv 2020] [paper] [code]
- Temporal-Relational CrossTransformers for Few-Shot Action Recognition [CVPR 2021] [paper] [code]
- Self-Supervised Video Hashing via Bidirectional Transformers [CVPR 2021] [paper]
- SSAN: Separable Self-Attention Network for Video Representation Learning [CVPR 2021] [paper]
Depth-Estimation
Prediction
- Multimodal Motion Prediction with Stacked Transformers [CVPR 2021] [paper] [code]
- Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case [paper]
- Transformer networks for trajectory forecasting [ICPR 2020] [paper] [code]
- Spatial-Channel Transformer Network for Trajectory Prediction on the Traffic Scenes [arxiv 2021] [paper] [code]
- Pedestrian Trajectory Prediction using Context-Augmented Transformer Networks [ICRA 2020] [paper] [code]
- Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction [ECCV 2020] [paper] [code]
- Hierarchical Multi-Scale Gaussian Transformer for Stock Movement Prediction [paper]
- Single-Shot Motion Completion with Transformer [arxiv2021] [paper] [code]
NAS
- HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers [CVPR 2021] [paper] [code]
- AutoFormer: Searching Transformers for Visual Recognition [ICCV 2021] [paper] [code: https://github.com/microsoft/AutoML]
PointCloud
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [CVPR 2021] [paper] [code]
- Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos [CVPR 2021] [paper]
Fashion
Medical
- Lesion-Aware Transformers for Diabetic Retinopathy Grading [CVPR 2021] [paper]
Cross-Modal
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [CVPR 2021] [paper]
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [CVPR2021] [paper] [code]
- Topological Planning With Transformers for Vision-and-Language Navigation [CVPR 2021] [paper]
- Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos [CVPR 2021] [paper]
- VLN BERT: A Recurrent Vision-and-Language BERT for Navigation [CVPR 2021] [paper] [code]
- Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [CVPR 2021] [paper] [code]
Reference
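- The Attention Mechanism Explained, Parts 1 and 2 [zhihu1] [zhihu2]
- The Self-attention Mechanism in Natural Language Processing
- Transformer Model Principles Explained in Detail [zhihu] [csdn]
- A Complete Analysis of RNN, Seq2Seq, and the Attention Mechanism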
- Seq2Seq and transformer implementation
- End-To-End Memory Networks [zhihu]
- Illustrating the key, query, and value in attention
- Transformer in CV
- CVPR2021-Papers-with-Code
- ICCV2021-Papers-with-Code