More articles related to Awesome-Visual-Captioning

http://www.ee.columbia.edu/ln/dvmm/publications/17/zhang2017visual.pdf Visual Translation Embedding Network for Visual Relation Detection Hanwang Zhang† , Zawlin Kyaw‡ , Shih-Fu Chang† , Tat-Seng Chua‡ †Columbia University, ‡National University of Si…
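The snippet above gives only the citation; as a reading aid, here is a minimal PyTorch sketch of the translation-embedding idea the title refers to: project subject and object features into a relation space where subject + predicate ≈ object. All class names, dimensions, and the softmax-over-predicates objective are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TranslationEmbedding(nn.Module):
    """Toy sketch of the translation-embedding idea: project subject and
    object features into a relation space where subject + predicate ~= object.
    All sizes and the training objective are illustrative assumptions."""
    def __init__(self, feat_dim, rel_dim, num_predicates):
        super().__init__()
        self.proj_s = nn.Linear(feat_dim, rel_dim)                # subject projection
        self.proj_o = nn.Linear(feat_dim, rel_dim)                # object projection
        self.predicates = nn.Embedding(num_predicates, rel_dim)   # translation vectors

    def forward(self, subj_feat, obj_feat):
        # Score each predicate by how well its translation vector explains
        # the offset between the projected object and subject features.
        offset = self.proj_o(obj_feat) - self.proj_s(subj_feat)   # (B, rel_dim)
        return offset @ self.predicates.weight.t()                # (B, num_predicates)

model = TranslationEmbedding(feat_dim=4096, rel_dim=500, num_predicates=70)
subj = torch.randn(8, 4096)   # subject region features from a CNN (illustrative)
obj = torch.randn(8, 4096)    # object region features
loss = nn.functional.cross_entropy(model(subj, obj), torch.randint(0, 70, (8,)))
```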
Image Captioning: automatically describing the content of an image. Domain: CV + NLP. Categories (my own taxonomy; see the survey for details): CNN+RNN with attention mechanisms; Reinforcement Learning; GAN; Compositional Architecture: Review Network, Guiding…
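For the first category (CNN+RNN with attention), a minimal sketch of one decoding step may help. Every name and size below is illustrative, in the spirit of additive attention over a grid of CNN features, not any particular paper's implementation.

```python
import torch
import torch.nn as nn

class AttentionDecoder(nn.Module):
    """One decoding step of a CNN+RNN captioner with additive attention
    over a grid of CNN features; all sizes are illustrative."""
    def __init__(self, vocab_size, feat_dim=512, hidden=512, embed=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.att_feat = nn.Linear(feat_dim, hidden)
        self.att_hid = nn.Linear(hidden, hidden)
        self.att_out = nn.Linear(hidden, 1)
        self.rnn = nn.LSTMCell(embed + feat_dim, hidden)
        self.logits = nn.Linear(hidden, vocab_size)

    def step(self, word, feats, state):
        # word: (B,) previous token ids; feats: (B, R, feat_dim); state: (h, c)
        h, c = state
        e = self.att_out(torch.tanh(self.att_feat(feats) + self.att_hid(h).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)            # (B, R, 1): where to look
        context = (alpha * feats).sum(dim=1)       # (B, feat_dim): attended image vector
        h, c = self.rnn(torch.cat([self.embed(word), context], dim=1), (h, c))
        return self.logits(h), (h, c)              # next-word logits, new state

dec = AttentionDecoder(vocab_size=10000)
state = (torch.zeros(4, 512), torch.zeros(4, 512))
logits, state = dec.step(torch.zeros(4, dtype=torch.long), torch.randn(4, 196, 512), state)
```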
CVPR 2017 paper list. Machine Learning 1, Spotlight 1-1A: Exclusivity-Consistency Regularized Multi-View Subspace Clustering - Xiaojie Guo, Xiaobo Wang, Zhen Lei, Changqing Zhang, Stan Z. Li; Borrowing Treasures From the Wealthy: Deep Transfer Learning Thro…
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - reading summary. Notes should not simply copy the paper's content; they need your own thinking and understanding. I. Basic information. 1. Title: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. 2. Authors: Peter Anderson, Xiaodong…
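Since the snippet only reaches the bibliographic details, here is a minimal sketch of the decoder this paper is known for: a two-LSTM stack in which a top-down attention LSTM weights bottom-up region features (e.g., detector proposals) and a language LSTM predicts the next word. All names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TopDownAttention(nn.Module):
    """Sketch of a two-LSTM top-down attention decoder over bottom-up
    region features; sizes are illustrative."""
    def __init__(self, vocab, feat=2048, hid=512, emb=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.att_lstm = nn.LSTMCell(hid + feat + emb, hid)  # top-down attention LSTM
        self.lang_lstm = nn.LSTMCell(feat + hid, hid)       # language LSTM
        self.att_v = nn.Linear(feat, hid)
        self.att_h = nn.Linear(hid, hid)
        self.att_a = nn.Linear(hid, 1)
        self.logits = nn.Linear(hid, vocab)

    def step(self, word, feats, state):
        # feats: (B, K, feat) bottom-up region features; state: two LSTM states
        (h1, c1), (h2, c2) = state
        mean_v = feats.mean(dim=1)   # global image summary for the attention LSTM
        h1, c1 = self.att_lstm(torch.cat([h2, mean_v, self.embed(word)], 1), (h1, c1))
        a = self.att_a(torch.tanh(self.att_v(feats) + self.att_h(h1).unsqueeze(1)))
        ctx = (torch.softmax(a, dim=1) * feats).sum(dim=1)  # attended region feature
        h2, c2 = self.lang_lstm(torch.cat([ctx, h1], 1), (h2, c2))
        return self.logits(h2), ((h1, c1), (h2, c2))
```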
Awesome Image Captioning 2018-12-03 19:19:56 From: https://github.com/zhjohnchan/awesome-image-captioning Papers. 2010: I2T: Image parsing to text description - Yao B Z et al, P IEEE 2010. 2011: Im2Text: Describing Images Using 1 Million Captioned Photo…
Convolutional Image Captioning 2018-11-04 20:42:07 Paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Aneja_Convolutional_Image_Captioning_CVPR_2018_paper.pdf Code: https://github.com/aditya12agd5/convcap Related Papers: 1. Convolutional Se…
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal Recurrent Neural Network (m-RNN): AlexNet/VGGNet + a multimodal layer + RNNs. Their work has two major differences from prior methods. Firstly, they inco…
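A minimal sketch of the multimodal-layer idea named in that snippet: at each step, the layer fuses the word embedding, the recurrent state, and the CNN image feature before predicting the next word. The plain tanh fusion and all sizes here are assumptions, not the paper's exact activation or dimensions.

```python
import torch
import torch.nn as nn

class MultimodalLayer(nn.Module):
    """Sketch of a multimodal fusion layer: combine the word embedding,
    the RNN hidden state, and the CNN image feature, then predict the
    next word. Sizes and the tanh activation are illustrative."""
    def __init__(self, embed=256, hidden=256, img=4096, mm=512, vocab=10000):
        super().__init__()
        self.w = nn.Linear(embed, mm)    # word-embedding branch
        self.r = nn.Linear(hidden, mm)   # recurrent-state branch
        self.i = nn.Linear(img, mm)      # image-feature branch
        self.out = nn.Linear(mm, vocab)

    def forward(self, word_emb, rnn_state, img_feat):
        m = torch.tanh(self.w(word_emb) + self.r(rnn_state) + self.i(img_feat))
        return self.out(m)               # logits over the vocabulary

layer = MultimodalLayer()
logits = layer(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 4096))
```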
Link of the Paper: https://arxiv.org/abs/1805.09019 Innovations: The authors propose a CNN + CNN framework for image captioning. There are four modules in the framework: a vision module (VGG-16), which is adopted to "watch" images; a language modu…
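A rough sketch of how those four modules could compose, assuming a masked (causal) convolution as the language module and dot-product attention; the tiny vision stack below merely stands in for VGG-16, and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNplusCNN(nn.Module):
    """Illustrative composition of a vision module, a causal-conv language
    module, an attention module, and a prediction module."""
    def __init__(self, vocab, dim=256, k=3):
        super().__init__()
        self.vision = nn.Sequential(                 # stand-in for VGG-16
            nn.Conv2d(3, dim, 7, stride=4), nn.ReLU(), nn.AdaptiveAvgPool2d(7))
        self.embed = nn.Embedding(vocab, dim)        # language-module input
        self.conv = nn.Conv1d(dim, dim, k)           # masked conv over words
        self.pad = k - 1
        self.att_q = nn.Linear(dim, dim)             # attention module
        self.predict = nn.Linear(2 * dim, vocab)     # prediction module

    def forward(self, image, words):
        V = self.vision(image).flatten(2).transpose(1, 2)        # (B, 49, dim)
        h = self.embed(words).transpose(1, 2)                    # (B, dim, T)
        h = self.conv(F.pad(h, (self.pad, 0))).transpose(1, 2)   # causal: (B, T, dim)
        alpha = torch.softmax(self.att_q(h) @ V.transpose(1, 2), dim=-1)  # (B, T, 49)
        ctx = alpha @ V                                          # attended features
        return self.predict(torch.cat([h, ctx], dim=-1))         # (B, T, vocab)

model = CNNplusCNN(vocab=10000)
out = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 15)))
```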
Link of the Paper: https://arxiv.org/abs/1806.06422 Innovations: The authors propose a novel learning-based discriminative evaluation metric that is directly trained to distinguish between human- and machine-generated captions. They train an automatic…
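A minimal sketch of such a learned critic, assuming an LSTM caption encoder and precomputed image features (both assumptions): a binary classifier whose logit scores how human-like a caption is for a given image.

```python
import torch
import torch.nn as nn

class LearnedCaptionCritic(nn.Module):
    """Sketch of a learned evaluation metric: a binary classifier trained
    to tell human captions from machine ones, conditioned on the image.
    Encoders and sizes are illustrative assumptions."""
    def __init__(self, vocab, dim=256, img=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.caption_enc = nn.LSTM(dim, dim, batch_first=True)
        self.img_proj = nn.Linear(img, dim)
        self.score = nn.Linear(2 * dim, 1)   # logit: human vs. machine

    def forward(self, caption, img_feat):
        _, (h, _) = self.caption_enc(self.embed(caption))
        joint = torch.cat([h[-1], self.img_proj(img_feat)], dim=-1)
        return self.score(joint).squeeze(-1)

critic = LearnedCaptionCritic(vocab=10000)
logit = critic(torch.randint(0, 10000, (4, 12)), torch.randn(4, 2048))
loss = nn.functional.binary_cross_entropy_with_logits(
    logit, torch.tensor([1., 1., 0., 0.]))   # 1 = human-written, 0 = generated
```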
Link of the Paper: https://arxiv.org/abs/1711.09151 Motivation: LSTM units are complex and inherently sequential across time. Convolutional networks have shown advantages in machine translation and conditional image generation. Innovation: The author…
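The key enabler of the convolutional alternative is a masked (causal) convolution: unlike an LSTM, which must unroll step by step, it computes all time steps of a caption in one parallel pass while still letting position t see only tokens at positions ≤ t. A minimal sketch, with sizes illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Left-padded 1D convolution so position t only sees tokens <= t;
    the basic building block behind convolutional decoders."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                          # x: (B, C, T)
        x = F.pad(x, (self.pad, 0))                # pad only on the left
        return self.conv(x)                        # all T positions in parallel

x = torch.randn(2, 64, 10)                         # a whole caption at once
layer = CausalConv1d(64, kernel_size=3)
y = layer(x)                                       # (2, 64, 10): one parallel pass
```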