Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1412.2306
Main Points:
- An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels Ib inside each bounding box -> a set of h-dimensional vectors {vi | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
- A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
- Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.


Other Key Points:
- The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.
Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章
- Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
- Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
- Deep Visual-Semantic Alignments for Generating Image Descriptions(深度视觉-语义对应对于生成图像描述)
https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...
- Paper Reading:Deep Neural Networks for YouTube Recommendations
论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...
- Paper Reading:Deep Neural Networks for Object Detection
发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...
- 论文笔记:Visual Semantic Navigation Using Scene Priors
Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper: https://arxiv.org/pdf/1810 ...
- Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
- 论文笔记:Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
- 论文:利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)
这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...
随机推荐
- 工具 | Axure基础操作 No.5
今天看了不少的关于产品思维的文章,甚有感悟.但是还是觉得一句话说的非常对,做产品就像游泳,你掌握了很多理论知识,只要一下水那些理论知识都没什么用,只有真正的一头扎进水里你才知道身体怎么去适应这样的感觉 ...
- python3爬虫-通过requests获取拉钩职位信息
import requests, json, time, tablib def send_ajax_request(data: dict): try: ajax_response = session. ...
- C++程序设计入门 引用和动态内存管理学习
引用: 引用就是另一个变量的别名,通过引用所做的读写操作实际上是作用于原变量上. 由于引用是绑定在一个对象上的,所以定义引用的时候必须初始化. 函数参数:引用传递 1.引用可做函数参数,但调用时只需 ...
- 【ZOJ 2996】(1+x)^n(二项式定理)
Please calculate the coefficient modulo 2 of x^i in (1+x)^n. Input For each case, there are two inte ...
- 含头结点的单链表C++实现(包含创建,查找,插入,追加,删除,反转,排序,合并,打印,清空,销毁等基本操作)
温馨提示:下面代码默认链表数据为字符型,本代码仅供参考,希望能对找到本随笔的人有所帮助! #include<iostream> using namespace std; typedef s ...
- jar下载地址
java开发难免需要下载额外的jar,推荐一个地址 http://www.java2s.com/Code/Jar/CatalogJar.htm
- 2.4 自己编写一个vivi驱动程序
学习目标:从零编写一个vivi驱动程序,并测试: 一. vivi驱动应用程序调用过程 上节对xawtv对vivi程序调用欧城进行了详细分析,可总结为以下流程: 二.仿照vivi.c编写myvivi.c ...
- google网站推广被拒登如何解决
前几天,有一客户向我们SINE安全公司反映,网站在google上的推广已拒登,说什么网站存在恶意软件或垃圾软件,导致google广告无法上线,还发现网站从google搜索点击进去会直接跳转到其他网站上 ...
- 一道hive面试题(窗口函数)
表student中的数据格式如下: name month degree s1 201801 As1 201802 As1 201803 Cs1 201804 As1 201805 As1 201806 ...
- python学习之路-基本数据类型1 变量的概念、数字、字符串
1 什么是数据类型? 每种编程语言都有自己的数据类型,用于标识计算机可以认识的数据,Python中主要的数据类型为字符串,整数,浮点数,列表,元祖,字典,集合七种主要的数据类型,其中以列表,字典为最主 ...