Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )

Link of the Paper: https://arxiv.org/abs/1412.2306

Main Points:

An Alignment Model: Convolutional Neural Networks over image regions ( An image -> RCNN -> Top 19 detected locations in addition to the whole image -> the representations based on the pixels I_b inside each bounding box -> a set of h-dimensional vectors {v_i | i = 1 ... 20} ), Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
A Multimodal Recurrent Neural Network architecture: On the image side, Convolutional Neural Networks ( CNNs ) have recently emerged as a powerful class of models for image classification and object detection. On the sentence side, our work takes advantage of pretrained word vectors to obtain low-dimensional representations of words. Finally, Recurrent Neural Networks have been previously used in language modeling, but we additionally condition these models on images.
Authors use bidirectional recurrent neural network to compute word representations in the sentence, dispensing of the need to compute dependency trees and allowing unbounded interactions of words and their context in the sentence.

Other Key Points:

The primary challenge towards generating descriptions of images is in the design of a model that is rich enough to simultaneously reason about contents of images and their representation in the domain of natural language. Additionally, the model should be free of assumptions about specific hard-coded templates, rules or categories and instead rely on learning from the training data. The second, practical challenge is that datasets of image captions are available in large quantities on the internet, but these descriptions multiplex mentions of several entities whose locations in the images are unknown.

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章

Paper Reading - Deep Captioning with Multimodal Recurrent Neural Networks ( m-RNN ) ( ICLR 2015 ) ★
Link of the Paper: https://arxiv.org/pdf/1412.6632.pdf Main Points: The authors propose a multimodal ...
Paper Reading - Show and Tell: A Neural Image Caption Generator ( CVPR 2015 )
Link of the Paper: https://arxiv.org/abs/1411.4555 Main Points: A generative model ( NIC, GoogLeNet ...
Deep Visual-Semantic Alignments for Generating Image Descriptions（深度视觉-语义对应对于生成图像描述）
https://cs.stanford.edu/people/karpathy/deepimagesent/ Abstract We present a model that generates na ...
Paper Reading:Deep Neural Networks for YouTube Recommendations
论文:Deep Neural Networks for YouTube Recommendations 发表时间:2016 发表作者:(Google)Paul Covington, Jay Adams ...
Paper Reading:Deep Neural Networks for Object Detection
发表时间:2013 发表作者:(Google)Szegedy C, Toshev A, Erhan D 发表刊物/会议:Advances in Neural Information Processin ...
论文笔记：Visual Semantic Navigation Using Scene Priors
Visual Semantic Navigation Using Scene Priors 2018-10-21 19:39:26 Paper: https://arxiv.org/pdf/1810 ...
Paper Reading: Stereo DSO
开篇第一篇就写一个paper reading吧,用markdown+vim写东西切换中英文挺麻烦的,有些就偷懒都用英文写了. Stereo DSO: Large-Scale Direct Sparse ...
论文笔记：Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language ...
论文：利用深度强化学习模型定位新物体(VISUAL SEMANTIC NAVIGATION USING SCENE PRIORS)
这是一篇被ICLR 2019 接收的论文.论文讨论了如何利用场景先验知识 (scene priors)来定位一个新场景(novel scene)中未曾见过的物体(unseen objects).举例来 ...

随机推荐

分布式架构学习-Consul集群配置
简介之前公司用的是Consul进行服务发现以及服务管理,自己一直以来只是用一下,但是没有具体的深入,觉得学习不可以这样,所以稍微研究了一下. 网上有很多关于Consul的介绍和对比,我这里也不献丑了 ...
Web开发生存工具使用指南
这里安利两款我认为开发中能够极大的提高生产力的工具,Charles 和 Postman. P.S. Charles(查尔斯)..不要再读查理斯了,金刚狼中被老铁扎心的博士就叫 CharlesP.P.S ...
cmd命令操作Mysql数据库
在一次考试中,笔者因考试的电脑上没有安装操作Mysql数据库的可视化工具而不知如何操作数据库,所以在这里可以提醒各位掌握命令行来操作数据库也是非常重要的. 笔者以惨痛的教训来警惕大家. 进入正题: ...
CSS3 过渡、变形和动画
一.我们来给按钮增加一个悬停效果:#content a:hover {border: 1px solid #000000;color: #000000;text-shadow: 0px 1px whi ...
windows7平台 nginx+python 环境搭建
参考了这篇文章,感谢原文作者:https://blog.csdn.net/foxgod/article/details/78929201 最近正在学习Python,发现除了写一点py脚本在idlex上 ...
远程查看java虚拟机内存使用情况jconsole
jconsole 查看虚拟机使用情况 1.在远程机的tomcat的catalina.sh中加入配置: JAVA_OPTS="$JAVA_OPTS -Djava.rmi.server.host ...
在AI人工智能中如何巧妙学习大数据编程，成为五十万年薪的佼佼者
编辑 ai狗年大数据和人工智能的关系,首先要说什么是大数据.这些年来,大数据先是被神化,继而又被妖魔化,到了今天,其实谁也不知道别人所谓的大数据指的是什么.我大数据从业者,建了一个大数据资源共享群1 ...
Python学习：面向对象 -- 成员修饰符
成员修饰符两种成员 - 公有成员 - 私有成员, __字段名 - 无法直接访问,只能通过内部方法来间接访问私有成员简例:公有成员与私有成员 class Info: country = '中国' ...
多线程深入理解和守护线程、子线程、锁、queue、evenet等介绍
1.多线程类的继承 import threading import time class MyThreading(threading.Thread): def __init__(self,n): su ...
libcurl编译及使用
环境: libcurl版本:7.54.1 VS:Visual Studio 2013 一.编译 1.下载最新版的libcurl(curl-7.54.1.zip)(地址:https://curl.hax ...

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )

Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )的更多相关文章

随机推荐

热门专题