Survey article
Cross-media analysis and reasoning: advances and directions
Yu-xin PENG et al.
Front Inform Technol Electron Eng (Journal of Zhejiang University, English edition), 2017, 18(1):44-57
The article covers seven topics:
(1) theory and model for cross-media uniform representation;
(2) cross-media correlation understanding and deep mining;
(3) cross-media knowledge graph construction and learning methodologies;
(4) cross-media knowledge evolution and reasoning;
(5) cross-media description and generation;
(6) cross-media intelligent engines;
(7) cross-media intelligent applications.
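Personally, I find the first part the more important one: it surveys the main methods and models in the development of cross-modal research, though only in broad strokes. Another overview article covers the specific methods, datasets, accuracies, and so on (I plan to read it next week). Below I summarize the key points of the first five parts as I understood them (the last two parts are mostly about research directions and significance):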
 
  1. theory and model for cross-media uniform representation
The authors argue that, for cross-media data that live in heterogeneous spaces, two questions need to be addressed:
  1. how to build the shared space.
  2. how to project data into it.
The paper summarizes a number of models and methods:
 
CCA (Rasiwasia et al., 2010): canonical correlation analysis. It learns a common shared space by maximizing the correlation between pairwise co-occurring heterogeneous data, and performs the projection with linear functions.
Deep CCA (Andrew et al., 2013): extends CCA with deep learning to learn the correlations more comprehensively than CCA and kernel CCA.
MMD (Yang et al., 2008): the multimedia document model. Each MMD is a set of media objects of different modalities that carry the same semantics. The distance between two MMDs is related to each of their modalities.
RBF network (Daras et al., 2012): a radial basis function (RBF) network that addresses the problem of missing modalities.
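To make the two questions above concrete (how to build the shared space, and how to project data into it), here is a minimal sketch of plain linear CCA, the first method in the list. It is not taken from the paper: the function name `fit_cca`, the ridge term `reg`, and the synthetic "image" and "text" features are all assumptions for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-6):
    """Fit linear CCA on paired samples X (n, dx) and Y (n, dy).

    Returns the means, the projection matrices Wx (dx, k) and Wy (dy, k),
    and the top-k canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Covariance blocks; reg is a small ridge term (an assumption) for stability.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance T = Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return mx, my, Wx, Wy, s[:k]

# Synthetic paired data: both "modalities" are noisy views of one latent z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
X = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))  # e.g. image features
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))  # e.g. text features

mx, my, Wx, Wy, corr = fit_cca(X, Y, k=2)
Px = (X - mx) @ Wx  # images projected into the shared space
Py = (Y - my) @ Wy  # texts projected into the shared space
print("canonical correlations:", corr)
```

Once both modalities are projected, cross-media retrieval reduces to a nearest-neighbour search between rows of `Px` and `Py`.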
Topic models:
Multimodal LDA (Roller and Schulte im Walde, 2013): integrates visual features into latent Dirichlet allocation (LDA) to learn representations for textual and visual data.
M3R (Wang Y et al., 2014): the multimodal mutual topic reinforce model. It seeks to discover mutually consistent semantic topics via appropriate interactions between model factors. These schemes represent data as topic distributions, and similarities are measured by the likelihood of the observed data in terms of latent topics.
PFAR (Mao et al., 2013): parallel field alignment retrieval, a manifold-based model that treats cross-media retrieval as a manifold alignment problem using parallel fields.
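The topic-model schemes above score similarity by the likelihood of the observed data under latent topics. The sketch below illustrates only that scoring idea in its simplest form, and is not the M3R or multimodal LDA algorithm: it assumes the topic-word distributions `phi` and the per-image topic mixtures are already learned, and all names and numbers are hypothetical.

```python
import math

def log_likelihood(words, theta, phi):
    """Return log p(words | theta) = sum_w log sum_k theta[k] * phi[k][w]."""
    total = 0.0
    for w in words:
        p_w = sum(t_k * topic[w] for t_k, topic in zip(theta, phi))
        total += math.log(p_w)
    return total

# Hypothetical, hand-set parameters: two topics over a four-word vocabulary.
phi = [
    {"ball": 0.45, "goal": 0.45, "guitar": 0.05, "song": 0.05},  # "sports"-like topic
    {"ball": 0.05, "goal": 0.05, "guitar": 0.45, "song": 0.45},  # "music"-like topic
]
# Hypothetical topic mixtures inferred for two images.
image_topics = {"image_a": [0.9, 0.1], "image_b": [0.1, 0.9]}

# Text query: rank images by how likely the query words are under each mixture.
query_words = ["ball", "goal"]
scores = {name: log_likelihood(query_words, theta, phi)
          for name, theta in image_topics.items()}
best = max(scores, key=scores.get)
print(best, scores)
```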
Deep learning:
Autoencoder model (Ngiam et al., 2011): learns uniform representations for speech audio coupled with videos of the lip movements.
Deep restricted Boltzmann machine (Srivastava and Salakhutdinov, 2012): learns joint representations for multimodal data.
Deep CCA (Andrew et al., 2013): a deep extension of the traditional CCA method.
DT-RNNs (Socher et al., 2014): dependency tree recursive neural networks, which use dependency trees to embed sentences into a vector space in order to retrieve the images those sentences describe.
Autoencoders (Feng et al., 2014; Wang W et al., 2014): apply autoencoders to cross-modality retrieval.
Multimodal deep learning scheme (Wang et al., 2015): learns accurate and compact representations for multimodal data, which facilitates efficient similarity search and other related applications.
ICMAE (Zhang et al., 2014a): the independent component multimodal autoencoder, an attribute discovery approach that learns a shared high-level representation to identify attributes from a set of image and text pairs. Zhang et al. (2016) further proposed learning image-text uniform representations from web social multimedia content, which is noisy, sparse, and diverse, under weak supervision.
Deep-SM (Wei et al., 2017): a deep semantic matching (deep-SM) method that uses a convolutional neural network and a fully connected network to map images and texts into their label vectors, achieving state-of-the-art accuracy.
CMDN (Peng et al., 2016a): the cross-media multiple deep network, a hierarchical structure with multiple deep networks that simultaneously preserves intra-media and inter-media information to further improve retrieval accuracy.
 
I looked up the Deep-SM method (Wei et al., 2017) mentioned in this part: it comes from the paper Cross-Modal Retrieval With CNN Visual Features: A New Baseline, which I plan to read when I find the time.
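As a rough illustration of matching in label space, the idea attributed to Deep-SM above, the sketch below assumes that the image and text networks have already produced label vectors and ranks texts against an image query. The label vectors are made-up numbers, and the choice of cosine similarity is an assumption, not something stated in these notes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_by_label_vector(query_vec, gallery):
    """Return gallery ids sorted by decreasing similarity to the query."""
    return sorted(gallery, key=lambda k: cosine(query_vec, gallery[k]), reverse=True)

# Hypothetical label vectors over three classes (sport, music, art),
# standing in for the outputs of the image and text networks.
image_query = [0.8, 0.1, 0.1]
text_gallery = {
    "text_sport": [0.7, 0.2, 0.1],
    "text_music": [0.1, 0.8, 0.1],
    "text_art": [0.2, 0.1, 0.7],
}

ranking = rank_by_label_vector(image_query, text_gallery)
print(ranking)
```

Because both modalities end up in the same label space, the same function works for text-to-image queries by swapping the roles of query and gallery.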
 
  2. cross-media correlation understanding and deep mining
Basically, existing studies construct correlation learning on cross-media data with representation learning, metric learning, and matrix factorization. These are usually performed in a batch learning fashion and can capture only the first-order correlations among data objects. The key research issue for future studies in cross-media correlation understanding is how to develop more effective learning mechanisms that capture high-order correlations and adapt to the evolution that naturally exists among heterogeneous entities and heterogeneous relations.
 
  3. cross-media knowledge graph construction and learning methodologies
Example application of knowledge graphs: The Knowledge Graph released by Google in 2012 (Singhal, 2012) provided a next-generation information retrieval service with ontology-based intelligent search based on free-style user queries. Similar techniques, e.g., Safari, were developed based on achievements in entity-centric search (Lin et al., 2012).
 
  4. cross-media knowledge evolution and reasoning
Reinforcement learning and transfer learning can be helpful for constructing more complex intelligent reasoning systems (Lazaric, 2012). Furthermore, lifelong learning (Lazer et al., 2014) is the key capability of advanced intelligence systems.
Example application: Google DeepMind has constructed a machine intelligence system based on a reinforcement learning algorithm (Gibney, 2015). AlphaGo, developed by Google DeepMind, was the first computer Go program to beat a top professional human Go player. It even beat the world champion Lee Sedol in a five-game match.
Visual question answering (VQA) can be regarded as a good example of cross-media reasoning (Antol et al., 2015). VQA aims to provide natural language answers to questions posed as a combination of an image and natural language text.
 
  5. cross-media description and generation
Existing studies on visual content description can be divided into three groups.
1 The first group, based on language generation, first understands images in terms of objects, attributes, scene types, and their correlations, and then connects these semantic understanding outputs to generate a sentence description using natural language generation techniques.
2 The second group covers retrieval-based methods, which retrieve content that is similar to a query and transfer the descriptions of the similar set to the query.
3 The third group is based on deep neural networks and employs the CNN-RNN codec (encoder-decoder) framework, where the convolutional neural network (CNN) is used to extract features from images, and the recursive neural network (RNN) (Socher et al., 2011) or its variant, the long short-term memory network (LSTM) (Hochreiter and Schmidhuber, 1997), is used to encode and decode language models.
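The second group (retrieval-based description) is the easiest of the three to sketch. The toy example below assumes that image features have already been extracted, uses Euclidean distance as the similarity measure, and simply returns the captions of the `k` nearest database images; the function name, features, and captions are all hypothetical.

```python
import math

def transfer_descriptions(query_feat, database, k=2):
    """Return the captions of the k database images closest to the query.

    database is a list of (feature_vector, caption) pairs.
    """
    ranked = sorted(database, key=lambda item: math.dist(query_feat, item[0]))
    return [caption for _, caption in ranked[:k]]

# Hypothetical 2-D image features paired with human-written captions.
database = [
    ([0.9, 0.1], "a dog runs on grass"),
    ([0.8, 0.2], "a puppy plays in a park"),
    ([0.1, 0.9], "a plane in the sky"),
]

query = [0.88, 0.12]  # feature of an undescribed query image
candidates = transfer_descriptions(query, database, k=2)
print(candidates)
```

A real system would then re-rank or fuse the transferred captions rather than return them verbatim.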
 
 
