Paper Notes: Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries

Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries
2018-09-18 09:58:50

Paper: http://openaccess.thecvf.com/content_ECCV_2018/papers/Edgar_Margffoy-Tuay_Dynamic_Multimodal_Instance_ECCV_2018_paper.pdf

GitHub: https://github.com/BCV-Uniandes/query-objseg (PyTorch)

　　Code: https://github.com/chenxi116/TF-phrasecut-public (Tensorflow)

2. Segmentation from Natural Language Expressions　　ECCV 2016

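Given a natural language expression, this paper segments the object it refers to from the image. The proposed model consists of four modules, described below: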

1. Visual Module (VM):

　　The paper uses a Dual Path Network 92 (DPN92) to extract visual features.

2. Language Module (LM):

The paper uses the SRU (Simple Recurrent Unit), a recently proposed, fast recurrent architecture. The SRU is defined as:
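For reference, one SRU step can be sketched in plain Python as below. This follows the formulation of the original SRU paper (Lei et al., "Training RNNs as Fast as CNNs") as I understand it, with tanh as the activation; it is not taken from the query-objseg code. The names `sru_step`, `sru_sequence`, `W`, `Wf`, and `Wr` are made up for illustration, and the input and the state are assumed to share one dimension so that the highway term is well defined. The SRU reset gate is called `reset` here to avoid a clash with the word representation r_t used in these notes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sru_step(x, c_prev, W, Wf, bf, Wr, br):
    """One SRU step (illustrative). Both gates depend only on the input x,
    not on the previous hidden state, which is what makes SRU fast."""
    x_tilde = matvec(W, x)                                        # candidate
    forget = [sigmoid(a + b) for a, b in zip(matvec(Wf, x), bf)]  # forget gate
    reset = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), br)]   # reset gate
    # Internal state: gated mix of the old state and the candidate.
    c = [f * cp + (1 - f) * xt for f, cp, xt in zip(forget, c_prev, x_tilde)]
    # Output: highway connection between tanh(c) and the raw input.
    h = [r * math.tanh(ci) + (1 - r) * xi for r, ci, xi in zip(reset, c, x)]
    return h, c


def sru_sequence(xs, W, Wf, bf, Wr, br):
    """Run the cell over a list of input vectors, starting from a zero state."""
    c = [0.0] * len(xs[0])
    hidden = []
    for x in xs:
        h, c = sru_step(x, c, W, Wf, bf, Wr, br)
        hidden.append(h)
    return hidden
```

Only the element-wise update of `c` is sequential; the three matrix products can be computed for all time steps at once.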

We concatenate the embedding and the hidden state to obtain a representation of each word in the text, r_t. Based on r_t, we then compute a set of dynamic filters f_k,t, defined as:

In this way, given the text w, we obtain both its feature representation and the corresponding dynamic filters, i.e.:
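A minimal sketch of the two steps above, under stated assumptions: r_t is the concatenation of the word embedding and the SRU hidden state, and each of the K filters is a sigmoid-activated linear map of r_t (this is how I read the paper; the exact activation should be checked against it). The names `word_representation`, `dynamic_filters`, `weights`, and `biases` are hypothetical, and the filter length D is assumed to match the number of channels the filter is later applied to.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def word_representation(e_t, h_t):
    """r_t: concatenation of the word embedding e_t and the hidden state h_t."""
    return list(e_t) + list(h_t)


def dynamic_filters(r_t, weights, biases):
    """Compute K filters from r_t.

    weights: K matrices, each with D rows of length len(r_t) (assumed shape)
    biases:  K vectors of length D
    Returns a list of K filters, each a vector of length D.
    """
    filters = []
    for W_k, b_k in zip(weights, biases):
        f_k = [
            sigmoid(sum(w * v for w, v in zip(row, r_t)) + b)
            for row, b in zip(W_k, b_k)
        ]
        filters.append(f_k)
    return filters
```

Because the filters are produced from the words themselves, every word in the query yields its own set of K filters, rather than the network learning one fixed set.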

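3. Synthesis Module (SM):

The SM is the core of the framework and fuses the information from the different modalities. As shown in Figure 5, we first concatenate I_N with the spatial location representation (LOC), then convolve the result with the dynamic filters to obtain a response map, RESP, with K channels. Next, we concatenate I_N, LOC, and the response map F_t along the channel dimension to obtain a representation I'. Finally, a 1x1 convolution fuses all the information. Each time step produces one output, M_t, so the final expression is: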

Next, an mSRU is used to produce a 3D tensor.
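The fusion steps of the SM for a single time step can be sketched as follows. This is an illustration in plain Python on nested lists of shape [C][H][W], not the repository's code. It assumes that the dynamic filters act as 1x1 kernels (one weight per input channel), and the names `concat_channels`, `conv1x1`, `synthesis_step`, and `W_mix` are invented for the example. The mSRU that follows is not covered here.

```python
def concat_channels(*maps):
    """Concatenate [C_i][H][W] feature maps along the channel axis."""
    out = []
    for m in maps:
        out.extend(m)
    return out


def conv1x1(feat, kernels):
    """Apply 1x1 kernels to a [C][H][W] map.

    Each kernel is a list of C weights; the result has one channel per kernel.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    assert all(len(k) == C for k in kernels), "kernel length must equal C"
    return [
        [[sum(k[c] * feat[c][y][x] for c in range(C)) for x in range(W)]
         for y in range(H)]
        for k in kernels
    ]


def synthesis_step(I_N, LOC, filters_t, W_mix):
    """One time step of the fusion described above (illustrative).

    I_N:       visual feature map, [C][H][W]
    LOC:       spatial location channels, [L][H][W]
    filters_t: K dynamic filters for this word, each of length C + L
    W_mix:     kernels of the final 1x1 convolution, each of length C + L + K
    """
    base = concat_channels(I_N, LOC)
    F_t = conv1x1(base, filters_t)            # response map with K channels
    I_prime = concat_channels(I_N, LOC, F_t)  # I'
    M_t = conv1x1(I_prime, W_mix)             # fused output for this step
    return F_t, M_t
```

Running `synthesis_step` once per word gives the sequence of M_t maps that the mSRU then consumes.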

4. Upsampling Module (UM):

Finally, upsampling is applied to obtain the final segmentation map.
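The training log further down lists `--upsamp_mode bilinear`, so as a reminder of what bilinear upsampling does, here is a small plain-Python sketch for a single channel. It covers only the interpolation step, not the full UM, and it assumes the convention in which input corners map exactly onto output corners (what PyTorch calls `align_corners=True`); which convention the repository uses is not something I checked. The function name `bilinear_upsample` is illustrative.

```python
def bilinear_upsample(channel, out_h, out_w):
    """Bilinearly resize one [H][W] channel, mapping corners onto corners."""
    in_h, in_w = len(channel), len(channel[0])

    def source_pos(i, n_in, n_out):
        # Position in the input grid that output index i maps to.
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for y in range(out_h):
        sy = source_pos(y, in_h, out_h)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            sx = source_pos(x, in_w, out_w)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = (1 - wx) * channel[y0][x0] + wx * channel[y0][x1]
            bottom = (1 - wx) * channel[y1][x0] + wx * channel[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out
```

For example, `bilinear_upsample([[0, 2], [4, 6]], 3, 3)` fills in the midpoints between the four input values.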

===== Some questions:

1. Why do the authors also feed the spatial location (LOC) information into the network?

The same operation also appears in the following reference papers:

1. Segmentation from Natural Language Expressions ECCV 2016

2. Recurrent Multimodal Interaction for Referring Image Segmentation ICCV 2017

　　In the paper "Segmentation from Natural Language Expressions", the following passage explains why spatial location information is used and concatenated with the image feature maps.
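The intuition, as I understand it, is that convolutional features are largely insensitive to absolute position, while referring expressions such as "the man on the left" depend on it, so the position has to be supplied explicitly. Below is a sketch of the simplest variant, two extra channels holding relative x and y coordinates in [-1, 1], which is how I remember the ECCV 2016 paper describing it. The LOC tensor actually used in DMN may have more channels; the function name `relative_coords` is hypothetical.

```python
def relative_coords(height, width):
    """Return two [H][W] channels with relative x and y coordinates.

    The upper-left cell is (-1, -1) and the lower-right cell is (+1, +1).
    """
    def linspace(n):
        # n evenly spaced values from -1 to +1 (a single cell gets 0).
        if n == 1:
            return [0.0]
        return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

    xs, ys = linspace(width), linspace(height)
    x_map = [[xs[x] for x in range(width)] for _ in range(height)]
    y_map = [[ys[y] for _ in range(width)] for y in range(height)]
    return [x_map, y_map]
```

These channels are concatenated with the image feature map along the channel axis, so every spatial location carries its own coordinates into the following layers.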

2. Run the code successfully.

 wangxiao@AHU:/DMS$ python3 -u -m dmn_pytorch.train --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --accum-iters 1

 /usr/local/lib/python3.6/site-packages/torch/utils/cpp_extension.py:118: UserWarning: 

                                !! WARNING !!

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

 Your compiler (c++) may be ABI-incompatible with PyTorch!

 Please use a compiler that is ABI-compatible with GCC 4.9 and above.

 See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

 See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6

 for instructions on how to install GCC 4.9 or higher.

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                               !! WARNING !!

   warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))

 Argument list to program

 --data /DMS/referit_data

 --split_root /DMS/referit_data/referit/splits/referit

 --save_folder weights/

 --snapshot weights/qseg_weights.pth

 --num_workers 2

 --dataset unc

 --split train

 --val None

 --eval_first False

 --workers 4

 --no_cuda False

 --log_interval 200

 --backup_iters 10000

 --batch_size 1

 --epochs 40

 --lr 1e-05

 --patience 2

 --seed 1111

 --iou_loss False

 --start_epoch 1

 --optim_snapshot weights/qsegnet_optim.pth

 --accum_iters 1

 --pin_memory False

 --size 512

 --time -1

 --emb_size 1000

 --hid_size 1000

 --vis_size 2688

 --num_filters 10

 --mixed_size 1000

 --hid_mixed_size 1005

 --lang_layers 3

 --mixed_layers 3

 --backend dpn92

 --mix_we True

 --lstm False

 --high_res False

 --upsamp_mode bilinear

 --upsamp_size 3

 --upsamp_amplification 32

 --dmn_freeze False

 --visdom None

 --env DMN-train

 Processing unc: train set

 loading dataset refcoco into memory...

 creating index...

 index created.

 DONE (t=5.78s)

 Saving dataset corpus dictionary...

 100%|██████████| 42404/42404 [10:18<00:00, 68.56it/s]

 Processing unc: val set

 loading dataset refcoco into memory...

 creating index...

 index created.

 DONE (t=21.52s)

 100%|██████████| 3811/3811 [00:53<00:00, 71.45it/s]

 Processing unc: trainval set

 loading dataset refcoco into memory...

 creating index...

 index created.

 DONE (t=4.97s)

 0it [00:00, ?it/s]

 Processing unc: testA set

 loading dataset refcoco into memory...

 creating index...

 index created.

 DONE (t=5.24s)

 100%|██████████| 1975/1975 [00:27<00:00, 72.62it/s]

 Processing unc: testB set

 loading dataset refcoco into memory...

 creating index...

 index created.

 DONE (t=5.06s)

 100%|██████████| 1810/1810 [00:31<00:00, 57.91it/s]

 Train begins...

 [ 1] ( 0/120624) | ms/batch 456.690311 | loss 3.530792 | lr 0.0000100

 [ 1] ( 200/120624) | ms/batch 273.972313 | loss 1.487153 | lr 0.0000100

 [ 1] ( 400/120624) | ms/batch 257.813077 | loss 1.036689 | lr 0.0000100

 [ 1] ( 600/120624) | ms/batch 251.565860 | loss 1.047311 | lr 0.0000100

 [ 1] ( 800/120624) | ms/batch 249.070073 | loss 1.657688 | lr 0.0000100

 [ 1] ( 1000/120624) | ms/batch 246.906650 | loss 1.815347 | lr 0.0000100

 [ 1] ( 1200/120624) | ms/batch 245.645234 | loss 2.601908 | lr 0.0000100

 [ 1] ( 1400/120624) | ms/batch 245.039105 | loss 1.495383 | lr 0.0000100

 [ 1] ( 1600/120624) | ms/batch 244.460579 | loss 1.441855 | lr 0.0000100
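To get a feeling for how long one epoch takes, the training lines above can be parsed with a few lines of Python. This is a rough estimate only: it assumes the printed `ms/batch` value is representative of the whole epoch, and the helper names `parse_line` and `estimated_epoch_hours` are my own, not part of the repository.

```python
import re

# Matches lines such as:
# [ 1] ( 1600/120624) | ms/batch 244.460579 | loss 1.441855 | lr 0.0000100
LOG_LINE = re.compile(
    r"\[\s*(\d+)\]\s*\(\s*(\d+)/(\d+)\)\s*\|\s*ms/batch\s+([\d.]+)"
    r"\s*\|\s*loss\s+([\d.]+)"
)


def parse_line(line):
    """Extract epoch, iteration, total iterations, ms/batch and loss."""
    m = LOG_LINE.search(line)
    if m is None:
        return None
    epoch, it, total, ms, loss = m.groups()
    return {
        "epoch": int(epoch),
        "iter": int(it),
        "total": int(total),
        "ms_per_batch": float(ms),
        "loss": float(loss),
    }


def estimated_epoch_hours(record):
    """Total iterations times time per batch, converted to hours."""
    return record["total"] * record["ms_per_batch"] / 1000.0 / 3600.0


record = parse_line(
    " [ 1] ( 1600/120624) | ms/batch 244.460579 | loss 1.441855 | lr 0.0000100"
)
print("estimated hours per epoch: %.1f" % estimated_epoch_hours(record))
```

With 120624 iterations at roughly 245 ms each, one epoch comes out at a bit over 8 hours on this machine, so the configured 40 epochs would take on the order of two weeks.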
