ChengLin Liu_ICCV2017_Deep Direct Regression for Multi-Oriented Scene Text Detection

Authors

Keywords

Text detection, multi-oriented, direct regression, four corner points, one-stage

Highlights

First to propose the concept of direct regression.
Proposes a scale & shift scheme to make the coordinate positions easier to learn.

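Method overview

This paper is the first to propose the concept of direct regression. Using an FPN-style network that the authors built themselves, it directly learns the offsets of the four corner points relative to a center point (a location on the feature map), and uses the scale & shift scheme to shrink the value range of the regression targets.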

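Pipeline

Method details

The concept of direct regression

Detectors such as Faster R-CNN and SSD need anchors as references: what they learn are the offsets of the prediction and of the ground truth, each measured relative to an anchor. This is called indirect regression. Direct regression learns the offsets to the ground truth directly, without using an anchor as an intermediary.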
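
The difference between the two kinds of regression target can be made concrete with a small sketch. Everything here is illustrative: the function names, the `(cx, cy, w, h)` box format, and the anchor encoding (the familiar Faster R-CNN-style parameterization) are assumptions chosen for the example, not code from the paper.

```python
import math

def direct_targets(point, quad):
    """Direct regression: offsets of the four corners from one point.

    point: (x, y) image location that a feature-map pixel corresponds to.
    quad:  four (x, y) ground-truth corners.
    Returns 8 values: dx1, dy1, ..., dx4, dy4.
    """
    px, py = point
    targets = []
    for x, y in quad:
        targets.extend([x - px, y - py])
    return targets

def indirect_targets(anchor, gt_box):
    """Indirect regression: ground truth encoded relative to an anchor box.

    Both boxes are (cx, cy, w, h). This is the common Faster R-CNN-style
    encoding, used here only as an example of anchor-based targets.
    """
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt_box
    return [(gx - ax) / aw, (gy - ay) / ah,
            math.log(gw / aw), math.log(gh / ah)]

quad = [(10, 20), (110, 30), (108, 60), (8, 50)]
print(direct_targets((60, 40), quad))
```

In the direct case the only reference is the point itself, so no anchor shapes or aspect ratios need to be designed in advance.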

Figure 1. Visualized explanation of indirect and direct regression. The solid green lines are boundaries of text “Gallery”, the dash blue lines are boundaries of text proposal, and the dashed yellow vectors are the ground truths of regression task. (a) The indirect regression predicts the offsets from a proposal. (b) The direct regression predicts the offsets from a point.

Network architecture

A custom network architecture designed by the authors.

The problem with anchors on long, inclined text

Figure 2. Illustration for the deficiency of anchor mechanism in detecting long and heavily inclined text words or lines. The solid yellow lines are boundaries of the text line and the dashed lines are boundaries of anchors. There is no anchor that has sufficient overlap with the text line in this image.

Classification loss

Scale & shift

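This scheme is adopted on the assumption that text size is smaller than 400. The target z would otherwise range over 0~400, but after scale and shift the target range becomes 0~1, which is easier to regress (similar to a normalization step).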
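
A minimal sketch of the idea follows. The formula from the original figure is not reproduced in these notes, so the constants are assumptions: `scale=400` and `shift=0` simply match the 0~400 range mentioned above, and signed offsets would need a different scale and a nonzero shift. The function names are hypothetical.

```python
import math

MAX_TEXT_SIZE = 400.0  # assumption from the note: text size < 400

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def normalize_target(z, scale=MAX_TEXT_SIZE, shift=0.0):
    """Map a raw target z in [shift, shift + scale] to [0, 1]."""
    return (z - shift) / scale

def scale_and_shift(raw_output, scale=MAX_TEXT_SIZE, shift=0.0):
    """Squash a raw network output to (0, 1), then stretch it back
    to the pixel range (shift, shift + scale)."""
    return scale * sigmoid(raw_output) + shift

print(normalize_target(200.0))   # a 200-pixel offset becomes 0.5
print(scale_and_shift(0.0))      # a raw output of 0 decodes to 200.0
```

The network then only has to produce values in a bounded unit range, and the fixed stretch restores pixel units.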

Smooth-L1 loss
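
The notes give no formula here. For reference, the standard smooth-L1 function is 0.5·x² when |x| < 1 and |x| − 0.5 otherwise. The sketch below implements that standard definition; averaging over the coordinates is an assumption, and any extra weighting the paper applies is not covered.

```python
def smooth_l1(x):
    """Standard smooth-L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def smooth_l1_loss(pred, target):
    """Mean smooth-L1 over paired coordinates (mean reduction assumed)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target)) / len(pred)

print(smooth_l1_loss([0.5, 3.0], [0.0, 1.0]))  # (0.125 + 1.5) / 2 = 0.8125
```

Compared with a plain L2 loss, the linear tail makes large coordinate errors less dominant in the gradient.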

Recalled Non-Maximum Suppression

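Idea: first run standard NMS; then lower-scoring boxes are moved to the highest-scoring box they overlap with; finally, boxes that are close to each other are merged.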
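
A rough sketch of these three steps, under several assumptions: axis-aligned boxes `(x1, y1, x2, y2, score)` stand in for the paper's quadrilaterals, the three thresholds are hypothetical values, and "merging" is simplified to keeping the best-scoring box of each near-duplicate group. It follows the description in these notes rather than the paper's exact procedure.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, thresh):
    """Standard greedy NMS; returns boxes sorted by descending score."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= thresh for k in kept):
            kept.append(box)
    return kept

def recalled_nms(boxes, nms_thresh=0.5, recall_thresh=0.2, merge_thresh=0.7):
    # Step 1: ordinary NMS.
    survivors = nms(boxes, nms_thresh)
    # Step 2: switch each survivor to the best-scoring input box
    # that overlaps it by more than recall_thresh.
    recalled = []
    for s in survivors:
        candidates = [b for b in boxes if iou(s, b) > recall_thresh] or [s]
        recalled.append(max(candidates, key=lambda b: b[4]))
    # Step 3: merge near-duplicates (simplified: keep the best one).
    return nms(recalled, merge_thresh)

boxes = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8),
         (6, 0, 16, 10, 0.6), (50, 50, 60, 60, 0.7)]
print(recalled_nms(boxes))
```

In this toy input the third box survives step 1 as a partial fragment, is switched to the 0.9 box in step 2, and then collapses into it in step 3.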

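Ground truth generation

Pixels within distance r of the text center line are taken as positive samples, and the rest of the text region is marked "NOT CARE". Text that is too small or too large is also marked "NOT CARE" as a whole. The purpose of this design is to reduce confusion between text and non-text.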
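
The labelling rule can be sketched on a toy label map. This is a simplification under stated assumptions: word regions are axis-aligned boxes `(x1, y1, x2, y2)` with a horizontal center line (the paper handles oriented quadrilaterals), and the values of `r`, `min_size` and `max_size` are hypothetical.

```python
POSITIVE, NEGATIVE, NOT_CARE = 1, 0, -1

def classification_ground_truth(height, width, boxes,
                                r=2, min_size=8, max_size=400):
    """Build a label map: POSITIVE near the center line of each word,
    NOT_CARE for the rest of the word, NEGATIVE for background."""
    labels = [[NEGATIVE] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        text_h = y2 - y1
        ignore_word = text_h < min_size or text_h > max_size
        center_y = (y1 + y2 - 1) / 2.0  # center of pixel rows y1..y2-1
        for y in range(max(0, y1), min(height, y2)):
            near_center = abs(y - center_y) <= r
            label = POSITIVE if near_center and not ignore_word else NOT_CARE
            for x in range(max(0, x1), min(width, x2)):
                labels[y][x] = label
    return labels

gt = classification_ground_truth(12, 20, [(2, 0, 8, 10), (10, 0, 14, 4)])
for row in gt:
    print("".join({1: "+", 0: ".", -1: "?"}[v] for v in row))
```

NOT_CARE pixels would simply be excluded from the classification loss, so the ambiguous band between the text core and the background contributes no gradient.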

Figure 5. Visualized ground truths of multi-task. (a) The left map is the ground truth for classification task, where the yellow regions are positive, enclosed by “NOT CARE” regions colored in light sea-green. The right map is the ground truth of “top-left” channel for regression task. Values grow smaller from left to right within a word region as pixels are farther from the top left corner. (b) The corresponding input image of the ground truths.

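Other details

Data augmentation: samples are randomly rotated by 0, 90, 180, or 270 degrees.
The weight of the localization loss is kept low at first and raised later (the network should learn what the text is first and then learn to localize the text).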
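
The rotation augmentation in the first item above can be sketched as follows. The helper names are hypothetical, the image is a plain list of pixel rows, and the annotations are lists of `(x, y)` corner points; a real pipeline would normally use an image library instead.

```python
import random

def rotate90_cw(image, quads):
    """Rotate an image (list of rows) and its quads 90 degrees clockwise.

    A pixel at (x, y) in an image of height h moves to (h - 1 - y, x).
    """
    h = len(image)
    rotated = [list(col) for col in zip(*image[::-1])]
    new_quads = [[(h - 1 - y, x) for x, y in quad] for quad in quads]
    return rotated, new_quads

def random_rotate(image, quads, rng=random):
    """Rotate by a random multiple of 90 degrees (0, 90, 180 or 270)."""
    k = rng.choice([0, 1, 2, 3])
    for _ in range(k):
        image, quads = rotate90_cw(image, quads)
    return image, quads, 90 * k

image = [[1, 2, 3],
         [4, 5, 6]]
print(rotate90_cw(image, [[(2, 0)]]))
```

Rotating the corner points together with the pixels keeps the regression targets consistent with the augmented image.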

Experimental results

ICDAR15
MSRA-TD500

ICDAR2013

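Summary and takeaways

This paper is the first to propose the concept of direct regression, and its ideas were a useful source of inspiration for some later work.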
