Xiang Bai——【CVPR2015】Symmetry-Based Text Line Detection in Natural Scenes

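Method overview
- Step 1: Detect the center-line pixels of text lines with a multi-scale sliding window. A random forest trained on symmetry features and appearance features yields the candidate text pixels (the feature design, especially the symmetry feature, is the highlight of the paper);
- Step 2: Group the candidate pixels into text-line regions using angle and distance constraints between them;
- Step 3: Filter out non-text regions with two CNN classifiers, one character-level and one string-level, then split the text lines into words with a conventional method (not the focus of the paper and described only briefly)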

Figure 2. Schematic pipeline of our symmetry-based text-line detection algorithm. (a) Input image; (b) Response map of the symmetry detector; (c) Symmetrical point grouping; (d) Estimated bounding boxes based on the detected symmetrical axes. (e) Detection result after false alarm removal.

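Innovations and contributions
- Starting idea: a person does not need to confirm characters one by one to tell whether an image contains text; a single glance is often enough. This works because a text region has symmetry and self-similarity that set it apart from the background. Text regions can therefore be located from two angles: first, detect the whole text string rather than individual characters, using the holistic information of the string; second, exploit properties of the string itself, namely symmetry (between its upper and lower halves) and self-similarity (the interior is similar to itself but differs from the background)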

Figure 1. Though the sizes of the characters within the yellow rectangles are small, human can easily discover and localize such text lines.

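- Innovations:
  - A symmetry feature designed for character groups (text strings);
  - Unlike traditional methods, text regions are not found by detecting characters or strokes; the text string is detected directly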

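Method details

　　1. Symmetry-based text line proposal generation

- feature extraction
  - Symmetry template
    - (x, y) is the center of the whole template, a large rectangle of size 4s × 4s
    - The smallest rectangles are 4s × s (width × height); there are four of them: R_T, R_MT, R_MB and R_B
    - The middle rectangle (the red region) is 4s × 2s; it is R_M, the union of R_MT and R_MB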

Figure 3. Left: Template used to compute the features for symmetry axis detection, which consists of four rectangles with equal size. The height and the width of each rectangle are s and 4s, respectively. The scale of the template is determined by s. Right: The contents within the two middle rectangles are similar to each other but dissimilar to the contents of the top and bottom rectangles. Therefore, the symmetry response on the center line (the adjacent edge of the two middle rectangles) of the text region should be high.

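  - Symmetry feature
    - A feature histogram is computed for each rectangle, where c denotes the cue being histogrammed
    - The cues c (5 in total)
      - brightness L*: the L channel of the LAB color space, 32 bins
      - color a*: the a channel of the LAB color space, 32 bins
      - color b*: the b channel of the LAB color space, 32 bins
      - texture T*: the texture feature from reference 1, ? bins
      - gradient G*: gradient feature, 16 bins
    - Three symmetry measures computed from the histograms
      - symmetry between the upper and lower halves of the text region
      - difference between the upper half of the text region and the background
      - difference between the lower half of the text region and the background
    - Total dimension of the symmetry feature
      - 5 cues × 3 symmetry measures = 15 dimensions
  - appearance feature: the LBP descriptor of reference 2, with 59 bins
  - total feature: 15-dim symmetry feature + 59-dim appearance feature = 74 dimensions (note that the feature is computed per center point)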
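The three symmetry measures can be sketched for a single cue as follows. This is a minimal illustration, not the authors' code: the chi-squared histogram distance, the function names and the 2D-list image layout are assumptions made here, and the template is assumed to lie fully inside the image.

```python
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Return an L1-normalised histogram of `values` over the range [lo, hi)."""
    counts = [0.0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1.0  # clamp out-of-range values
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def chi2(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-squared distance between two histograms (assumed dissimilarity measure)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def symmetry_features(cue, x: int, y: int, s: int,
                      bins: int = 32, lo: float = 0.0, hi: float = 256.0
                      ) -> Tuple[float, float, float]:
    """Three symmetry values for one cue at center (x, y) and scale s.

    `cue` is a 2D list (rows of pixel values) holding a single cue such as L*.
    """
    def rect_hist(top: int) -> List[float]:
        # One rectangle of width 4s and height s whose first row is `top`.
        pixels = [v for row in cue[top:top + s] for v in row[x - 2 * s:x + 2 * s]]
        return histogram(pixels, bins, lo, hi)

    h_t, h_mt = rect_hist(y - 2 * s), rect_hist(y - s)  # R_T, R_MT
    h_mb, h_b = rect_hist(y), rect_hist(y + s)          # R_MB, R_B
    return (
        chi2(h_mt, h_mb),  # upper half vs lower half of the text
        chi2(h_t, h_mt),   # upper half vs background above
        chi2(h_b, h_mb),   # lower half vs background below
    )
```

Repeating this for each of the five cues gives the 15 symmetry dimensions. On a text center line the first value should be small and the other two large.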

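- symmetry axis detection
  - Classifier: random forest-50 (presumably 50 trees)
  - Samples:
    - Positive: pixels less than 2 pixels from the ground truth, 450,000 in total
    - Negative: pixels more than 5 pixels from the ground truth, 450,000 in total
  - Training scales:
    - Positive: a single scale, with s equal to half the height of the ground-truth bounding box
    - Negative: 24 scales, with s in [2, 256]
  - Test scales: 24 scales; non-maximum suppression is applied across the scales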
- proposals generation
  - group pixels into fragments
    - Pixels less than 3 pixels apart are merged into a fragment
  - aggregate the fragments into text lines
    - angular difference constraint
    - distance constraint
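The first grouping step (pixels closer than 3 pixels end up in the same fragment) can be sketched with a union-find structure. The data layout and function name are assumptions for illustration, and the quadratic pairwise loop is chosen for clarity rather than speed. The angular and distance constraints for merging fragments are not sketched because their thresholds are not recorded here.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def group_into_fragments(points: List[Point], max_dist: float = 3.0) -> List[List[Point]]:
    """Group points so that any two points closer than `max_dist` share a fragment."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Find the root of i, halving the path on the way up.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two fragments

    fragments: Dict[int, List[Point]] = {}
    for i, p in enumerate(points):
        fragments.setdefault(find(i), []).append(p)
    return list(fragments.values())
```

Because merging is transitive, a chain of pixels each 2 pixels apart forms a single fragment even though its end points are far from each other.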

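　　2. False-positive removal with CNNs

- Proposals are filtered first by a character-level CNN and then by a word-level CNN (the paper gives no details about the CNNs)
- Character-level samples: the character dataset of reference 3
- Word-level samples: samples from ICDAR2011, SVT, IIIT5K-word, PASCAL-VOC and BSD500
- Text lines are split into words following the method of reference 3

　　3. Multi-scale detection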

Figure 4. Procedure of text line proposal generation. (a) Input image. (b) Feature extraction at multiple scales. (c) Symmetry probability maps. (d) Axes sought in the symmetry probability maps. (e) Bounding box estimation. (f) Proposals from different scales
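Proposals from the different scales overlap heavily, so they are merged by non-maximum suppression. The sketch below uses the standard greedy, IoU-based variant; the notes do not say which variant the paper uses, so the algorithm, the 0.5 threshold and all names here are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Proposal = Tuple[Box, float]             # (box, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(proposals: List[Proposal], iou_threshold: float = 0.5) -> List[Proposal]:
    """Keep the highest-scoring proposals, dropping those that overlap a kept one."""
    kept: List[Proposal] = []
    for box, score in sorted(proposals, key=lambda p: p[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

Here the score would be the symmetry probability of the proposal, and the input list would hold the proposals from all 24 test scales.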

Experimental results
- Speed: about 30 s per image on average (Matlab, 2.0GHz 8-core CPU, 64G RAM and Windows 64-bit OS)
- Effect of the symmetry and appearance features

- ICDAR2011

- ICDAR2013

- SVT

- Extension to other languages

Discussion
- Limitations of the method
  - Slow
  - Handles only horizontal and near-horizontal text

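Summary and takeaways
- Text detection methods increasingly exploit contextual information: they detect text blocks or text lines from the start instead of detecting individual characters first as earlier methods did, because this is indeed more robust
- The symmetry feature is a good idea: it is built from low-level features and could be extended to other problems; worth bookmarking
- The paper shows several very representative hard cases for text detection
  - Low contrast: upper figure (b), (i); lower figure (c)
  - Broken strokes: upper figure (c)
  - Illumination effects: upper figure (g); lower figure (a), (b)
  - Dot-matrix characters: upper figure (a), (j)
  - Low resolution: upper figure (g)
  - Connected characters: upper figure (h)
  - Single character: lower figure (f)
  - Large variation in character size: lower figure (d)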

References
1. D. R. Martin, C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell., 26(5):530–549, 2004.
2. T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell., 24(7):971–987, 2002.
3. M. Jaderberg, A. Vedaldi, and A. Zisserman. Deep features for text spotting. In Proc. of ECCV, 2014.
