Action Recognition Resources

Reposted from: http://blog.csdn.net/kezunhai/article/details/50176209

================ Divider ================= This part is from Zhihu ====================

Link: http://www.zhihu.com/question/33272629/answer/60279003

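On action recognition in videos: I have been working in this area recently. The field runs deep, but the mainstream really comes down to a handful of approaches, so at the risk of lecturing the experts, here are the main ones for video: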

Before deep learning, the approach that worked best was Improved Dense Trajectories (IDT) + Fisher vector from the INRIA group; paper and code:
LEAR - Improved Trajectories Video Description
Basically everything that comes out of INRIA works quite well.

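The most representative deep learning method is the 2-stream network from the VGG group:
http://arxiv.org/abs/1406.2199
Its results are actually not much different from IDT, and many people complain that the reported numbers are hard to reproduce. It took me a while of trying before I got roughly comparable numbers.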

Many improved methods have since been built on top of these two works. The current state of the art is the intuitive combination, IDT + 2-stream from Xiaoou's group:
http://wanglimin.github.io/papers/WangQT_CVPR15.pdf

There is also Google's LSTM + 2-stream, which was very popular a while ago and still gets a lot of attention:
http://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43793.pdf

And a plug for Zhongwen's paper:
http://www.cs.cmu.edu/~zhongwen/pdf/MED_CNN.pdf

In the end you will find that every paper has to compare against IDT.

================ Divider ================= This part is also from Zhihu ====================

Link: http://www.zhihu.com/question/33272629/answer/60163859

I am not familiar with the video side, so let me talk about still images instead.
[1] Action Recognition from a Distributed Representation of Pose and
Appearance, CVPR,2010
[2] Combining Randomization and Discrimination for Fine-Grained Image
Categorization, CVPR,2011
[3] Object and Action Classification with Latent Variables, BMVC, 2011
[4] Human Action Recognition by Learning Bases of Action Attributes and Parts,
ICCV, 2011
[5] Learning person-object interactions for action recognition in still images,
NIPS, 2011
[6] Weakly Supervised Learning of Interactions between Humans and Objects,
PAMI, 2012
[7] Discriminative Spatial Saliency for Image Classification, CVPR, 2012
[8] Expanded Parts Model for Human Attribute and Action Recognition in Still
Images, CVPR, 2013
[9] Coloring Action Recognition in Still Images, IJCV, 2013
[10] Semantic Pyramids for Gender and Action Recognition, TIP, 2014
[11] Actions and Attributes from Wholes and Parts, arXiv, 2015
[12] Contextual Action Recognition with R*CNN, arXiv, 2015
[13] Recognizing Actions Through Action-Specific Person Detection, TIP, 2015

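I have not read anything from before 2010. In the years around 2010 (2011, 2012) there were three main lines of thought: 1. Using the interacting object as a cue (person-object interaction) and modeling the interaction, as in [5] and [6]; 2. Building a pose model and classifying by the statistics of the distribution of poses (or, more broadly, parts), as in [1] and [4]; poselets, which do not seem to be listed above, are also used quite often; 3. Finding discriminative regions and suppressing meaningless ones, as in [2] and [7]. [10] and [11] also use this idea.
[9] and [10] both use a feature other than SIFT, namely color names, and describe how to fuse multiple different features for action classification.
[12] explores how to incorporate context (since the person's bounding box is given in action classification).
More recent work replaces SIFT features with CNN features ([11], [12], [13]); in terms of results, [12] is the latest.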

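For still images the work is mostly classification. Detection work is relatively rare, although [4] and [13] both include some. Perhaps classification results were not promising enough before 2015. Classification mAP on PASCAL VOC 2012 has now reached 89%, so attention may shift more toward detection from here on.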

================ Divider ================= This part is from the internet ====================

[1] http://lear.inrialpes.fr/software (lots of useful material; worth browsing)

[2] Action Recognition Paper Reading

Tian, YingLi, et al. "Hierarchical filtered motion for action recognition in crowded videos." Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 42.3 (2012): 313-323.

A new 3D interest point detector based on 2D Harris and the Motion History Image (MHI). Essentially, 2D Harris points with recent motion are selected as interest points.
New descriptors based on HOG computed on image intensity and on the MHI. Some filtering is performed to remove cluttered motion and to normalize the descriptors.
Evaluated on the KTH and MSR Action datasets.
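
The "Harris points with recent motion" idea can be illustrated with a minimal sketch. It assumes the usual MHI update rule (a pixel is set to tau when it moves, otherwise it decays by one per frame) and simple frame differencing as the motion test. The function names, thresholds, and the use of plain nested lists are illustrative assumptions, not the paper's implementation.

```python
def update_mhi(mhi, prev_frame, frame, tau=10, diff_thresh=15):
    """Return the next motion history image.

    mhi, prev_frame, frame are equally sized lists of rows of numbers.
    tau and diff_thresh are illustrative values, not taken from the paper.
    """
    new_mhi = []
    for y, row in enumerate(frame):
        new_row = []
        for x, value in enumerate(row):
            moved = abs(value - prev_frame[y][x]) >= diff_thresh
            # Moving pixels are refreshed to tau; others decay toward zero.
            new_row.append(tau if moved else max(0, mhi[y][x] - 1))
        new_mhi.append(new_row)
    return new_mhi


def keep_points_with_recent_motion(points, mhi, min_history=1):
    """Keep (x, y) candidate points whose MHI value shows recent motion."""
    return [(x, y) for (x, y) in points if mhi[y][x] >= min_history]
```

In a real pipeline the candidate points would come from a 2D Harris detector; here they are simply passed in as a list of (x, y) tuples.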

Yuan, Junsong, Zicheng Liu, and Ying Wu. "Discriminative subvolume search for efficient action detection." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

A discriminative matching technique based on mutual information and the nearest neighbor algorithm.
A better upper bound for branch-and-bound search, used to locate the matched action that maximizes mutual information.
The key idea is to decompose the search space into spatial and temporal parts.

Lampert, Christoph H., Matthew B. Blaschko, and Thomas Hofmann. "Beyond sliding windows: Object localization by efficient subwindow search." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.

Code online: https://sites.google.com/site/christophlampert/software (Efficient Subwindow Search)
Reduces the complexity of sliding-window search from n^4 to n^2 on average.
Uses branch-and-bound techniques.
Relies on a bounding function that gives an upper bound of the scoring function over a set of candidate boxes.
Works well with linear classifiers and BOW features.
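
A minimal branch-and-bound sketch of this kind of search, under stated assumptions: each feature point carries the linear classifier weight of its visual word, a box scores the sum of the weights it contains, and a set of boxes is described by one interval per box edge. The bound adds the positive weights inside the largest box of the set and the negative weights inside the smallest one. All names are hypothetical and this is not the released ESS code.

```python
import heapq


def upper_bound(points, state):
    """Upper bound of the box score over every box described by `state`.

    state = (left_range, top_range, right_range, bottom_range), each an
    inclusive (lo, hi) interval of integer coordinates.
    """
    (l_lo, l_hi), (t_lo, t_hi), (r_lo, r_hi), (b_lo, b_hi) = state
    bound = 0.0
    for x, y, weight in points:
        if weight > 0:
            # Positive weights count if they fall in the largest box.
            if l_lo <= x <= r_hi and t_lo <= y <= b_hi:
                bound += weight
        elif l_hi <= x <= r_lo and t_hi <= y <= b_lo:
            # Negative weights count only if every box must contain them.
            bound += weight
    return bound


def efficient_subwindow_search(points, width, height):
    """Best-first search for the highest scoring (left, top, right, bottom)."""
    start = ((0, width - 1), (0, height - 1), (0, width - 1), (0, height - 1))
    heap = [(-upper_bound(points, start), start)]
    while heap:
        neg_bound, state = heapq.heappop(heap)
        sizes = [hi - lo for lo, hi in state]
        widest = sizes.index(max(sizes))
        if sizes[widest] == 0:
            # A single box: its bound equals its true score, so it is optimal.
            return -neg_bound, tuple(lo for lo, _ in state)
        lo, hi = state[widest]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):
            child = state[:widest] + (half,) + state[widest + 1:]
            (c_l_lo, _), (c_t_lo, _), (_, c_r_hi), (_, c_b_hi) = child
            # Skip sets that contain no valid box (left > right or top > bottom).
            if c_l_lo <= c_r_hi and c_t_lo <= c_b_hi:
                heapq.heappush(heap, (-upper_bound(points, child), child))
    return None
```

Because the first single-box state popped from the queue has a bound no smaller than any remaining set, the search returns the global optimum without scoring every box.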

Li, Li-Jia, et al. "Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification." NIPS. Vol. 2. No. 3. 2010.

Images are represented as a scale-invariant map of object detector responses.
Detectors are applied to novel images at multiple scales. At each scale, a 3-level spatial pyramid is applied. Responses are concatenated to form the descriptor for the image.
200 objects are selected from a pool of 1000 objects.
Evaluated on the scene classification task.
L1 and L1/L2 regularized LR is applied to discover sparsity. For the L1/L2 group sparsity, a group is defined for each object, hence object-level sparsity. Bear in mind that there are multiple entries in the descriptor for each object. (Marginal improvements.)
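
How a descriptor could be assembled from detector response maps is sketched below. The notes do not say how responses are pooled inside each pyramid cell, so max pooling is an assumption here, as are the function names and the nested-list representation of a response map.

```python
def pyramid_max_pool(response, levels=3):
    """Pool one response map over a spatial pyramid (1x1, 2x2, 4x4 cells).

    response is a list of rows; it must be at least 2**(levels-1) pixels
    wide and tall so that no cell is empty. Max pooling is an assumption.
    """
    height, width = len(response), len(response[0])
    features = []
    for level in range(levels):
        cells = 2 ** level
        for cy in range(cells):
            y0, y1 = cy * height // cells, (cy + 1) * height // cells
            for cx in range(cells):
                x0, x1 = cx * width // cells, (cx + 1) * width // cells
                features.append(
                    max(response[y][x] for y in range(y0, y1) for x in range(x0, x1))
                )
    return features


def describe_image(response_maps):
    """Concatenate pooled features over all (object, scale) response maps."""
    descriptor = []
    for response in response_maps:
        descriptor.extend(pyramid_max_pool(response))
    return descriptor
```

With 3 levels each response map contributes 1 + 4 + 16 = 21 values, which is why every object ends up with multiple entries in the final descriptor.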

Wang, Heng, et al. "Dense trajectories and motion boundary descriptors for action recognition." International journal of computer vision 103.1 (2013): 60-79.

Tracks densely sampled points to obtain trajectories, in contrast with local representations. The sampling is not truly dense: grid points are filtered by the minimum-eigenvalue criterion (Shi and Tomasi).
Motion boundaries (derivatives of the optical flow field) are used to overcome camera motion.
Code online: http://lear.inrialpes.fr/people/wang/dense_trajectories
The optical flow field is smoothed with a median filter. The implementation is based on OpenCV.
Trajectory length is limited to overcome drift. Static points and erroneous trajectories are filtered out.
Trajectory shape, HOG, HOF and MBH descriptors are computed along the trajectory.
Results: KTH (94.2%), Youtube (84.1%), Hollywood2 (58.2%), UCF Sports (88.0%), IXMAS (93.5%), UIUC (98.4%), Olympic Sports (74.1%), UCF50 (84.5%), HMDB51 (46.6%)
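
Why differentiating the flow field helps against camera motion can be seen in a small sketch: a constant offset added to the flow (a crude stand-in for a camera translation) disappears once spatial derivatives are taken. The function name and the finite-difference scheme are assumptions, and the histogram quantization that a full MBH descriptor applies afterwards is omitted.

```python
def spatial_gradient(channel):
    """Spatial derivatives of one optical flow component.

    channel is a list of rows (at least 2x2). Central differences are used
    inside the image and one-sided differences at the borders.
    """
    height, width = len(channel), len(channel[0])
    grad_x = [[0.0] * width for _ in range(height)]
    grad_y = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            x0, x1 = max(x - 1, 0), min(x + 1, width - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, height - 1)
            grad_x[y][x] = (channel[y][x1] - channel[y][x0]) / (x1 - x0)
            grad_y[y][x] = (channel[y1][x] - channel[y0][x]) / (y1 - y0)
    return grad_x, grad_y
```

Applying this to the horizontal and vertical flow components separately gives the two derivative fields from which motion boundary histograms are built.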

Liang, Xiaodan, Liang Lin, and Liangliang Cao. "Learning latent spatio-temporal compositional model for human action recognition." Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013.

Laptev STIP with HOF and HOG, with BOW quantization.
Leaf nodes detect action parts.
Or-nodes account for intra-class variability.
And-nodes aggregate the action in a frame.
The root node identifies the temporal composition.
Contextual interactions connect leaf nodes.
Everything is formulated in a latent SVM framework and solved by CCCP.
Since a leaf node can move from one Or-node to another, a reconfiguration step is used to rearrange the feature vector.
Evaluated on the UCF Youtube and Olympic Sports datasets.

Sadanand, Sreemanananth, and Jason J. Corso. "Action bank: A high-level representation of activity in video." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

Results: 98.2% KTH, 95.0% UCF Sports, 57.9% UCF50, 26.9% HMDB51.
205 video clips are used as templates to detect actions in novel videos.
Detectors are sampled from multiple viewpoints and run at multiple scales.
Detector outputs are max-pooled over the ST volume through various pooling units.
"Action Spotting" is used as the template detector.
Code online: http://www.cse.buffalo.edu/~jcorso/r/actionbank/

Liu, Jingen, Benjamin Kuipers, and Silvio Savarese. "Recognizing human actions by attributes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.

22 manually selected action attributes serve as the semantic representation.
Data-driven attributes provide complementary information.
Attributes are treated as latent variables, just like the parts in a DPM model.
The model accounts for class matching, attribute matching, and attribute co-occurrence.
STIP by a 1D Gabor detector.
Gradient-based descriptors + BOW over the ST volume.
Evaluated on the UIUC, KTH, and Olympic Sports datasets.

Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei. "Unsupervised learning of human action categories using spatial-temporal words." International Journal of Computer Vision 79.3 (2008): 299-318.

Unsupervised video categorization using pLSA and LDA.
Action localization.
Laptev's STIP is too sparse compared with Dollar's.
Simple gradient-based descriptors, with PCA applied to reduce dimensionality --> relies on the codebook to deal with invariance.
K-means with the Euclidean distance metric.
pLSA or LDA on top of BOW (the number of topics equals the number of categories to be recognized).
Each STIP is associated with a visual word, hence a topic distribution, so it is trivial to perform localization.
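
The codebook step can be sketched as follows, assuming the codebook has already been learned (for example by k-means) and that descriptors are plain lists of floats. The function names are hypothetical, and the pLSA/LDA stage that sits on top of these histograms is not shown.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to `descriptor` in Euclidean distance."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def bow_histogram(descriptors, codebook):
    """Normalized bag-of-words histogram of a video's descriptors."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]
```

Because every interest point keeps its codeword index, a per-word topic posterior from the topic model can be mapped back to the point's location, which is what makes localization straightforward.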

Laptev, Ivan, et al. "Learning realistic human actions from movies." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.

Annotates videos by aligning transcripts.
A movie dataset.
Space-time interest points + HOG + HOF around an ST volume.
ST BOW. Given a video sequence, there are multiple ways to segment it, each of which is called a channel.
Multi-channel \chi^2 kernel classification. Channel selection uses a greedy shrinking strategy.
Evaluated on the KTH (91.8%) and Movie (18.2% ~ 53.3%) datasets.
STIP + HOG and HOF code: http://www.di.ens.fr/~laptev/download.html
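
A small sketch of a multi-channel \chi^2 kernel, assuming the commonly used form K(a, b) = exp(-sum_c D_c(a, b) / A_c), where D_c is the \chi^2 distance between the channel-c histograms and A_c is a per-channel normalizer (typically the mean training distance for that channel). The exact form and the names below are assumptions rather than details taken from the notes.

```python
import math


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # empty bins contribute nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def multichannel_chi2_kernel(channels_a, channels_b, scales):
    """Kernel value for two videos described by per-channel histograms.

    channels_a and channels_b map a channel name to its histogram;
    scales maps the same names to the normalizer A_c (assumed given).
    """
    exponent = sum(
        chi2_distance(channels_a[name], channels_b[name]) / scales[name]
        for name in scales
    )
    return math.exp(-exponent)
```

Dropping a channel from `scales` removes it from the kernel, which is the operation a greedy channel selection would repeat while validation accuracy does not degrade.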

[3] Action Recognition Datasets

Links to Datasets:

"Free Viewpoint Action Recognition using Motion History Volumes (CVIU Nov./Dec. '06)."
D. Weinland, R. Ronfard, E. Boyer
"Actions as Space-Time Shapes (ICCV '05)."
M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri
"Recognizing Human Actions: A Local SVM Approach (ICPR '04)."
C. Schuldt, I. Laptev and B. Caputo
"Propagation Networks for Recognizing Partially Ordered Sequential Activity (CVPR '04)."
Y. Shi, Y. Huang, D. Minnen, A. Bobick, I. Essa
"Tracking Multiple Objects through Occlusions (CVPR '05)."
Y. Huang, I. Essa
Sixth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS - ECCV 2004)

Recent Action Recognition Papers:

D. Weinland, R. Ronfard, E. Boyer (CVIU Nov./Dec. '06)
"Free Viewpoint Action Recognition using Motion History Volumes"
11 actors each performing 3 times 13 actions: Check Watch, Cross Arms, Scratch Head, Sit Down, Get Up, Turn Around, Walk, Wave, Punch, Kick, Point, Pick Up, Throw.
Multiple views of 5 synchronized and calibrated cameras are provided.
A. Yilmaz, M. Shah (ICCV '05)
"Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras"
18 Sequences, 8 Actions: 3 x Running, 3 x Bicycling, 3 x Sitting-down, 2 x Walking, 2 x Picking-up, 1 x Waving Hands, 1 x Forehand Stroke, 1 x Backhand Stroke
Y. Sheikh, M. Shah (ICCV '05)
"Exploring the Space of an Action for Human Action Recognition"
6 Actions: Sitting, Standing, Falling, Walking, Dancing, Running
M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri (ICCV '05)
"Actions as Space-Time Shapes"
81 Sequences, 9 Actions, 9 People: Running, Walking, Bending, Jumping-Jack, Jumping-Forward-On-Two-Legs, Jumping-In-Place-On-Two-Legs, Galloping-Sideways, Waving-Two-Hands, Waving-One-Hand Ballet
A. Yilmaz, M. Shah (CVPR '05)
"Action Sketch: A Novel Action Representation"
28 Sequences, 12 Actions: 7 x Walking, 4 x Aerobics, 2 x Dancing, 2 x Sit-down, 2 x Stand-up, 2 x Kicking, 2 x Surrender, 2 x Hands-down, 2 x Tennis, 1 x Falling
E. Shechtman, M. Irani (CVPR '05)
"Space-Time Behavioral Correlation"
Walking, Diving, Jumping, Waving Arms, Waving Hands, Ballet Figure, Water Fountain
Y. Shi, Y. Huang, D. Minnen, A. Bobick, I. Essa (CVPR '04)
"Propagation Networks for Recognition of Partially Ordered Sequential Actions"
Glucose Monitor Calibration
C. Schuldt, I. Laptev and B. Caputo (ICPR '04)
"Recognizing Human Actions: A Local SVM Approach."
6 Actions x 25 Subjects x 4 Scenarios
V. Parameswaran, R. Chellappa (CVPR '03)
"View Invariants for Human Action Recognition"
25 x Walk, 6 x Run, 18 x Sit-down
D. Minnen, I. Essa, T. Starner (CVPR '03)
"Expectation Grammars: Leveraging High-Level Expectations for Activity Recognition"
Towers of Hanoi (only hands)
A. Efros, A. Berg, G. Mori, J. Malik (ICCV '03)
"Recognizing Actions at a Distance"
Soccer, Tennis, Ballet

[4] CVPR 2014 Tutorial on Emerging Topics in Human Activity Recognition

[5] http://yangxd.org/projects/surveillance/SED13

[6] Recognition of human actions

Sample sequences for each action (DivX-compressed)

person15_walking_d1_uncomp.avi
person15_jogging_d1_uncomp.avi
person15_running_d1_uncomp.avi
person15_boxing_d1_uncomp.avi
person15_handwaving_d1_uncomp.avi
person15_handclapping_d1_uncomp.avi

Action database in zip-archives (DivX-compressed)
Note: The database is publicly available for non-commercial use. Please refer to [Schuldt, Laptev and Caputo, Proc. ICPR'04, Cambridge, UK] if you use this database in your publications.

walking.zip(242Mb)
jogging.zip(168Mb)
running.zip(149Mb)
boxing.zip(194Mb)
handwaving.zip(218Mb)
handclapping.zip(176Mb)

Related publications

"Recognizing Human Actions: A Local SVM Approach",
Christian Schuldt, Ivan Laptev and Barbara Caputo; in Proc. ICPR'04, Cambridge, UK. [Abstract, PDF]

"Local Spatio-Temporal Image Features for Motion Interpretation",
Ivan Laptev; PhD Thesis, 2004, Computational Vision and Active Perception Laboratory (CVAP), NADA, KTH, Stockholm [Abstract, PDF]

"Local Descriptors for Spatio-Temporal Recognition",
Ivan Laptev and Tony Lindeberg; ECCV Workshop "Spatial Coherence for Visual Motion Analysis" [Abstract, PDF]

"Velocity adaptation of space-time interest points",
Ivan Laptev and Tony Lindeberg; in Proc. ICPR'04, Cambridge, UK. [Abstract, PDF]

"Space-Time Interest Points",
I. Laptev and T. Lindeberg; in Proc. ICCV'03, Nice, France, pp.I:432-439. [Abstract, PDF]
