[Computer Vision] Resources on Action Recognition
================ Divider ================= This part is from Zhihu ====================
Before deep learning, the approach that worked best was Improved Dense Trajectories (IDT) + Fisher vector from the INRIA group; paper and code:
LEAR - Improved Trajectories Video Description
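Pretty much everything that comes out of INRIA works well.
On the deep learning side, the most representative work is the two-stream network from the VGG group: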
http://arxiv.org/abs/1406.2199
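Its results are actually not much different from IDT's. Many people have complained that the numbers in the paper are hard to reproduce; it took me a while before I got roughly comparable numbers myself.
Many improved methods build on these two works. The current state of the art is the combination you would intuitively expect, IDT + two-stream, from Xiaoou's group: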
http://wanglimin.github.io/papers/WangQT_CVPR15.pdf
There is also Google's LSTM + two-stream, which was very popular a while ago and still gets a lot of attention:
http://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43793.pdf
And a plug for Zhongwen's paper:
http://www.cs.cmu.edu/~zhongwen/pdf/MED_CNN.pdf
In the end you will find that every paper has to compare against IDT.
================ Divider ================= This part is also from Zhihu ====================
[1] Action Recognition from a Distributed Representation of Pose and Appearance, CVPR,2010
[2] Combining Randomization and Discrimination for Fine-Grained Image Categorization, CVPR,2011
[3] Object and Action Classification with Latent Variables, BMVC, 2011
[4] Human Action Recognition by Learning Bases of Action Attributes and Parts, ICCV, 2011
[5] Learning person-object interactions for action recognition in still images, NIPS, 2011
[6] Weakly Supervised Learning of Interactions between Humans and Objects, PAMI, 2012
[7] Discriminative Spatial Saliency for Image Classification, CVPR, 2012
[8] Expanded Parts Model for Human Attribute and Action Recognition in Still Images, CVPR, 2013
[9] Coloring Action Recognition in Still Images, IJCV, 2013
[10] Semantic Pyramids for Gender and Action Recognition, TIP, 2014
[11] Actions and Attributes from Wholes and Parts, arXiv, 2015
[12] Contextual Action Recognition with R*CNN, arXiv, 2015
[13] Recognizing Actions Through Action-Specific Person Detection, TIP, 2015
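I have not read anything from before 2010. In the years around 2010 (2011, 2012) there were three main lines of thought: 1. Use the object being interacted with as a cue (person-object interaction) and model the interaction, as in [5] and [6]; 2. Build a model of pose and classify from the distribution of poses (or, more broadly, parts), as in [1] and [4]; there is also poselets, which does not seem to be listed above and is used quite a lot; 3. Find discriminative regions and suppress meaningless ones, as in [2] and [7]. [10] and [11] also use this idea.
[9] and [10] both use a feature other than SIFT, color names, and describe how to fuse multiple different features for action classification.
[12] explores how to incorporate context (since in action classification the person's bounding box is given).
More recent work replaces SIFT features with CNN features ([11], [12], [13]); in terms of results, [12] is the latest.
Work on still images is mostly classification; there is not much on detection, though [4] and [13] both include some. Possibly the classification results before 2015 were not yet promising enough. Classification mAP on PASCAL VOC 2012 has now reached 89%, so attention may shift more toward detection from here on.
================ Divider ================= This part is from the Internet ====================
[1] http://lear.inrialpes.fr/software (lots of useful material; worth browsing)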
[2] Action Recognition Paper Reading
- Tian, YingLi, et al. "Hierarchical filtered motion for action recognition in crowded videos." Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 42.3 (2012): 313-323.
- A new 3D interest point detector, based on 2D Harris and Motion History Image (MHI). Essentially, 2D Harris points with recent motion are selected as interest points.
- A new descriptor based on HOG computed on image intensity and MHI. Some filtering is performed to remove cluttered motion and normalize the descriptors.
- KTH and MSR Action dataset
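The Motion History Image used by the detector above can be sketched in a few lines. This is a minimal NumPy sketch of the standard MHI update rule, not the authors' implementation; the names `update_mhi` and `recent_motion_mask`, the decay length `tau`, and the difference threshold are all illustrative assumptions.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=10, diff_thresh=30):
    """One step of the standard MHI update (illustrative parameters).

    Pixels whose intensity changed by more than diff_thresh are set to tau;
    all other pixels decay by 1 toward 0.
    """
    moving = np.abs(frame.astype(np.int32) - prev_frame.astype(np.int32)) > diff_thresh
    decayed = np.maximum(mhi - 1, 0)
    return np.where(moving, tau, decayed)

def recent_motion_mask(mhi, min_value=1):
    """Pixels that moved within the last tau frames (hypothetical helper)."""
    return mhi >= min_value

# Toy example: a bright dot moves one pixel to the right per frame.
frames = np.zeros((3, 5, 5), dtype=np.uint8)
for t in range(3):
    frames[t, 2, t + 1] = 255

mhi = np.zeros((5, 5), dtype=np.int32)
for t in range(1, 3):
    mhi = update_mhi(mhi, frames[t - 1], frames[t], tau=10)
print(mhi[2])  # row 2 holds 0, 9, 10, 10, 0
```

In a detector along these lines, 2D Harris corners would be kept only where `recent_motion_mask` is true.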
- Yuan, Junsong, Zicheng Liu, and Ying Wu. "Discriminative subvolume search for efficient action detection." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
- A discriminative matching technique based on mutual information and the nearest-neighbor algorithm
- A better upper bound for branch-and-bound search to locate the matched action that maximizes mutual information
- The key idea is to decompose the search space into spatial and temporal.
- Lampert, Christoph H., Matthew B. Blaschko, and Thomas Hofmann. "Beyond sliding windows: Object localization by efficient subwindow search." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
- Code online: https://sites.google.com/site/christophlampert/software (Efficient Subwindow Search)
- Reduces the complexity of sliding-window search from O(n^4) to O(n^2) on average
- Branch-and-bound technique
- Relies on a bounding function that gives an upper bound on the scoring function over a set of candidate boxes
- works well with linear classifiers and BOW features.
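The bounding function is the heart of this method, so here is a minimal sketch of one such bound for a linear classifier over bag-of-words features, where a box's score is the sum of per-point weights inside it. It is my own illustration, not the released code: the function names, the dense per-pixel `weights` array, and the toy ranges are assumptions. The bound adds the positive weights inside the largest box of the set to the negative weights inside the smallest one.

```python
import itertools
import numpy as np

def integral(img):
    """Integral image with a leading zero row/column, so box sums are O(1)."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box_sum(ii, top, bottom, left, right):
    """Sum over rows top..bottom and columns left..right (inclusive); 0 if empty."""
    if top > bottom or left > right:
        return 0.0
    return ii[bottom + 1, right + 1] - ii[top, right + 1] - ii[bottom + 1, left] + ii[top, left]

def ess_upper_bound(ii_pos, ii_neg, T, B, L, R):
    """Upper bound on the score of every box whose edges lie in the given ranges.

    T, B, L, R are (lo, hi) ranges for the top, bottom, left and right edges.
    """
    largest = box_sum(ii_pos, T[0], B[1], L[0], R[1])   # union of all boxes
    smallest = box_sum(ii_neg, T[1], B[0], L[1], R[0])  # intersection of all boxes
    return largest + smallest

rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 6))          # toy per-pixel classifier weights
ii_all = integral(weights)
ii_pos = integral(np.maximum(weights, 0))
ii_neg = integral(np.minimum(weights, 0))

T, B, L, R = (0, 2), (3, 5), (1, 2), (3, 5)
bound = ess_upper_bound(ii_pos, ii_neg, T, B, L, R)
# Brute-force check: score every box in the set and compare with the bound.
best = max(
    box_sum(ii_all, t, b, l, r)
    for t, b, l, r in itertools.product(*(range(lo, hi + 1) for lo, hi in (T, B, L, R)))
)
print(bound >= best)
```

Branch and bound would repeatedly split the widest of the four ranges and always expand the box set with the highest bound; when every range shrinks to a single value, the bound equals the exact box score.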
- Li, Li-Jia, et al. "Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification." NIPS. Vol. 2. No. 3. 2010.
- Images are represented as a scale-invariant map of object detector response
- Detectors are applied to novel images in multiple scales. At each scale, a 3 level spatial pyramid is applied. Responses are concatenated to form the descriptors for the image.
- 200 objects are selected from a pool of 1000 objects
- Evaluated on scene classification tasks
- L1 and L1/L2 regularized LR are applied to discover sparsity. For the L1/L2 group sparsity, a group is defined for each object, hence object-level sparsity. Bear in mind that there are multiple entries in the descriptor for each object (marginal improvements).
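To make the descriptor construction concrete, here is a rough sketch of pooling detector response maps over a 3-level spatial pyramid and concatenating the results. The notes above do not say which pooling operator is used, so max pooling is an assumption here, as are the function names and the random toy response maps.

```python
import numpy as np

def spatial_pyramid_max(response, levels=3):
    """Max-pool one detector response map over a spatial pyramid.

    Level l splits the map into 2**l x 2**l cells, so 3 levels give
    1 + 4 + 16 = 21 values per map.
    """
    h, w = response.shape
    feats = []
    for level in range(levels):
        n = 2 ** level
        rows = np.linspace(0, h, n + 1).astype(int)
        cols = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = response[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
                feats.append(cell.max())
    return np.array(feats)

def object_bank_descriptor(responses):
    """Concatenate pyramid features over all detectors and scales.

    responses[d][s] is the response map of detector d at scale s.
    """
    return np.concatenate(
        [spatial_pyramid_max(r) for per_detector in responses for r in per_detector]
    )

rng = np.random.default_rng(0)
# Toy input: 2 hypothetical detectors, 3 scales each, random response maps.
responses = [[rng.random((32 // 2**s, 32 // 2**s)) for s in range(3)] for _ in range(2)]
desc = object_bank_descriptor(responses)
print(desc.shape)  # (126,) = 2 detectors x 3 scales x 21 cells
```

Each object therefore owns a contiguous slice of the descriptor, which is what lets a group be defined per object in the L1/L2 regularizer.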
- Wang, Heng, et al. "Dense trajectories and motion boundary descriptors for action recognition." International journal of computer vision 103.1 (2013): 60-79.
- Tracks densely sampled points to obtain trajectories, in contrast with local representations. The sampling is not truly dense: grid points are filtered by the minimum-eigenvalue criterion (Shi and Tomasi)
- Motion boundary (derivative over optical flow field), to overcome camera motion
- Code online: http://lear.inrialpes.fr/people/wang/dense_trajectories
- The optical flow field is smoothed with a median filter; the implementation is based on OpenCV
- Limit trajectory length to overcome drift. Filter out static points and erroneous trajectories.
- Trajectory shape, HOG, HOF and MBH descriptors along the trajectory
- KTH (94.2%), Youtube (84.1%), Hollywood2 (58.2%), UCF Sports (88.0%), IXMAS (93.5%), UIUC (98.4%), Olympic Sports (74.1%), UCF50 (84.5%), HMDB51 (46.6%)
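Why does taking derivatives of the flow field help against camera motion? A small NumPy sketch shows the idea: a constant flow added by a camera translation vanishes under spatial differentiation. This covers only the derivative step, not the full MBH descriptor (which histograms the orientations of these gradients), and the function name and toy flow field are made up for illustration.

```python
import numpy as np

def motion_boundaries(flow):
    """Spatial derivatives of each optical flow component.

    flow has shape (H, W, 2) holding (u, v). Returns a dict with the
    x- and y-derivatives of u and v, each of shape (H, W).
    """
    u, v = flow[..., 0], flow[..., 1]
    du_dy, du_dx = np.gradient(u)   # np.gradient returns d/d(axis 0), d/d(axis 1)
    dv_dy, dv_dx = np.gradient(v)
    return {"du_dx": du_dx, "du_dy": du_dy, "dv_dx": dv_dx, "dv_dy": dv_dy}

h, w = 8, 8
object_flow = np.zeros((h, w, 2))
object_flow[2:6, 2:6, 0] = 1.0          # a patch moving to the right
camera_pan = np.array([3.0, -2.0])      # constant flow added by camera translation

mb_static = motion_boundaries(object_flow)
mb_panned = motion_boundaries(object_flow + camera_pan)
same = all(np.allclose(mb_static[k], mb_panned[k]) for k in mb_static)
print(same)  # True
```

Real camera motion is rarely a pure translation, so in practice the cancellation is only approximate.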
- Liang, Xiaodan, Liang Lin, and Liangliang Cao. "Learning latent spatio-temporal compositional model for human action recognition." Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013.
- Laptev STIP with HOF and HOG, with BOW quantization
- Leaf node for detecting action parts
- Or node to account for intra-class variability
- And node to aggregate action in a frame
- Root node to identify temporal composition
- Contextual interaction (connecting leaf nodes)
- Everything is formulated in a latent SVM framework and solved by CCCP
- Since the leaf node can move around from one Or-node to another, a reconfiguration step is used to rearrange the feature vector
- UCF Youtube and Olympic Sports dataset
- Sadanand, Sreemanananth, and Jason J. Corso. "Action bank: A high-level representation of activity in video." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
- 98.2% KTH, 95.0% UCF Sports, 57.9% UCF50, 26.9% HMDB51
- 205 video clips used as template to detect action from novel video.
- Detectors are sampled from multi viewpoint and run with multiple scales
- Detector outputs are max-pooled over the ST volume through various pooling units
- "Action Spotting" for the template detector
- Code online: http://www.cse.buffalo.edu/~jcorso/r/actionbank/
- Liu, Jingen, Benjamin Kuipers, and Silvio Savarese. "Recognizing human actions by attributes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
- 22 manually selected action attributes as semantic representation
- Data Driven attributes as complementary information
- Attributes are treated as latent variables, just like the parts in a DPM model
- Accounts for class matching, attribute matching, and attribute co-occurrence
- STIP by 1D-Gabor detector. Gradient based + BOW over ST volume
- UIUC dataset, KTH, Olympic Sports Dataset
- Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei. "Unsupervised learning of human action categories using spatial-temporal words." International Journal of Computer Vision 79.3 (2008): 299-318.
- Unsupervised video categorization, using pLSA and LDA
- Action Localization
- Laptev's STIP is too sparse compared with Dollár's
- Simple gradient based descriptors and PCA applied to reduce dimensionality --> rely on codebook to deal with invariance
- K-means with Euclidean distance metric
- pLSA or LDA on top of BOW (the number of topics equals the number of categories to be recognized)
- Each STIP is associated with a visual word, hence a topic distribution, so localization is straightforward
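The codebook step is easy to sketch: each descriptor goes to its nearest k-means center under Euclidean distance, and the word counts form the video's histogram. The codebook and descriptors below are toy values, and the function names are hypothetical; the topic model that would sit on top is not shown.

```python
import numpy as np

def assign_codewords(descriptors, codebook):
    """Index of the nearest codeword (Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, codebook):
    """L1-normalized bag-of-words histogram for one video."""
    words = assign_codewords(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])   # toy "k-means centers"
descriptors = np.array([[1.0, 0.5], [9.0, 1.0], [0.2, 8.0], [11.0, -1.0]])
words = assign_codewords(descriptors, codebook)
hist = bow_histogram(descriptors, codebook)
print(words)  # word indices 0, 1, 2, 1
print(hist)   # 0.25, 0.5, 0.25
```

Because the word index is kept per interest point, each point can later be labeled with its most likely topic, which is what makes localization cheap.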
- Laptev, Ivan, et al. "Learning realistic human actions from movies." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
- Annotating videos by aligning transcripts
- A movie dataset
- Space-Time interest points + HOG + HOF around a ST volume
- ST BOW. Given a video sequence, there are multiple ways to segment it, each of which is called a channel
- Multi-Channel \chi^2 kernel classification. Channel selection using greedy shrink
- KTH (91.8%) and Movie (18.2% ~ 53.3%) dataset
- STIP + HOG and HOF code: http://www.di.ens.fr/~laptev/download.html
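As I recall it, the multi-channel \chi^2 kernel takes the form exp(-sum_c D_c / A_c), where D_c is the \chi^2 distance between the histograms of channel c and A_c is a per-channel normalizer. Here is a minimal sketch under that assumption; the channel names, histograms, and normalizer values are placeholders rather than anything from the paper.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def multichannel_chi2_kernel(x, y, scales):
    """exp(-sum_c D_c(x_c, y_c) / A_c) over channels c.

    x and y map channel name -> histogram; scales maps channel name -> A_c,
    assumed here to be the mean chi-square distance on the training set.
    """
    total = sum(chi2_distance(x[c], y[c]) / scales[c] for c in scales)
    return float(np.exp(-total))

a = {"hog": np.array([0.5, 0.5, 0.0]), "hof": np.array([0.2, 0.8])}
b = {"hog": np.array([0.0, 0.5, 0.5]), "hof": np.array([0.2, 0.8])}
scales = {"hog": 1.0, "hof": 1.0}      # placeholder values, not from the paper

k_aa = multichannel_chi2_kernel(a, a, scales)
k_ab = multichannel_chi2_kernel(a, b, scales)
print(k_aa)  # 1.0
print(k_ab)  # about exp(-0.5)
```

Greedy channel selection would then amount to changing which keys appear in `scales` and keeping the subset that validates best.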
[3] Action Recognition Datasets
Links to Datasets:
- "Free Viewpoint Action Recognition using Motion History Volumes (CVIU Nov./Dec. '06)."
D. Weinland, R. Ronfard, E. Boyer
- "Actions as Space-Time Shapes (ICCV '05)."
M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri
- "Recognizing Human Actions: A Local SVM Approach (ICPR '04)."
C. Schuldt, I. Laptev and B. Caputo
- "Propagation Networks for Recognizing Partially Ordered Sequential Activity (CVPR '04)."
Y. Shi, Y. Huang, D. Minnen, A. Bobick, I. Essa
- "Tracking Multiple Objects through Occlusions (CVPR '05)."
Y. Huang, I. Essa
- Sixth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS - ECCV 2004)
Recent Action Recognition Papers:
- D. Weinland, R. Ronfard, E. Boyer (CVIU Nov./Dec. '06)
"Free Viewpoint Action Recognition using Motion History Volumes"
11 actors each performing 3 times 13 actions: Check Watch, Cross Arms, Scratch Head, Sit Down, Get Up, Turn Around, Walk, Wave, Punch, Kick, Point, Pick Up, Throw.
Multiple views of 5 synchronized and calibrated cameras are provided.
- A. Yilmaz, M. Shah (ICCV '05)
"Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras"
18 Sequences, 8 Actions: 3 x Running, 3 x Bicycling, 3 x Sitting-down, 2 x Walking, 2 x Picking-up, 1 x Waving Hands, 1 x Forehand Stroke, 1 x Backhand Stroke
- Y. Sheikh, M. Shah (ICCV '05)
"Exploring the Space of an Action for Human Action Recognition"
6 Actions: Sitting, Standing, Falling, Walking, Dancing, Running
- M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri (ICCV '05)
"Actions as Space-Time Shapes"
81 Sequences, 9 Actions, 9 People: Running, Walking, Bending, Jumping-Jack, Jumping-Forward-On-Two-Legs, Jumping-In-Place-On-Two-Legs, Galloping-Sideways, Waving-Two-Hands, Waving-One-Hand
Ballet
- A. Yilmaz, M. Shah (CVPR '05)
"Action Sketch: A Novel Action Representation"
28 Sequences, 12 Actions: 7 x Walking, 4 x Aerobics, 2 x Dancing, 2 x Sit-down, 2 x Stand-up, 2 x Kicking, 2 x Surrender, 2 x Hands-down, 2 x Tennis, 1 x Falling
- E. Shechtman, M. Irani (CVPR '05)
"Space-Time Behavioral Correlation"
Walking, Diving, Jumping, Waving Arms, Waving Hands, Ballet Figure, Water Fountain
- Y. Shi, Y. Huang, D. Minnen, A. Bobick, I. Essa (CVPR '04)
"Propagation Networks for Recognition of Partially Ordered Sequential Actions"
Glucose Monitor Calibration
- C. Schuldt, I. Laptev and B. Caputo (ICPR '04)
"Recognizing Human Actions: A Local SVM Approach."
6 Actions x 25 Subjects x 4 Scenarios
- V. Parameswaran, R. Chellappa (CVPR '03)
"View Invariants for Human Action Recognition"
25 x Walk, 6 x Run, 18 x Sit-down
- D. Minnen, I. Essa, T. Starner (CVPR '03)
"Expectation Grammars: Leveraging High-Level Expectations for Activity Recognition"
Towers of Hanoi (only hands)
- A. Efros, A. Berg, G. Mori, J. Malik (ICCV '03)
"Recognizing Actions at a Distance"
Soccer, Tennis, Ballet
[4] CVPR 2014 Tutorial on Emerging Topics in Human Activity Recognition
[5] http://yangxd.org/projects/surveillance/SED13
[6] Recognition of human actions
Sample sequences for each action (DivX-compressed)
person15_walking_d1_uncomp.avi
person15_jogging_d1_uncomp.avi
person15_running_d1_uncomp.avi
person15_boxing_d1_uncomp.avi
person15_handwaving_d1_uncomp.avi
person15_handclapping_d1_uncomp.avi
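The sample names above follow a regular pattern, so subject, action, and scenario can be read straight from the file name. Here is a small sketch, assuming the `dN` part is the scenario index (the dataset is described as 6 actions x 25 subjects x 4 scenarios); the regular expression and function name are mine, not part of the dataset tools.

```python
import re

# Assumed layout: person<subject>_<action>_d<scenario>_uncomp.avi
KTH_NAME = re.compile(
    r"person(?P<person>\d+)_(?P<action>[a-z]+)_d(?P<scenario>\d)_uncomp\.avi"
)

def parse_kth_name(filename):
    """Split a KTH file name into subject id, action label and scenario id."""
    m = KTH_NAME.fullmatch(filename)
    if m is None:
        raise ValueError(f"not a KTH sequence name: {filename}")
    return int(m["person"]), m["action"], int(m["scenario"])

print(parse_kth_name("person15_walking_d1_uncomp.avi"))  # (15, 'walking', 1)
```

This is handy for building subject-wise train/test splits, where every sequence of a given person has to land on the same side.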
Action database in zip-archives (DivX-compressed)
Note: The database is publicly available for non-commercial use. Please refer to [Schuldt, Laptev and Caputo, Proc. ICPR'04, Cambridge, UK] if you use this database in your publications.
walking.zip(242Mb)
jogging.zip(168Mb)
running.zip(149Mb)
boxing.zip(194Mb)
handwaving.zip(218Mb)
handclapping.zip(176Mb)
Related publications
"Recognizing Human Actions: A Local SVM Approach",
Christian Schuldt, Ivan Laptev and Barbara Caputo; in Proc. ICPR'04, Cambridge, UK. [Abstract, PDF]
"Local Spatio-Temporal Image Features for Motion Interpretation",
Ivan Laptev; PhD Thesis, 2004, Computational Vision and Active Perception Laboratory (CVAP), NADA, KTH, Stockholm [Abstract, PDF]
"Local Descriptors for Spatio-Temporal Recognition",
Ivan Laptev and Tony Lindeberg; ECCV Workshop "Spatial Coherence for Visual Motion Analysis" [Abstract, PDF]
"Velocity adaptation of space-time interest points",
Ivan Laptev and Tony Lindeberg; in Proc. ICPR'04, Cambridge, UK. [Abstract, PDF]
"Space-Time Interest Points",
I. Laptev and T. Lindeberg; in Proc. ICCV'03, Nice, France, pp.I:432-439. [Abstract, PDF]