================ Divider ================= This part is from Zhihu ====================

Link: http://www.zhihu.com/question/33272629/answer/60279003

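On action recognition in videos: I have been working on this recently myself. The field runs deep, but the mainstream really comes down to a handful of techniques, so at the risk of showing off in front of experts, here are the mainstream ones for video: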

Before deep learning, the approach that worked best was Improved Dense Trajectories (IDT) + Fisher vector from the INRIA group. Paper and code:
LEAR - Improved Trajectories Video Description

Pretty much everything from INRIA works well.

On the deep learning side, the most representative work is the two-stream network from the VGG group:
http://arxiv.org/abs/1406.2199

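In practice its accuracy is not much different from IDT, and many people complain that the results in the paper are hard to reproduce. It took me a while of trying before I got roughly comparable numbers.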
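As a rough illustration of the late-fusion step in the two-stream idea, the sketch below averages the softmax scores of a spatial (RGB) stream and a temporal (optical flow) stream. The function names, the `temporal_weight` parameter, and the example logits are assumptions made for this example, not the authors' code; the networks that would produce the logits are omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Weighted average of the two streams' softmax scores (late fusion).

    temporal_weight is a hypothetical knob; 0.5 gives plain averaging.
    """
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    w = temporal_weight
    return [(1.0 - w) * s + w * t for s, t in zip(spatial, temporal)]

def predict(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Index of the class with the highest fused score."""
    fused = fuse_streams(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)

# Made-up logits for three classes: the spatial stream prefers class 0,
# the temporal stream strongly prefers class 1.
spatial_logits = [2.0, 1.0, 0.0]
temporal_logits = [0.0, 3.0, 0.0]
print(predict(spatial_logits, temporal_logits))
```

With equal weights the confident temporal stream wins here; setting `temporal_weight=0.0` falls back to the spatial stream alone.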

Many improvements build on these two works. The current state of the art is, quite intuitively, the combination of IDT and two-stream from Xiaoou Tang's group:
http://wanglimin.github.io/papers/WangQT_CVPR15.pdf

There is also Google's LSTM + two-stream, which was very popular a while ago and still draws a lot of attention:
http://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43793.pdf

And a plug for Zhongwen's paper:
http://www.cs.cmu.edu/~zhongwen/pdf/MED_CNN.pdf

In the end you will find that every paper has to compare against IDT.

================ Divider ================= This part is also from Zhihu ====================

Link: http://www.zhihu.com/question/33272629/answer/60163859

I don't know much about the video side, but I can talk about the still-image setting.

[1] Action Recognition from a Distributed Representation of Pose and Appearance, CVPR, 2010

[2] Combining Randomization and Discrimination for Fine-Grained Image Categorization, CVPR, 2011

[3] Object and Action Classification with Latent Variables, BMVC, 2011

[4] Human Action Recognition by Learning Bases of Action Attributes and Parts, ICCV, 2011

[5] Learning person-object interactions for action recognition in still images, NIPS, 2011

[6] Weakly Supervised Learning of Interactions between Humans and Objects, PAMI, 2012

[7] Discriminative Spatial Saliency for Image Classification, CVPR, 2012

[8] Expanded Parts Model for Human Attribute and Action Recognition in Still Images, CVPR, 2013

[9] Coloring Action Recognition in Still Images, IJCV, 2013

[10] Semantic Pyramids for Gender and Action Recognition, TIP, 2014

[11] Actions and Attributes from Wholes and Parts, arXiv, 2015

[12] Contextual Action Recognition with R*CNN, arXiv, 2015

[13] Recognizing Actions Through Action-Specific Person Detection, TIP, 2015

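I haven't read anything from before 2010. In the years around 2010 (2011, 2012) there were three main approaches: 1. use the object being interacted with as a cue (person-object interaction) and model the interaction, as in refs 5 and 6; 2. build a model of pose and classify by the statistics of the distribution of poses (or, more broadly, parts), as in refs 1 and 4 (there is also poselets, which doesn't seem to be listed above but is used quite a lot); 3. find discriminative regions and suppress the meaningless ones, as in refs 2 and 7. Refs 10 and 11 also use this idea.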

Refs 9 and 10 both use a feature other than SIFT, namely color names, and describe how to fuse several different features for action classification.

Ref 12 explores how to incorporate context (since in action classification the person's bounding box is given).

Newer work has replaced SIFT features with CNN features (refs 11, 12, 13); in terms of results, ref 12 is the latest.

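Still-image work is mostly classification. There has not been much detection work, though refs 4 and 13 both include some. Perhaps before 2015 the classification results were not yet promising enough. Classification mAP on PASCAL VOC 2012 has now reached 89%, so attention may shift more toward detection from here on.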

================ Divider ================= This part is from the internet ====================

[1] http://lear.inrialpes.fr/software (lots of useful material; worth browsing)

[2]  Action Recognition Paper Reading

  • Tian, YingLi, et al. "Hierarchical filtered motion for action recognition in crowded videos." Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 42.3 (2012): 313-323.

    1. A new 3D interest point detector based on 2D Harris and the Motion History Image (MHI). Essentially, 2D Harris points with recent motion are selected as interest points.
    2. New descriptors based on HOG computed on image intensity and on the MHI. Some filtering is performed to remove cluttered motion and normalize the descriptors.
    3. KTH and MSR Action dataset
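To make the "2D corners with recent motion" idea concrete, here is a minimal sketch of a Motion History Image update in plain Python. It assumes the classic MHI rule (set a pixel to `tau` when it moves, otherwise decay by one) and uses simple frame differencing as the motion mask. The names `update_mhi` and `select_points`, the thresholds, and the toy frames are hypothetical and not taken from the paper; the Harris detector itself is omitted.

```python
def update_mhi(mhi, prev_frame, frame, tau=5, diff_threshold=10):
    """One MHI step: moving pixels are set to tau, the rest decay toward 0."""
    out = []
    for y, row in enumerate(frame):
        out_row = []
        for x, value in enumerate(row):
            moving = abs(value - prev_frame[y][x]) > diff_threshold
            out_row.append(tau if moving else max(mhi[y][x] - 1, 0))
        out.append(out_row)
    return out

def select_points(corners, mhi):
    """Keep only the (x, y) corner candidates that show recent motion."""
    return [(x, y) for (x, y) in corners if mhi[y][x] > 0]

# Toy 1x3 grayscale frames: pixel 0 changes first, then pixel 1.
f0 = [[0, 0, 0]]
f1 = [[100, 0, 0]]
f2 = [[100, 100, 0]]

mhi = [[0, 0, 0]]
mhi = update_mhi(mhi, f0, f1)   # [[5, 0, 0]]
mhi = update_mhi(mhi, f1, f2)   # [[4, 5, 0]]
print(mhi)
print(select_points([(0, 0), (2, 0)], mhi))  # only (0, 0) has recent motion
```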
  • Yuan, Junsong, Zicheng Liu, and Ying Wu. "Discriminative subvolume search for efficient action detection." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

    1. A discriminative matching technique based on mutual information and the nearest-neighbor algorithm
    2. A tighter upper bound for branch-and-bound, used to locate the matched action that maximizes mutual information
    3. The key idea is to decompose the search space into a spatial part and a temporal part.
  • Lampert, Christoph H., Matthew B. Blaschko, and Thomas Hofmann. "Beyond sliding windows: Object localization by efficient subwindow search." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.

    1. Code online: https://sites.google.com/site/christophlampert/software (Efficient Subwindow Search)
    2. Reduces the complexity of sliding-window search from O(n^4) to O(n^2) on average
    3. Branch-and-bound technique
    4. Relies on a bounding function that gives an upper bound of the scoring function over a set of candidate boxes
    5. Works well with linear classifiers and BOW features.
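The branch-and-bound search described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the released ESS code: it assumes a per-pixel score map (for example, the linear-classifier weight of the visual word at each pixel), so a box score is a sum over its pixels. The bound for a set of boxes is the sum of positive scores in the largest box of the set plus the sum of negative scores in the smallest one. All function names are made up for this example.

```python
import heapq

def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = img[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def rect_sum(ii, t, b, l, r):
    """Sum over rows t..b and columns l..r (inclusive); 0 for an empty box."""
    if t > b or l > r:
        return 0.0
    return ii[b + 1][r + 1] - ii[t][r + 1] - ii[b + 1][l] + ii[t][l]

def upper_bound(pos_ii, neg_ii, s):
    """Bound over all boxes in s = (t_lo, t_hi, b_lo, b_hi, l_lo, l_hi, r_lo, r_hi)."""
    t_lo, t_hi, b_lo, b_hi, l_lo, l_hi, r_lo, r_hi = s
    largest = rect_sum(pos_ii, t_lo, b_hi, l_lo, r_hi)   # all positives that could be inside
    smallest = rect_sum(neg_ii, t_hi, b_lo, l_hi, r_lo)  # negatives that must be inside
    return largest + smallest

def ess(score_map):
    """Return (best_score, (top, bottom, left, right)) by best-first search."""
    h, w = len(score_map), len(score_map[0])
    pos_ii = integral_image([[max(v, 0.0) for v in row] for row in score_map])
    neg_ii = integral_image([[min(v, 0.0) for v in row] for row in score_map])
    start = (0, h - 1, 0, h - 1, 0, w - 1, 0, w - 1)
    heap = [(-upper_bound(pos_ii, neg_ii, start), start)]
    while heap:
        neg_bound, s = heapq.heappop(heap)
        widths = [s[i + 1] - s[i] for i in (0, 2, 4, 6)]
        if max(widths) == 0:
            # A single box: its bound equals its exact score, so it is optimal.
            return -neg_bound, (s[0], s[2], s[4], s[6])
        k = 2 * widths.index(max(widths))  # split the widest coordinate interval
        mid = (s[k] + s[k + 1]) // 2
        for lo, hi in ((s[k], mid), (mid + 1, s[k + 1])):
            child = s[:k] + (lo, hi) + s[k + 2:]
            if child[0] > child[3] or child[4] > child[7]:
                continue  # no valid box (top <= bottom, left <= right) in this set
            heapq.heappush(heap, (-upper_bound(pos_ii, neg_ii, child), child))
    return None

score_map = [
    [-1, -1, -1, -1],
    [-1,  2,  3, -1],
    [-1,  1,  2, -1],
    [-1, -1, -1, -1],
]
print(ess(score_map))  # best box covers rows 1..2, columns 1..2
```

Because sets of boxes are explored in order of their bound, the first single box popped from the queue is a global optimum, which is where the savings over exhaustive sliding-window search come from.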
  • Li, Li-Jia, et al. "Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification." NIPS. Vol. 2. No. 3. 2010.

    1. Images are represented as a scale-invariant map of object detector response
    2. Detectors are applied to novel images in multiple scales. At each scale, a 3 level spatial pyramid is applied. Responses are concatenated to form the descriptors for the image.
    3. 200 objects are selected from a pool of 1000 objects
    4. Evaluated on a scene classification task
    5. L1 and L1/L2 regularized logistic regression (LR) is applied to discover sparsity. For the L1/L2 group sparsity, a group is defined per object, which gives object-level sparsity. Bear in mind that each object has multiple entries in the descriptor.
      (marginal improvements)
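The pooling step in note 2 can be illustrated with a small sketch. It assumes one detector response map at one scale and a 3-level spatial pyramid with 1x1, 2x2, and 4x4 grids (21 cells). Taking the maximum response per cell is an assumption of this example (the notes only say that responses are concatenated), and `pyramid_max_pool` is a hypothetical name.

```python
def pyramid_max_pool(response, levels=3):
    """Max response in each cell of a 1x1, 2x2, 4x4, ... spatial pyramid.

    response is a 2D list (rows of floats) from one detector at one scale;
    its height and width must be at least 2 ** (levels - 1).
    """
    h, w = len(response), len(response[0])
    feats = []
    for level in range(levels):
        n = 2 ** level  # n x n grid at this pyramid level
        for gy in range(n):
            for gx in range(n):
                y0, y1 = gy * h // n, (gy + 1) * h // n
                x0, x1 = gx * w // n, (gx + 1) * w // n
                feats.append(max(response[y][x]
                                 for y in range(y0, y1)
                                 for x in range(x0, x1)))
    return feats

# Toy 4x4 response map with values 0..15 in row-major order.
response = [[float(4 * r + c) for c in range(4)] for r in range(4)]
feats = pyramid_max_pool(response)
print(len(feats))   # 1 + 4 + 16 cells
print(feats[:5])    # whole-image max, then the four 2x2 cells
```

The full image descriptor would concatenate such vectors over all objects and scales, which is why each object owns several descriptor entries.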
  • Wang, Heng, et al. "Dense trajectories and motion boundary descriptors for action recognition." International journal of computer vision 103.1 (2013): 60-79.

    1. Tracks densely sampled points to obtain trajectories, in contrast with local representations. Not truly dense sampling: grid points are filtered by the minimum-eigenvalue criterion (Shi and Tomasi)
    2. Motion boundary (derivative of the optical flow field), to overcome camera motion
    3. Code online: http://lear.inrialpes.fr/people/wang/dense_trajectories
    4. The optical flow field is smoothed with a median filter. The implementation is based on OpenCV
    5. Trajectory length is limited to overcome drift. Static points and erroneous trajectories are filtered out.
    6. Trajectory shape, HOG, HOF and MBH descriptors along the trajectory
    7. KTH (94.2%), Youtube (84.1%), Hollywood2 (58.2%), UCF Sports (88.0%), IXMAS (93.5%), UIUC (98.4%), Olympic Sports (74.1%), UCF50 (84.5%), HMDB51 (46.6%)
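Note 2 says the motion boundary is a derivative of the optical flow field. The sketch below shows why that helps with camera motion: a constant flow offset, such as a pure camera pan, cancels when each flow component is differentiated. It uses central differences with clamped borders on plain Python lists; the function names and the toy flow field are assumptions for illustration, and the histogram binning of a real MBH descriptor is omitted. Real camera motion is only approximately constant locally, so the cancellation is exact only in this idealized case.

```python
def spatial_gradient(field):
    """x and y derivatives of a 2D list by central differences (clamped borders)."""
    h, w = len(field), len(field[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx[y][x] = (field[y][min(x + 1, w - 1)] - field[y][max(x - 1, 0)]) / 2.0
            gy[y][x] = (field[min(y + 1, h - 1)][x] - field[max(y - 1, 0)][x]) / 2.0
    return gx, gy

def motion_boundaries(flow_u, flow_v):
    """Gradients of the horizontal and vertical flow components."""
    return spatial_gradient(flow_u), spatial_gradient(flow_v)

# Toy flow: a 2x2 object moving right by 2 pixels on a static background.
flow_u = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 2.0, 2.0, 0.0],
          [0.0, 2.0, 2.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
flow_v = [[0.0] * 4 for _ in range(4)]

# The same scene seen by a panning camera: every flow vector gets (+3, +1).
pan_u = [[u + 3.0 for u in row] for row in flow_u]
pan_v = [[v + 1.0 for v in row] for row in flow_v]

print(motion_boundaries(flow_u, flow_v) == motion_boundaries(pan_u, pan_v))
```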
  • Liang, Xiaodan, Liang Lin, and Liangliang Cao. "Learning latent spatio-temporal compositional model for human action recognition." Proceedings of the 21st ACM international conference on Multimedia. ACM, 2013.

    1. Laptev STIP with HOF and HOG, with BOW quantization
    2. Leaf node for detecting action parts
    3. Or node to account for intra-class variability
    4. And node to aggregate action in a frame
    5. Root node to identify temporal composition
    6. Contextual interaction (connecting leaf nodes)
    7. Everything is formulated in a latent SVM framework and solved by CCCP
    8. Since the leaf node can move around from one Or-node to another, a reconfiguration step is used to rearrange the feature vector
    9. UCF Youtube and Olympic Sports dataset
  • Sadanand, Sreemanananth, and Jason J. Corso. "Action bank: A high-level representation of activity in video." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

    1. 98.2% KTH, 95.0% UCF Sports, 57.9% UCF50, 26.9% HMDB51
    2. 205 video clips are used as templates to detect actions in novel videos.
    3. Detectors are sampled from multiple viewpoints and run at multiple scales
    4. Detector outputs are max-pooled over the spatio-temporal (ST) volume by various pooling units
    5. "Action Spotting" is used as the template detector
    6. Code online: http://www.cse.buffalo.edu/~jcorso/r/actionbank/
  • Liu, Jingen, Benjamin Kuipers, and Silvio Savarese. "Recognizing human actions by attributes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.

    1. 22 manually selected action attributes as semantic representation
    2. Data Driven attributes as complementary information
    3. Attributes are treated as latent variables, just like the parts in the DPM model
    4. Accounts for class matching, attribute matching, and attribute co-occurrence.
    5. STIP by 1D-Gabor detector. Gradient based + BOW over ST volume
    6. UIUC dataset, KTH, Olympic Sports Dataset
  • Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei. "Unsupervised learning of human action categories using spatial-temporal words." International Journal of Computer Vision 79.3 (2008): 299-318.

    1. Unsupervised video categorization, using pLSA and LDA
    2. Action localization
    3. Laptev's STIP is too sparse compared with Dollar's
    4. Simple gradient-based descriptors, with PCA applied to reduce dimensionality; invariance is left to the codebook
    5. K-means with Euclidean distance metric
    6. pLSA or LDA on top of BOW (# topic is equal to the categories to be recognized)
    7. Each STIP is associated with a BOW, hence topic distribution, so it's trivial to perform Localization
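The bag-of-words step in notes 5 and 6 amounts to assigning every descriptor to its nearest codeword under Euclidean distance and counting. A minimal sketch, assuming the codebook has already been learned (for instance by k-means) and using made-up 2-D descriptors; the function names are hypothetical and the pLSA/LDA stage on top is omitted:

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))

def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one video."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Toy codebook with two spatio-temporal "words" and five descriptors.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0), (9.0, 9.0), (11.0, 10.0)]
print(bow_histogram(descriptors, codebook))
```

Keeping the per-descriptor assignment from `nearest_codeword` is what makes localization straightforward: each interest point carries its own word, and hence its own topic, at a known position.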
  • Laptev, Ivan, et al. "Learning realistic human actions from movies." Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.

    1. Annotating videos by aligning transcripts
    2. A movie dataset
    3. Space-Time interest points + HOG + HOF around a ST volume
    4. ST BOW. A video sequence can be segmented in multiple ways; each way is called a channel
    5. Multi-channel \chi^2 kernel classification. Channels are selected with a greedy shrinking procedure
    6. KTH (91.8%) and Movie (18.2% ~ 53.3%) dataset
    7. STIP + HOG and HOF code: http://www.di.ens.fr/~laptev/download.html
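A multi-channel \chi^2 kernel of the kind mentioned in note 5 can be sketched as below. It assumes the common form K(x, y) = exp(-sum_c D_c(x, y) / A_c), where D_c is the \chi^2 distance between the channel-c histograms and A_c is a per-channel normalizer (typically the mean \chi^2 distance over training pairs). The channel names and numbers are invented for this example.

```python
import math

def chi2_distance(h1, h2):
    """Chi-square distance: 0.5 * sum((a - b)^2 / (a + b)), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def multichannel_chi2_kernel(x, y, mean_dist):
    """Kernel value for two videos given per-channel histograms.

    x, y: dict mapping channel name -> BOW histogram.
    mean_dist: dict mapping channel name -> normalizer A_c (assumed to be
    estimated beforehand from the training set).
    """
    total = sum(chi2_distance(x[c], y[c]) / mean_dist[c] for c in mean_dist)
    return math.exp(-total)

# Two hypothetical channels, e.g. a global grid and a coarse spatial split.
video_a = {"grid_1x1": [0.5, 0.5], "grid_h3x1": [1.0, 0.0]}
video_b = {"grid_1x1": [0.5, 0.5], "grid_h3x1": [0.0, 1.0]}
mean_dist = {"grid_1x1": 1.0, "grid_h3x1": 1.0}

print(multichannel_chi2_kernel(video_a, video_a, mean_dist))  # identical videos
print(multichannel_chi2_kernel(video_a, video_b, mean_dist))
```

Greedy channel selection would then amount to evaluating this kernel on different subsets of the keys in `mean_dist` and keeping the subset that validates best.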

[3] Action Recognition Datasets

Links to Datasets:

Recent Action Recognition Papers:

[4] CVPR 2014 Tutorial on Emerging Topics in Human Activity Recognition

[5] http://yangxd.org/projects/surveillance/SED13

[6] Recognition of human actions

Sample sequences for each action (DivX-compressed)

person15_walking_d1_uncomp.avi
person15_jogging_d1_uncomp.avi
person15_running_d1_uncomp.avi
person15_boxing_d1_uncomp.avi
person15_handwaving_d1_uncomp.avi
person15_handclapping_d1_uncomp.avi
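The sample file names above follow a regular pattern, so labels can be recovered from the name alone. The sketch below assumes the pattern `person<ID>_<action>_d<N>_uncomp.avi` inferred from these six names, and it assumes that the `d<N>` field denotes the recording scenario. `parse_kth_name` is a hypothetical helper, not part of the dataset distribution.

```python
import re

# Pattern inferred from the sample names listed above (an assumption).
_KTH_NAME = re.compile(r"^person(\d+)_([a-z]+)_d(\d+)_uncomp\.avi$")

def parse_kth_name(name):
    """Split a KTH-style file name into person id, action label, and scenario."""
    m = _KTH_NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return {
        "person": int(m.group(1)),
        "action": m.group(2),
        "scenario": int(m.group(3)),
    }

print(parse_kth_name("person15_walking_d1_uncomp.avi"))
```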

Action database in zip-archives (DivX-compressed)

Note: The database is publicly available for non-commercial use. Please refer to [Schuldt, Laptev and Caputo, Proc. ICPR'04, Cambridge, UK] if you use this database in your publications.

walking.zip(242Mb)
jogging.zip(168Mb)
running.zip(149Mb)
boxing.zip(194Mb)
handwaving.zip(218Mb)
handclapping.zip(176Mb)

Related publications

"Recognizing Human Actions: A Local SVM Approach",
Christian Schuldt, Ivan Laptev and Barbara Caputo; in Proc. ICPR'04, Cambridge, UK.

"Local Spatio-Temporal Image Features for Motion Interpretation",
Ivan Laptev; PhD Thesis, 2004, Computational Vision and Active Perception Laboratory (CVAP), NADA, KTH, Stockholm.

"Local Descriptors for Spatio-Temporal Recognition",
Ivan Laptev and Tony Lindeberg; ECCV Workshop "Spatial Coherence for Visual Motion Analysis".

"Velocity adaptation of space-time interest points",
Ivan Laptev and Tony Lindeberg; in Proc. ICPR'04, Cambridge, UK.

"Space-Time Interest Points",
I. Laptev and T. Lindeberg; in Proc. ICCV'03, Nice, France, pp.I:432-439.
