First, a depth spatial-temporal descriptor is developed to extract local regions of interest in the depth image. Then the intensity spatial-temporal descriptor and the depth spatial-temporal descriptor are combined and fed into a linear coding framework to obtain an effective feature vector, which can be used for action classification. Finally, extensive experiments are conducted on a publicly available RGB-D action recognition dataset, and the proposed method shows promising results.

This is the main contribution: A linear coding framework is developed to fuse the intensity spatial-temporal descriptor and the depth spatial-temporal descriptor to form a robust feature vector. In addition, we further exploit the temporal intrinsics of the video sequence and design a new pooling technique to improve the description performance.

Feature extraction

STIPs (space-time interest points) are an extension of SIFT (Scale-Invariant Feature Transform) to 3-dimensional space and use one of Harris3D, Cuboid, or Hessian as the detector.

http://www.di.ens.fr/~laptev/download.html

The patches are split with overlap.

This amounts to a preprocessing step on the depth map.

So the STIP features in the RGB images capture more detailed characteristics of the subjects themselves, while in the depth images they capture more of the subjects' shape.

Coding approaches

vector quantization (VQ)

One disadvantage of VQ is that it introduces significant quantization error, since only one element of the codebook is selected to represent each descriptor. To remedy this, one usually has to use a nonlinear SVM as the classifier to compensate for the quantization error. However, with nonlinear kernels the SVM incurs a high training cost in both computation and storage. Considering these drawbacks, locality-constrained linear coding (LLC) [9], a more accurate and efficient coding approach, is adopted to replace VQ in this paper.
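To make the difference concrete, here is a minimal NumPy sketch that codes one descriptor with VQ and with the commonly used k-nearest-neighbour approximation of LLC. The codebook, the descriptor, and the parameters `k` and `reg` are made-up values for illustration, and the function names are my own; this is not the paper's implementation.

```python
import numpy as np

def vq_encode(x, codebook):
    """Hard assignment: a one-hot code for the nearest codeword."""
    dists = np.linalg.norm(codebook - x, axis=1)
    code = np.zeros(len(codebook))
    code[np.argmin(dists)] = 1.0
    return code

def llc_encode(x, codebook, k=5, reg=1e-4):
    """Approximated LLC: reconstruct x from its k nearest codewords,
    with the coefficients constrained to sum to one."""
    dists = np.linalg.norm(codebook - x, axis=1)
    nearest = np.argsort(dists)[:k]
    z = codebook[nearest] - x                  # centre the local bases on x
    cov = z @ z.T                              # k x k local covariance
    cov += reg * max(np.trace(cov), 1e-12) * np.eye(k)  # regularise for stability
    w = np.linalg.solve(cov, np.ones(k))
    w /= w.sum()                               # enforce the sum-to-one constraint
    code = np.zeros(len(codebook))
    code[nearest] = w
    return code

# Hypothetical data: 64 codewords and one 16-D descriptor.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))
x = rng.normal(size=16)

vq_code = vq_encode(x, codebook)
llc_code = llc_encode(x, codebook)

# Reconstruction error of each code.
vq_err = np.linalg.norm(x - vq_code @ codebook)
llc_err = np.linalg.norm(x - llc_code @ codebook)
print(f"VQ error: {vq_err:.3f}, LLC error: {llc_err:.3f}")
```

Because LLC spreads the reconstruction over several nearby codewords instead of one, its reconstruction error is typically lower than that of VQ, which is why a linear classifier can be enough afterwards.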

Pooling strategy

Similar to the VQ coding approach, the LLC coding coefficients c_i need to be combined (pooled) into a single global representation of the sample for classification.
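As a rough illustration of pooling, the sketch below applies plain max pooling over all codes of a video, plus a hypothetical temporal variant that pools within equal-length segments and concatenates the results. The temporal variant is my own assumption about how temporal structure could be used, not the pooling scheme designed in the paper; all data and names are made up.

```python
import numpy as np

def max_pool(codes):
    """Max pooling: keep the strongest response of every codeword."""
    return codes.max(axis=0)

def temporal_max_pool(codes, frame_idx, n_frames, n_segments=3):
    """Hypothetical temporal variant: max-pool inside equal-length
    temporal segments and concatenate the per-segment vectors."""
    edges = np.linspace(0, n_frames, n_segments + 1)
    pooled = []
    for s in range(n_segments):
        mask = (frame_idx >= edges[s]) & (frame_idx < edges[s + 1])
        if mask.any():
            pooled.append(codes[mask].max(axis=0))
        else:
            pooled.append(np.zeros(codes.shape[1]))  # no descriptors in this segment
    return np.concatenate(pooled)

# Hypothetical data: 40 coded descriptors over a 64-word codebook,
# each tagged with the frame index where it was detected.
rng = np.random.default_rng(0)
codes = rng.random((40, 64))
frame_idx = rng.integers(0, 90, size=40)

video_vec = max_pool(codes)
video_vec_t = temporal_max_pool(codes, frame_idx, n_frames=90)
print(video_vec.shape, video_vec_t.shape)
```

The segment-wise vector is three times longer but keeps a coarse record of when each codeword fired, which plain max pooling discards.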

Dataset

RGBD-HuDaAct [1] video database

Each video sample consists of synchronized and calibrated RGB-D frame sequences, in which every frame contains an RGB image and a depth image. The RGB and depth images in each frame have been calibrated with a standard stereo-calibration method available in OpenCV, so that points with the same coordinates in the RGB and depth images correspond to each other.
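One practical consequence of this alignment is that a location found in the RGB image can be used directly to index the depth image. The sketch below shows this with synthetic arrays; the image size, depth values, and keypoint coordinates are all made up for illustration.

```python
import numpy as np

# Synthetic, already-aligned frame pair (sizes and values are made up).
height, width = 480, 640
rgb = np.zeros((height, width, 3), dtype=np.uint8)
depth = np.full((height, width), 2000, dtype=np.uint16)  # background depth
depth[100:200, 300:400] = 1200                           # a closer "subject"

# Hypothetical interest points detected in the RGB image, as (x, y).
points = np.array([[350, 150], [10, 20]])
xs, ys = points[:, 0], points[:, 1]

# Because the images are calibrated, the same (row, col) indexes both.
rgb_at_points = rgb[ys, xs]      # shape (2, 3)
depth_at_points = depth[ys, xs]  # shape (2,)
print(depth_at_points)
```

Note that arrays are indexed as `[row, column]`, i.e. `[y, x]`, which is easy to get backwards when keypoints are stored as (x, y).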

A concise paper that pointed me in the right direction.
