RGB-D action recognition using linear coding

First, a depth spatial-temporal descriptor is developed to extract local regions of interest in the depth images. Then the intensity spatial-temporal descriptor and the depth spatial-temporal descriptor are combined and fed into a linear coding framework to obtain an effective feature vector, which can be used for action classification. Finally, extensive experiments are conducted on a publicly available RGB-D action recognition dataset, and the proposed method shows promising results.

The main contribution is this: a linear coding framework is developed to fuse the intensity spatial-temporal descriptor and the depth spatial-temporal descriptor into a robust feature vector. In addition, the authors further exploit the temporal structure of the video sequence and design a new pooling technique to improve the descriptive performance.

Feature extraction

STIPs (spatio-temporal interest points) extend SIFT (Scale-Invariant Feature Transform) into three dimensions (space plus time) and use one of Harris3D, Cuboid, or Hessian as the detector.

http://www.di.ens.fr/~laptev/download.html

The patches overlap when the region is split.

This effectively serves as a preprocessing step for the depth map.

So the STIPs features in the RGB images reveal more of the detailed characteristics of the subjects themselves, while in the depth images they capture more of the shape of the subjects.

Coding approaches

vector quantization (VQ)

One disadvantage of VQ is that it introduces significant quantization error, since only one element of the codebook is selected to represent each descriptor. To remedy this, one usually has to use a nonlinear SVM as the classifier to compensate for the quantization error. However, with nonlinear kernels the SVM incurs a high training cost in both computation and storage. Given these drawbacks, locality-constrained linear coding (LLC) [9], a more accurate and efficient coding approach, is adopted in place of VQ in this paper.
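To make the difference concrete, here is a minimal sketch that encodes one descriptor with VQ and with the commonly used approximated form of LLC (k nearest codewords, sum-to-one constraint). The codebook is random, and the function names, `k`, and `beta` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def vq_encode(x, codebook):
    """Hard-assign x to its single nearest codeword (one-hot code)."""
    dists = np.sum((codebook - x) ** 2, axis=1)
    code = np.zeros(codebook.shape[0])
    code[np.argmin(dists)] = 1.0
    return code


def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximated LLC: reconstruct x from its k nearest codewords
    with weights constrained to sum to one (k and beta are assumed values)."""
    dists = np.sum((codebook - x) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    z = codebook[nearest] - x                  # shift local bases so x is the origin
    cov = z @ z.T                              # k x k local covariance
    cov += beta * np.trace(cov) * np.eye(k)    # small ridge term for stability
    w = np.linalg.solve(cov, np.ones(k))
    w /= w.sum()                               # enforce the sum-to-one constraint
    code = np.zeros(codebook.shape[0])
    code[nearest] = w
    return code


# Toy data: a random codebook of 64 codewords in 16 dimensions (hypothetical sizes).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))
x = rng.normal(size=16)

vq_code = vq_encode(x, codebook)
llc_code = llc_encode(x, codebook)

# Reconstruction error of each code: ||x - code @ codebook||
vq_err = np.linalg.norm(x - vq_code @ codebook)
llc_err = np.linalg.norm(x - llc_code @ codebook)
print("VQ error:", vq_err, "LLC error:", llc_err)
```

Because the one-hot VQ code is itself one of the candidates LLC can choose among, the LLC reconstruction error is typically lower, which is why a linear classifier can suffice afterwards.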

Pooling strategy

As with the VQ coding approach, the LLC coding coefficients c_i need to be combined into a single global representation of the sample for classification.
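As a rough illustration of pooling, the sketch below applies plain max pooling over all codes of a video, plus a temporal variant that pools each time segment separately and concatenates the results. These notes do not record the paper's actual pooling design, so the segment-based scheme, the function names, and `n_segments` are assumptions for illustration only.

```python
import numpy as np


def max_pool(codes):
    """Max pooling: keep the largest response per codeword over all descriptors."""
    return codes.max(axis=0)


def temporal_max_pool(codes, frame_idx, n_frames, n_segments=3):
    """Max-pool codes within equal-length temporal segments, then concatenate.
    An assumed scheme, not necessarily the one used in the paper."""
    n_codewords = codes.shape[1]
    edges = np.linspace(0, n_frames, n_segments + 1)
    pooled = np.zeros((n_segments, n_codewords))
    for s in range(n_segments):
        mask = (frame_idx >= edges[s]) & (frame_idx < edges[s + 1])
        if mask.any():                      # segments with no descriptors stay zero
            pooled[s] = codes[mask].max(axis=0)
    return pooled.ravel()


# Toy data: 10 non-negative codes over 4 codewords, from a 30-frame video.
rng = np.random.default_rng(0)
codes = rng.random((10, 4))
frame_idx = np.array([0, 2, 5, 9, 12, 14, 19, 21, 25, 29])

global_feat = max_pool(codes)                                   # shape (4,)
temporal_feat = temporal_max_pool(codes, frame_idx, n_frames=30)  # shape (12,)
```

The temporal variant keeps coarse ordering information (beginning, middle, end of the action) that a single global max discards, at the cost of a feature vector `n_segments` times longer.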

Dataset

RGBD-HuDaAct [1] video database

Each video sample consists of synchronized and calibrated RGB-D frame sequences, with each frame containing an RGB image and a depth image. The RGB and depth images in each frame have been calibrated with a standard stereo-calibration method available in OpenCV, so that points with the same coordinates in the RGB and depth images correspond to each other.

A concise paper that pointed me in the right direction.
