List of RGBD datasets
This is an incomplete list of datasets which were captured using a Kinect or similar devices. I initially began it to keep track of semantically labelled datasets, but I have now also included some camera tracking and object pose estimation datasets. I ultimately aim to keep track of all Kinect-style datasets available for researchers to use.
Where possible, links have been added to project or personal pages. Where I have not been able to find these, I have used a direct link to the data.
Please send suggestions for additions and corrections to me at m.firman <at> cs.ucl.ac.uk.
This page is automatically generated from a YAML file, and was last updated on 26 November, 2014.
Turntable data
These datasets capture objects under fairly controlled conditions. Bigbird is the most advanced in terms of quality of image data and camera poses, while the RGB-D object dataset is the most extensive.

RGBD Object dataset
Introduced: ICRA 2011
Device: Kinect v1
Description: 300 instances of household objects in 51 categories; 250,000 frames in total.
Labelling: Category and instance labelling. Includes auto-generated masks, but no exact 6DOF pose information.
Download: Project page
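
Most of the datasets on this page distribute depth as per-pixel range images, which can be back-projected through the pinhole camera model to obtain a point cloud. A minimal sketch follows; the focal length and principal point used here are typical Kinect v1 values, not calibrated parameters from this (or any particular) dataset, so check each dataset's own calibration files before relying on them.

```python
import numpy as np

def depth_to_point_cloud(depth_m, fx=570.3, fy=570.3, cx=320.0, cy=240.0):
    """Back-project a depth image (HxW, metres, 0 = invalid) to an Nx3 cloud.

    The default intrinsics are common Kinect v1 approximations (assumption);
    substitute the calibrated values shipped with the dataset if available.
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel coordinates
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading
```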

Bigbird dataset
Introduced: ICRA 2014
Device: Kinect v1 and DSLR
Description: 100 household objects
Labelling: Instance labelling. Masks, ground truth poses, registered mesh.
Download: Project page
Segmentation and pose estimation under controlled conditions
These datasets include objects arranged in controlled conditions. Clutter may be present. CAD or meshed models of the objects may or may not be provided. Most provide 6DOF ground truth pose for each object.

Object segmentation dataset
Introduced: IROS 2012
Device: Kinect v1
Description: 111 RGBD images of stacked and occluding objects on table.
Labelling: Per-pixel segmentation into objects.
Download: Project page

Willow Garage Dataset
Introduced: 2011
Device: Kinect v1
Description: Around 160 frames of household objects on a board in a controlled environment.
Labelling: 6DOF pose for each object, taken from board calibration. Per-pixel labelling.
Download: Project page

'3D Model-based Object Recognition and Segmentation in Cluttered Scenes'
Introduced: IJCV 2009
Device: Minolta Vivid 910 (only depth, no RGB!)
Description: 50 frames depicting five objects in various occluding poses. No background clutter in any image.
Labelling: Pose and per-point labelling information. 3D mesh models of each of the 5 objects.
Download: Project page

'A Global Hypothesis Verification Method for 3D Object Recognition'
Introduced: ECCV 2012
Device: Kinect v1
Description: 50 Kinect frames, with a library of 35 objects.
Labelling: 6DOF ground truth pose of each object (unsure how this was gathered). No per-pixel labelling.
Download: Direct link

'Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes'
Introduced: ACCV 2012
Device: Kinect v1
Description: 18,000 Kinect images, library of 15 objects.
Labelling: 6DOF pose for each object in each image. No per-pixel labelling.
Download: Project page
Kinect data from the real world

RGBD Scenes dataset
Introduced: ICRA 2011
Device: Kinect v1
Description: Real indoor scenes, featuring objects from the RGBD object dataset 'arranged' on tables, countertops, etc. Video sequences of 8 scenes.
Labelling: Per-frame bounding boxes for objects from RGBD object dataset. Other objects not labelled.
Download: Project page

RGBD Scenes dataset v2
Introduced: ICRA 2014
Device: Kinect v1
Description: A second set of real indoor scenes featuring objects from the RGBD object dataset. Video sequences of 14 scenes, together with stitched point clouds and camera pose estimations.
Labelling: Labelling of points in stitched cloud into one of 9 classes (objects and furniture), plus background.
Download: Project page

'Object Disappearance for Object Discovery'
Introduced: IROS 2012
Device: Kinect v1
Description: Three datasets: Small, with still images; Medium, with video data from an office environment; and Large, with video over several rooms. The Large dataset has 7 unique objects seen in 397 frames. Data is in ROS bag format.
Labelling: Ground truth object segmentations.
Download: Project page
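
Since this dataset ships as ROS bags, it has to be read through ROS tooling rather than as loose image files. A minimal sketch, assuming a ROS 1 installation that provides the rosbag Python package; the filename and topic below are illustrative placeholders, and the real topics can be listed with `rosbag info <file>.bag`.

```python
import rosbag  # provided by a ROS 1 installation

# The bag filename and topic name are placeholders, not names taken from
# the dataset's documentation.
with rosbag.Bag('large_dataset.bag') as bag:
    for topic, msg, t in bag.read_messages(topics=['/camera/rgb/image_color']):
        # msg is a sensor_msgs/Image; width/height/encoding describe the frame
        print(topic, t.to_sec(), msg.width, msg.height, msg.encoding)
```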

'Object Discovery in 3D scenes via Shape Analysis'
Introduced: ICRA 2014
Device: Kinect v1
Description: KinFu meshes of 58 very cluttered indoor scenes.
Labelling: Ground truth binary labelling (object/not object) performed on segments proposed by the algorithm, with no labelling on the mesh.
Download: Project page

Cornell-RGBD-Dataset
Introduced: NIPS 2011
Device: Kinect v1
Description: Multiple RGBD frames from 52 indoor scenes. Stitched point clouds (using RGBDSLAM).
Labelling: Per-point object-level labelling on the stitched clouds.
Download: Project page

NYU Dataset v1
Introduced: ICCV 2011 Workshop on 3D Representation and Recognition
Device: Kinect v1
Description: Around 51,000 RGBD frames from indoor scenes such as bedrooms and living rooms. Note that the updated NYU v2 dataset is typically used instead of this earlier version.
Labelling: Dense multi-class labelling for 2283 frames.
Download: Project page

NYU Dataset v2
Introduced: ECCV 2012
Device: Kinect v1
Description: ~408,000 RGBD images from 464 indoor scenes, with somewhat greater scene diversity than NYU v1. Per-frame accelerometer data.
Labelling: Dense labelling of objects at a class and instance level for 1449 frames. Instance labelling is not carried across scenes. This 1449 subset is the dataset typically used in experiments.
Download: Project page
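
The 1449-frame labelled subset is distributed as a MATLAB v7.3 file, which is HDF5 underneath and so can be read directly with h5py. A minimal sketch; the variable names below ('depths', 'labels') match the dataset's documented contents, but it is worth confirming them with `list(f.keys())` on the file you download.

```python
import h5py

with h5py.File('nyu_depth_v2_labeled.mat', 'r') as f:
    depths = f['depths'][0]  # depth map of the first frame, in metres
    labels = f['labels'][0]  # per-pixel class indices for the same frame
    print(depths.shape, int(labels.max()))
```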

'Object Detection and Classification from Large-Scale Cluttered Indoor Scans'
Introduced: Eurographics 2014
Device: Faro Lidar scanner
Description: Faro lidar scans of ~40 academic offices, with 2-3 scans per office. Each scan is 0.25GB-2GB. Scans include depth and RGB.
Labelling: No labelling present. The labelling shown in the exemplar image is their algorithm output.
Download: Project page

SUN3D
Introduced: ICCV 2013
Device: Kinect v1
Description: Videos of indoor scenes, registered into point clouds.
Labelling: Polygon annotations giving semantic class and instance labels on frames, propagated through each video.
Download: Project page

B3DO: Berkeley 3-D Object Dataset
Introduced: ICCV Workshop on Consumer Depth Cameras in Computer Vision 2011
Device: Kinect v1
Description: The aim is to crowdsource the collection of Kinect data, to be included in future releases. Version 1 has 849 images, from 75 scenes.
Labelling: Bounding box labelling at a class level.
Download: Project page
SLAM, registration and camera pose estimation

TUM Benchmark Dataset
Introduced: IROS 2012
Device: Kinect v1
Description: Many different scenes and scenarios for tracking and mapping, including reconstruction, robot kidnap, etc.
Labelling: 6DOF ground truth from motion capture system with 10 cameras.
Download: Project page
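
The TUM benchmark's on-disk formats are documented on its project page: trajectories are plain-text files with one 'timestamp tx ty tz qx qy qz qw' line per pose, and depth is stored as 16-bit PNGs scaled by a factor of 5000. A minimal reading sketch under those documented conventions (verify the scale factor against the version you download):

```python
import numpy as np
from PIL import Image

def load_trajectory(path):
    """Parse a TUM-format trajectory file into an Nx8 array:
    timestamp, tx, ty, tz, qx, qy, qz, qw per row."""
    rows = [list(map(float, line.split()))
            for line in open(path)
            if line.strip() and not line.startswith('#')]
    return np.array(rows)

def load_depth(path):
    """Convert a 16-bit depth PNG to metres using the documented 5000 factor."""
    return np.asarray(Image.open(path), dtype=np.float32) / 5000.0
```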

Microsoft 7-scenes dataset
Introduced: CVPR 2013
Device: Kinect v1
Description: Kinect video from 7 indoor scenes.
Labelling: 6DOF 'ground truth' from Kinect Fusion.
Download: Project page

IROS 2011 Paper Kinect Dataset
Introduced: IROS 2011
Device: Kinect v1
Description: Lab-based setup. The aim seems to be to track the motion of the camera.
Labelling: 6DOF ground truth from Vicon system
Download: Project page

'When Can We Use KinectFusion for Ground Truth Acquisition?'
Introduced: Workshop on Color-Depth Camera Fusion in Robotics, IROS 2012
Device: Kinect v1
Description: A set of 57 scenes, captured from natural environments and from artificial shapes. Each scene has a 3D mesh, volumetric data and registered depth maps.
Labelling: Frame-to-frame transformations as computed from KinectFusion. The 'office' and 'statue' scenes have LiDAR ground truth.
Download: Project page

DAFT Dataset
Introduced: ICPR 2012
Device: Kinect v1
Description: A few short sequences of different planar scenes captured under various camera motions. Used to demonstrate repeatability of feature points under transformations.
Labelling: Camera motion type. 2D homographies relating the planar scene between different images.
Download: Project page

ICL-NUIM Dataset
Introduced: ICRA 2014
Device: Kinect v1 (synthesised)
Description: Eight synthetic RGBD video sequences: four from an office scene and four from a living room scene. Simulated camera trajectories are taken from Kintinuous output, captured as a sensor was moved around a real-world room.
Labelling: Camera trajectories for each video. Geometry of the living room scene as an .obj file.
Download: Project page
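
Datasets in this section are typically used to score an estimated camera trajectory against ground truth, most simply via absolute trajectory error (ATE). A minimal sketch, assuming the two trajectories have already been time-associated and rigidly aligned (steps which the full benchmark tools, e.g. TUM's evaluation scripts, handle for you):

```python
import numpy as np

def ate_rmse(gt_xyz, est_xyz):
    """RMSE of per-frame position error between two Nx3 trajectories."""
    err = np.linalg.norm(gt_xyz - est_xyz, axis=1)
    return np.sqrt(np.mean(err ** 2))
```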

'Automatic Registration of RGB-D Scans via Salient Directions'
Introduced: ICCV 2013
Device: RGBD Laser scanner
Description: Several laser scans taken of each of three scenes: a European church, a city and a castle.
Labelling: Results of the authors' registration algorithm.
Download: Project page

Stanford 3D Scene Dataset
Introduced: SIGGRAPH 2013
Device: Xtion Pro Live (Kinect v1 equivalent)
Description: RGBD videos of six indoor and outdoor scenes, together with a dense reconstruction of each scene.
Labelling: Estimated camera pose for each frame. No ground truth pose, so not ideal for quantitative evaluation.
Download: Project page
Tracking
See also some of the human datasets for body and face tracking.

Princeton Tracking Benchmark
Introduced: ICCV 2013
Device: Kinect v1
Description: 100 RGBD videos of moving objects such as humans, balls and cars.
Labelling: Per-frame bounding box covering target object only.
Download: Project page
Datasets involving humans: Body and hands

Cornell Activity Datasets: CAD-60 and CAD-120
Introduced: PAIR 2011/IJRR 2013
Device: Kinect v1
Description: Videos of humans performing activities.
Labelling: Each video is given at least one label, such as 'eating', 'opening' or 'working on computer'. Skeleton joint positions and orientations are labelled on each frame.
Download: Project page

RGB-D Person Re-identification Dataset
Introduced: First International Workshop on Re-Identification 2012
Device: Kinect v1
Description: Front and back views of 79 people walking forward in different poses.
Labelling: In addition to the per-person label, the dataset provides foreground masks, skeletons, 3D meshes and an estimate of the floor.
Download: Project page

Sheffield KInect Gesture (SKIG) Dataset
Introduced: IJCAI 2013
Device: Kinect v1
Description: Total of 1080 Kinect videos of six people performing one of 10 hand gesture sequences, such as 'triangle' or 'comehere'. Sequences captured under a variety of illumination and background conditions.
Labelling: The gesture being performed in each sequence.
Download: Project page

RGB-D People Dataset
Introduced: IROS 2011
Device: Kinect v1
Description: 3000+ frames of people walking and standing in a university hallway, captured from three Kinects.
Labelling: Per-frame bounding box annotations of individual people, together with a 'visibility' measure.
Download: Project page

50 Salads
Introduced: UbiComp 2013
Device: Kinect v1
Description: Over 4 hours of video of 25 people preparing 2 mixed salads each.
Labelling: Accelerometer data from sensors attached to cooking utensils, and labelling of steps in the recipes.
Download: Project page

Microsoft Research Cambridge-12 Kinect gesture data set
Introduced: CHI 2012
Device: Kinect v1
Description: 594 sequences and 719,359 frames of 30 people performing 12 gestures.
Labelling: Gesture performed in each video sequence, plus motion tracking of human joint locations.
Download: Project page

UR Fall Detection Dataset
Introduced: Computer Vision Theory and Applications 2014
Device: Kinect v1
Description: Videos of people falling over, consisting of 60 sequences recorded with two Kinects.
Labelling: Accelerometer data from a device attached to the subject.
Download: Project page

RGBD-HuDaAct
Introduced: ICCV Workshops 2011
Device: Kinect v1
Description: 30 different humans each performing the same 12 activities, e.g. 'eat a meal', plus a random 'background' activity. All performed in a lab environment. Around 5,000,000 frames in total.
Labelling: The activity being performed in each sequence.
Download: Project page

Human3.6M
Introduced: PAMI 2014
Device: SwissRanger time-of-flight (+ 2D cameras)
Description: 11 different humans performing 17 different activities. Data comes from four calibrated video cameras, one time-of-flight camera and (static) 3D laser scans of the actors.
Labelling: 2D and 3D human joint positions, obtained from a Vicon motion capture system.
Download: Project page
Datasets involving humans: Head and face

Biwi Kinect Head Pose Database
Introduced: IJCV 2013
Device: Kinect v1
Description: 15K images of 20 different people moving their heads in different directions.
Labelling: 3D position of the head and its rotation, acquired using 'faceshift' software.
Download: Project page

Eurecom Kinect Face Dataset
Introduced: ACCV Workshop on Computer Vision with Local Binary Pattern Variants 2012
Device: Kinect v1
Description: Images of faces captured under laboratory conditions, with different levels of occlusion and illumination, and with different facial expressions.
Labelling: In addition to occlusion and expression type, each image is manually labelled with the position of six facial landmarks.
Download: Project page

3D Mask Attack Dataset
Introduced: Biometrics: Theory, Applications and Systems 2013
Device: Kinect v1
Description: 76,500 frames of 17 different people, facing the camera against a plain background. Two sets of the data are captured from the real subjects two weeks apart, while the final set consists of a single person wearing fake face masks of the 17 subjects.
Labelling: Which user is in each frame, which images are real and which are spoofed, and manually labelled eye positions.
Download: Project page

Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2
Introduced: IEEE Transactions on Multimedia 2010
Device: Custom active stereo setup
Description: Simultaneous audio and visual recordings of 1109 sentences spoken by 14 different people. Each sentence spoken neutrally and with an emotion. Depth images converted to 3D mesh.
Labelling: Perceived emotions for each recording. Audio labelled with phonemes.
Download: Project page

ETH Face Pose Range Image Data Set
Introduced: CVPR 2008
Device: Custom active stereo setup
Description: 10,545 images of 20 different people turning their head.
Labelling: Nose position and a coordinate frame at the nose.
Download: Project page