深度学习数据集Deep Learning Datasets
Datasets
Symbolic Music Datasets
- Piano-midi.de: classical piano pieces (http://www.piano-midi.de/)
- Nottingham : over 1000 folk tunes (http://abc.sourceforge.net/NMD/)
- MuseData: electronic library of classical music scores (http://musedata.stanford.edu/)
- JSB Chorales: set of four-part harmonized chorales (http://www.jsbchorales.net/index.shtml)
Natural Images
- MNIST: handwritten digits (http://yann.lecun.com/exdb/mnist/)
- NIST: similar to MNIST, but larger
- Perturbed NIST: a dataset developed in Yoshua’s class (NIST with tons of deformations)
- CIFAR10 / CIFAR100: 32×32 natural image dataset with 10/100 categories (http://www.cs.utoronto.ca/~kriz/cifar.html)
- Caltech 101: pictures of objects belonging to 101 categories (http://www.vision.caltech.edu/Image_Datasets/Caltech101/)
- Caltech 256: pictures of objects belonging to 256 categories (http://www.vision.caltech.edu/Image_Datasets/Caltech256/)
- Caltech Silhouettes: 28×28 binary images contains silhouettes of the Caltech 101 dataset
- STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, self-taught learning algorithms. It is inspired by the CIFAR-10 datasetbut with some modifications.http://www.stanford.edu/~acoates//stl10/
- The Street View House Numbers (SVHN) Dataset - http://ufldl.stanford.edu/housenumbers/
- NORB: binocular images of toy figurines under various illumination and pose (http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/)
- Imagenet: image database organized according to the WordNethierarchy (http://www.image-net.org/)
- Pascal VOC: various object recognition challenges (http://pascallin.ecs.soton.ac.uk/challenges/VOC/)
- Labelme: A large dataset of annotated images, http://labelme.csail.mit.edu/Release3.0/browserTools/php/dataset.php
- COIL 20: different objects imaged at every angle in a 360 rotation(http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php)
- COIL100: different objects imaged at every angle in a 360 rotation (http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php)
- Arcade Universe- An artificial dataset generator with images containing arcade games sprites such as tetris pentomino/tetromino objects. This generator is based on the O. Breleux’s bugland dataset generator.
- A collection of datasets inspired by the ideas from BabyAISchool:
- BabyAIShapesDatasets : distinguishing between 3 simple shapes
- BabyAIImageAndQuestionDatasets : a question-image-answer dataset
- Datasets generated for the purpose of an empirical evaluation of deep architectures (DeepVsShallowComparisonICML2007):
- MnistVariations : introducing controlled variations in MNIST
- RectanglesData : discriminating between wide and tall rectangles
- ConvexNonConvex : discriminating between convex and nonconvex shapes
- BackgroundCorrelation : controlling the degree of correlation in noisy MNIST backgrounds
Faces
- Labelled Faces in the Wild: 13,000 images of faces collected from the web, labelled with the name of the person pictured (http://vis-www.cs.umass.edu/lfw/)
- Toronto Face Dataset
- Olivetti: a few images of several different people (http://www.cs.nyu.edu/~roweis/data.html)
- Multi-Pie: The CMU Multi-PIE Face Database (http://www.multipie.org/)
- Face-in-Action (http://www.flintbox.com/public/project/5486/)
- JACFEE: Japanese and Caucasian Facial Expressions of Emotion (http://www.humintell.com/jacfee/)
- FERET: The Facial Recognition Technology Database (http://www.itl.nist.gov/iad/humanid/feret/feret_master.html)
- mmifacedb: MMI Facial Expression Database (http://www.mmifacedb.com/)
- IndianFaceDatabase: http://vis-www.cs.umass.edu/~vidit/IndianFaceDatabase/)
- (e.g. The Yale Face Database (http://vision.ucsd.edu/content/yale-face-database) and The Yale Face Database B (http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html)).
Text
- 20 newsgroups: classification task, mapping word occurences to newsgroup ID (http://qwone.com/~jason/20Newsgroups/)
- Reuters (RCV*) Corpuses: text/topic prediction (http://about.reuters.com/researchandstandards/corpus/)
- Penn Treebank : used for next word prediction or next character prediction (http://www.cis.upenn.edu/~treebank/)
- Broadcast News: large text dataset, classically used for next word prediction (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC97S44)
- Wikipedia Dataset
- Multidomain sentiment analysis dataset: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/
Speech
- TIMIT Speech Corpus: phoneme classification (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1)
- Aurora : Timit with noise and additional information
- MovieLens: Two datasets available from http://www.grouplens.org.
The first dataset has 100,000 ratings for 1682 movies by 943 users,
subdivided into five disjoint subsets. The second dataset has about 1
million ratings for 3900 movies by 6040 users. - Jester: This dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users.
- Netflix Prize: Netflix released an anonymised version of their movie rating dataset; it consists of 100 million ratings, done by 480,000 users who have rated between 1 and all of the 17,770 movies.
- Book-Crossing dataset: This dataset is from the Book-Crossing community, and contains 278,858 users providing 1,149,780 ratings about 271,379 books.
Misc
- “Musk” dataset
- CMU Motion Capture Database: (http://mocap.cs.cmu.edu/)
- Brodatz dataset: texture modeling (http://www.ux.uis.no/~tranden/brodatz.html)
- Million Song dataset: http://labrosa.ee.columbia.edu/millionsong/
- Merck Molecular Activity Challenge - http://www.kaggle.com/c/MerckActivity/data
from: http://deeplearning.net/datasets/
深度学习数据集Deep Learning Datasets的更多相关文章
- 深度学习(Deep Learning)资料大全(不断更新)
Deep Learning(深度学习)学习笔记(不断更新): Deep Learning(深度学习)学习笔记之系列(一) 深度学习(Deep Learning)资料(不断更新):新增数据集,微信公众号 ...
- 学习笔记之深度学习(Deep Learning)
深度学习 - 维基百科,自由的百科全书 https://zh.wikipedia.org/wiki/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0 深度学习(deep lea ...
- 读李宏毅《一天看懂深度学习》——Deep Learning Tutorial
大牛推荐的入门用深度学习导论,刚拿到有点懵,第一次接触PPT类型的学习资料,但是耐心看下来收获还是很大的,适合我这种小白入门哈哈. 原PPT链接:http://www.slideshare.net/t ...
- 深度学习(deep learning)
最近deep learning大火,不仅仅受到学术界的关注,更在工业界受到大家的追捧.在很多重要的评测中,DL都取得了state of the art的效果.尤其是在语音识别方面,DL使得错误率下降了 ...
- 如何正确理解深度学习(Deep Learning)的概念
现在深度学习在机器学习领域是一个很热的概念,不过经过各种媒体的转载播报,这个概念也逐渐变得有些神话的感觉:例如,人们可能认为,深度学习是一种能够模拟出人脑的神经结构的机器学习方式,从而能够让计算机具有 ...
- 深度学习教程Deep Learning Tutorials
Deep Learning Tutorials Deep Learning is a new area of Machine Learning research, which has been int ...
- Caffe——清晰高效的深度学习(Deep Learning)框架
Caffe(http://caffe.berkeleyvision.org/)是一个清晰而高效的深度学习框架,其作者是博士毕业于UC Berkeley的贾扬清(http://daggerfs.com/ ...
- 深度学习研究组Deep Learning Research Groups
Deep Learning Research Groups Some labs and research groups that are actively working on deep learni ...
- 深度学习(deep learning)优化调参细节(trick)
https://blog.csdn.net/h4565445654/article/details/70477979
随机推荐
- 【LOJ】#2127. 「HAOI2015」按位或
题解 听说这是一道论文题orz \(\sum_{k = 1}^{\infty} k(p^{k} - p^{k - 1})\) 答案是这个多项式的第\(2^N - 1\)项的系数 我们反演一下,卷积变点 ...
- hdoj1879 继续畅通工程(Prime || Kruskal)
题目链接 http://acm.hdu.edu.cn/showproblem.php?pid=1879 思路 这题和hdoj1102很像,图中的有一些路已经修好了,对于这些已经修好的路,我们令还需要修 ...
- Linq To Sql 使用初探
最近有数据处理需要,就是那种从数据库中把数据取出来 ,对其中的部分字段做一些处理再吐回去的工作,从同事那里学习到了,这中活最适合使用 Linq to Sql 这种方式,不用搭建框架,不用自建实体,直接 ...
- caffe fine tune 复制预训练model的参数和freeze指定层参数
复制预训练model的参数,只需要重新copy一个train_val.prototxt.然后把不需要复制的层的名字改一下,如(fc7 -> fc7_new),然后fine tune即可. fre ...
- php7的新特性
新增操作符1.??$username = $_GET['user'] ?? '';$username = isset($_GET['user']) ? $_GET['user'] : 'nobody' ...
- iOS 11开发教程(一)
iOS 11开发概述 iOS 11是目前苹果公司用于苹果手机和苹果平板电脑的最新的操作系统.该操作系统的测试版于2017年6月6号(北京时间)被发布.本章将主要讲解iOS 11的新特性.以及使用Xco ...
- hdu1242Rescue
STL容器之优先队列 优先级队列,以前刷题的时候用的比较熟,现在竟然我只能记得它的关键字是priority_queue(太伤了).在一些定义了权重的地方这个数据结构是很有用的. 先回顾队列的定义 ...
- ARM 中必须明白的几个概念
文章具体介绍了关于ARM的22个常用概念. 1.ARM中一些常见英文缩写解释 MSB:最高有效位: LSB:最低有效位: AHB:先进的高性能总线: VPB:连接片内外设功能的VLSI外设总线: EM ...
- Django Template Language 模板语言
一.标签 tags 1.普通变量 普通变量用{{ }} 变量名由数字.字母.下划线组成 点.在模板语言中用来获取对象相应的属性值 示例 {# 取variable中的第一个参数 #} {{ variab ...
- SQLSERVER2014集群实战——DNS的坑
近几日生产环境总是偶发的出现数据库连接失败的错误,一开始并未引起重视,因为反馈的人很少,而且应用服务器与数据库服务器都处在同一机房的内网环境,相互之间的访问应该是很稳定的.直到早上有几分钟的时间里出现 ...