Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud
The Microsoft Research Outreach team has worked extensively with the external research community to enable adoption of cloud-based research infrastructure over the past few years. Through this process, we experienced the ubiquity of Jim Gray’s fourth paradigm of discovery based on data-intensive science – that is, almost all research projects have a data component to them. This data deluge also demonstrated a clear need for curated and meaningful datasets in the research community, not only in computer science but also in interdisciplinary and domain sciences.
Today we are excited to launch Microsoft Research Open Data – a new data repository in the cloud dedicated to facilitating collaboration across the global research community. Microsoft Research Open Data, in a single, convenient, cloud-hosted location, offers datasets representing many years of data curation and research efforts by Microsoft that were used in published research studies.
Why we are investing in this
The goal is to provide a simple platform for Microsoft researchers and collaborators to share datasets and related research technologies and tools. Microsoft Research Open Data is designed to simplify access to these datasets, facilitate collaboration between researchers using cloud-based resources, and enable reproducibility of research. We will continue to shape and grow this repository and add features based on feedback from the community.
We recognize that there are dozens of data repositories already in use by researchers and expect that the capabilities of this repository will augment existing efforts.
Figure 1 – Dataset in Microsoft Research Open Data
“This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing.”
-Sam Madden, Professor, Massachusetts Institute of Technology
Data is growing at an exponential rate and is projected to exceed 150 ZB by 2025, while Internet bandwidth is growing at a much slower pace. It is now recognized that we need to prioritize bringing processing to the data rather than relying on moving the data across the network, and we believe there is real utility in offering that option. Therefore, in addition to downloading the data assets, users can also copy datasets directly to an Azure-based Data Science virtual machine, as shown in Figure 2.
Figure 2 – Data copied from microsoftopendata.com to an Azure-based Linux virtual machine
The Data Science virtual machine comes preloaded with a variety of development tools popular with researchers and practitioners, as can be seen in Figure 3.
Figure 3 – Linux Data Science virtual machine
“I am often asked to share my research data and the public sharing I have done in the past has been popular. Coordinating and cataloging these datasets in one place with Azure will be helpful for both internal and external researchers, giving them easy access, encouraging collaboration, and providing convenient cloud-based access to the wealth of Microsoft Research shared data.”
-John Krumm, Principal Researcher, Microsoft Research AI
Datasets in Microsoft Research Open Data are categorized by their primary research area, as shown in Figure 4. You can find links to research projects or publications with the dataset. You can browse available datasets and download them or copy them directly to an Azure subscription through an automated workflow. To the extent possible, the repository meets the highest standards for data sharing to ensure that datasets are findable, accessible, interoperable and reusable; the entire corpus does not contain personally identifiable information. The site will continue to evolve as we get feedback from users.
Figure 4 – Dataset Categories
Microsoft Research Open Data is an outcome of the Microsoft Research Outreach Data Science program and was made possible by a collaboration between many teams at Microsoft, Microsoft researchers, our industry partners, and our academic advisors.
We would love to hear your comments and feedback! Please send us a note via the Feedback feature on the site (http://microsoftopendata.com) and tell us what you think.