The Microsoft Research Outreach team has worked extensively with the external research community to enable adoption of cloud-based research infrastructure over the past few years. Through this process, we experienced the ubiquity of Jim Gray’s fourth paradigm of discovery based on data-intensive science – that is, almost all research projects have a data component to them. This data deluge also demonstrated a clear need for curated and meaningful datasets in the research community, not only in computer science but also in interdisciplinary and domain sciences.

Today we are excited to launch Microsoft Research Open Data – a new data repository in the cloud dedicated to facilitating collaboration across the global research community. Microsoft Research Open Data, in a single, convenient, cloud-hosted location, offers datasets representing many years of data curation and research efforts by Microsoft that were used in published research studies.

Why we are investing in this

The goal is to provide a simple platform to Microsoft researchers and collaborators to share datasets and related research technologies and tools. Microsoft Research Open Data is designed to simplify access to these datasets, facilitate collaboration between researchers using cloud-based resources and enable reproducibility of research. We will continue to shape and grow this repository and add features based on feedback from the community.
We recognize that there are dozens of data repositories already in use by researchers and expect that the capabilities of this repository will augment existing efforts.

Figure 1 – Dataset in Microsoft Research Open Data

“This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing”
-Sam Madden, Professor, Massachusetts Institute of Technology

With data growing at an exponential rate, perceived to be over 150 ZB of data available by 2025, it is now recognized that we need to prioritize bringing processing to data versus relying on data movement through Internet bandwidth that is growing at a much slower pace. We believe that there is real utility in providing an option to bring the processing to the data. Therefore, in addition to providing an option to download the data assets, users can also copy datasets directly to an Azure based Data Science virtual machine, as shown in Figure 2.

Figure 2 – Data copied from microsoftopendata.com to an Azure based Linux virtual machine

The Data Science virtual machine comes preloaded with a variety of development tools popular with researchers and practitioners as can been seen in Figure 3.

Figure 3 Linux Data Science virtual machine

“I am often asked to share my research data and the public sharing I have done in the past has been popular. Coordinating and cataloging these datasets in one place with Azure will be helpful for both internal and external researchers, giving them easy access, encouraging collaboration, and providing convenient cloud-based access to the wealth of Microsoft Research shared data.” 
-John Krumm, Principal Researcher, Microsoft Research AI

Datasets in Microsoft Research Open Data are categorized by their primary research area, as shown in Figure 4. You can find links to research projects or publications with the dataset. You can browse available datasets and download them or copy them directly to an Azure subscription through an automated workflow. To the extent possible, the repository meets the highest standards for data sharing to ensure that datasets are findable, accessible, interoperable and reusable; the entire corpus does not contain personally identifiable information. The site will continue to evolve as we get feedback from users.

Figure 4 – Dataset Categories

Microsoft Research Open Data is an outcome of the Microsoft Research Outreach Data science program and was made possible by a collaboration between many teams at Microsoft, Microsoft researchers, our industry partners, and our academic advisors.

We would love to hear your comments and feedback! Please send us a note via the Feedback feature on the sitehttp://microsoftopendata.com and tell us what you think.

Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud的更多相关文章

  1. 未能加载包“Microsoft SQL Server Data Tools”

    直接在vs2013里的App_Data目录创建数据库,在服务器资源管理器中查看时报错: 未能加载包“Microsoft SQL Server Data Tools” 英文: The 'Microsof ...

  2. Microsoft SQL Server Data Tools - Business Intelligence for Visual Studio 2013 http://www.microsoft.com/en-us/download/details.aspx?id=42313

    Microsoft SQL Server Data Tools - Business Intelligence for Visual Studio 2013 http://www.microsoft. ...

  3. “DataTable”是“System.Data.DataTable”和“Microsoft.Office.Interop.Excel.DataTable”之间的不明确的引用

    “DataTable”是“System.Data.DataTable”和“Microsoft.Office.Interop.Excel.DataTable”之间的不明确的引用 造成这个错误的原因是,在 ...

  4. 解决VS2010在新建实体数据模型出现“在 .NET Framework Data Provider for Microsoft SQL Server Compact 3.5 中发生错误。请与提供程序供应商联系以解决此问题。”的问题

    最近想试着学习ASP.NET MVC,在点击 添加--新建项--Visual C#下的数据中的ADO.NET 实体数据模型,到"选择您的数据连接"时,出现错误,"在 .N ...

  5. 在WebService中使用Microsoft.Practices.EnterpriseLibrary.Data配置数据库

    1. 新建WebApplication1项目 1.1 新建—Web—ASP.NET Empty Web Application--WebApplication1 1.2 添加一个WebForm1 2. ...

  6. Microsoft.Jet.OLEDB.4.0和Microsoft.ACE.OLEDB.12.0的区别

    Microsoft.Jet.OLEDB.4.0和Microsoft.ACE.OLEDB.12.0的区别 时间 2012-12-19 20:30:12  CSDN博客原文  http://blog.cs ...

  7. EF core2.1+MySQL报错'Void Microsoft.EntityFrameworkCore.Storage.Internal.RelationalParameterBuilder..ctor(Microsoft.EntityFrameworkCore.Storage.IRelationalTypeMapper)

    一.使用.net core 2.0 EF mysql 运行一直报错如下: An unhandled exception occurred while processing the request. M ...

  8. win10x64启动vs2010报错:未能加载C:\Windows\Microsoft.NET\Framework\v2.0.50727\microsoft.vsa.tlb

    换了新电脑,因为是win10x64系统,可能是兼容性的问题吧. 启动vs2010,在启动画面直接报错:未能加载C:\Windows\Microsoft.NET\Framework\v2.0.50727 ...

  9. 【Microsoft Azure 的1024种玩法】六、使用Azure Cloud Shell对Linux VirtualMachines 进行生命周期管理

    [文章简介] Azure Cloud Shell 是一个用于管理 Azure 资源的.可通过浏览器访问的交互式经验证 shell. 它使用户能够灵活选择最适合自己工作方式的 shell 体验,本篇文章 ...

随机推荐

  1. Mybatis框架基础支持层——反射工具箱之MetaClass(7)

    简介:MetaClass是Mybatis对类级别的元信息的封装和处理,通过与属性工具类的结合, 实现了对复杂表达式的解析,实现了获取指定描述信息的功能 public class MetaClass { ...

  2. sublime前端必备插件

    1,docblockr     javascr 和 CSS快捷注释插件 在javascript中  写出函数后,/**+回车 就会出现下面函数注释补全. -----/** * @param {[typ ...

  3. 纯CSS修改checkbox复选框样式

    借鉴网友博客, 改用后整理收录 效果图: 移入: <!DOCTYPE html> <html> <head> <meta charset="UTF- ...

  4. 全球排名第一的免费开源ERP Odoo 12产品上海发布会报名开始

    Odoo V12 产品上海发布会暨企业数字化转型论坛 点击进入活动报名通道 高成本的软件开发,耗时的系统安装,繁琐的操作培训… 这一系列问题都是企业数字化管理的痛点, "软件"成为 ...

  5. 开启bin-log日志mysql报错:This function has none of DETERMINISTIC, NO SQL解决办法

    开启bin-log日志mysql报错:This function has none of DETERMINISTIC, NO SQL解决办法: 创建存储过程时 出错信息: ERROR 1418 (HY ...

  6. Oracle 10g 应用补丁PSU 10.2.0.5.180717

    最近测试了一下在Oracle 10g下面(单实例下面)升级.应用补丁PSU 10.2.0.5.180717,打这个补丁的主要原因是 Oracle 将于 2019年6月启用新的SCN兼容性,并且由于Bi ...

  7. mysql使用索引的注意事项

    使用索引的注意事项 使用索引时,有以下一些技巧和注意事项: 1.索引不会包含有NULL值的列 只要列中包含有NULL值都将不会被包含在索引中,复合索引中只要有一列含有NULL值,那么这一列对于此复合索 ...

  8. SQLServer之删除存储过程

    删除存储过程注意事项 在删除任何存储过程之前,请检查依赖对象,并且相应地修改这些对象. 如果没有更新这些对象,则删除存储过程可能会导致依赖对象和脚本失败. 若要显示现有过程的列表,请查询 sys.ob ...

  9. jquery datatable 实例操作

    var dataTables = $(".table").dataTable({ data: d,//为ajax的值,没有直接用插件自带的请求数据方式,个人觉得data的方式好控制 ...

  10. LeetCode算法题-Binary Search(Java实现)

    这是悦乐书的第297次更新,第316篇原创 01 看题和准备 今天介绍的是LeetCode算法题中Easy级别的第165题(顺位题号是704).给定n个元素的排序(按升序)整数数组nums和目标值,编 ...