This year I spent a lot of time researching storage systems classes taught at good universities. There were two reasons: first, this post by a researcher at NetApp about the lack of a good textbook for storage or file system classes, and second, our own storage systems class, where I was the TA.
In this post I want to give a short overview of the various courses, their focus, and other things. Please note that the following text might contain errors or misconceptions on my part. I might also have missed other storage courses at these universities.
University of California, Santa Cruz:
Let's begin with the course at the University of California, Santa Cruz. Storage is a huge topic at UCSC, where the Storage Systems Research Center partners with nearly everyone. The Ceph file system and the CRUSH hash function are two outcomes of their research.
The course consists of a series of lectures (two per week), lots of reading material, and a project. The lectures are about file systems, beginning with uniprocessor file systems and performance analysis and moving (very quickly) to distributed file systems. They also cover fault tolerance and other advanced topics. The reading material consists of 37 papers, from classics like "File System Design for an NFS File Server Appliance" to state-of-the-art research papers like "An Analysis of Data Corruption in the Storage Stack" (FAST 2008), which had come out about two weeks before.
I miss some basics that IMHO are important for understanding storage system design, like the properties of modern hard disks, and I am not that into archival storage (my boss is), but it is a really well-designed course. Unfortunately, the lecture slides are not available online.
Columbia University, New York
I may have missed one, but the last storage-related course at Columbia University was given in 2004 by Kostas Magoutis. The course focuses on network storage and probably relies on basics from an operating systems class or a basic storage class. There was one lecture per week, with one to three papers as reading material per week.
A really nice touch is that the lecturer has posted notes on how to read the papers, with questions and annotations for some of the material. Interestingly, data deduplication is covered with the LBFS paper, the Venti paper, and Henson's Compare-By-Hash papers.
Three books are recommended for the course: "UNIX Internals (1996)", "The Design and Implementation of the 4.4BSD Operating System (1996)", and "NFS Illustrated (1999)".
Cornell University, New York
Advanced Distributed Storage Systems, Spring 2009:
At Cornell University, I found the course on advanced distributed storage systems by Hakim Weatherspoon (who took part in the OceanStore project). The lectures, given twice per week, cover "Cloud Computing", "Network File Systems", and the important topics of Consistency, Availability, Replication, and Scalability.
I think the major strength of this course is that it seems to focus much more than the other courses on the important concepts needed for storage system design, implementation, and research, rather than on standards, products, and storage management issues. The major weakness is that the individual lectures are very focused on the research papers whose content is presented, even to the point that there is no single presentation scheme. I think this weakens the overall consistency of the lecture series.
One interesting aspect of the course is that the students have to write and hand in short reviews of the papers in the reading material, each consisting of a summary (3-4 sentences), two or three major strengths, two or three weaknesses, and one question for future work that should be pursued in the opinion of the student.
There are two projects as part of the course. In the first, the students have to develop a distributed file system based on the Amazon Web Services infrastructure. The second is a research project that the students have to come up with by themselves.
Johns Hopkins University
Storage Systems, Fall 2007:
At Johns Hopkins University, where our professor Christian Scheideler and my advisor Andre Brinkmann (as a visiting PhD student) used to be, I found the Storage Systems course by Randal Burns.
As usual, the course consists of a lecture series (two lectures of 50 minutes each per week), homework, and a project. I like that the course covers some basics, like disk drive architecture, that are essential for understanding the design of storage systems. On the other hand, it is a bit short on distributed file systems.
University of Notre Dame:
In 2005, the University of Notre Dame offered the course "Distributed Storage" by Surendar Chandra.
As usual, the course consists of a series of lectures (two per week) and a project. The lecture topics are "Naming and location", "Consistency and Replication", "Distributed Storage Management", "Security", "Peer-to-Peer Storage and Sensors", and "Energy Management". The reading material consists of no fewer than 40 papers. My impression is that the collection of reading material differs a lot from that of the other courses covered here; for example, the well-known "classical" papers are not linked.
Technion:
Technion is the "Israel Institute of Technology" in Haifa and, as I said before, I am pretty envious of the students there. However, not especially because of the "Filesystems" course.
The lecture series consists of a short introduction to disk drive architecture, RAID, sequential data processing on tapes (hey, I am only inferring this from the pictures in the slides), disk-based sorting, B-trees, hashing, concurrency and transactions, as well as recovery.
The course recommends five books: "File Structures: An Analytic Approach", "Transactional Information Systems", "Principles of Database and Knowledge-Base Systems", "Database Management Systems", and "Database System Implementation". The books match the lectures exactly: they mostly cover the basics shared between databases and storage systems, but none of them is directly related to file systems.
The assignments seem to be pretty similar to ours: they appear to consist of multiple steps towards a simple file system implementation. However, the assignments, too, are given in Hebrew, so I cannot understand them. I expected more from a Technion course.
University of Wisconsin in Madison:
The advanced storage systems class given at the University of Wisconsin seems to be a nicely structured class with interesting topics. It begins with local storage systems, but moves very quickly (by the third topic) to distributed and mobile systems. Then important concepts like reliability and fault tolerance, performance and scalability, as well as caching, replication, and consistency are discussed. The reading material is a nice list of what are now classics, like the WAFL paper, the AutoRAID paper, the GoogleFS and MapReduce papers, but also the Row-Diagonal Parity paper and the "soft updates" paper.
Which universities are missing:
The University of California, Berkeley is missing: the home of BSD (and therefore the Fast File System), RAID, and a lot of early work in P2P storage seems to have no course focused on storage or file systems. I also could not find such classes at Stanford, Harvard, MIT, or Carnegie Mellon.
Summary
To sum these courses up a bit: most courses have large amounts of reading material. This is unusual in Germany (or at least at UPB), where I had plenty of courses (especially in the SE part) without any reading material. We followed this "US style" in our course, but with only 12 papers. Most courses also have a project assignment for which the students have to come up with their own topic. I really like this, too.
Our own course
"Our" own storage systems course consists of a lecture series of 15 lectures of 90 minutes each and 6 assignments.
The lecture series starts very slowly, with "Magnetic Storage Systems" (week 1), disk scheduling (week 2), an introduction to MEMS and flash storage (week 3), and RAID (weeks 4 and 5). Next come file systems (weeks 6 and 7) and storage connection technologies from SCSI (week 8) to SANs (week 9). Network and parallel file systems are treated in weeks 10 to 12.
The assignments consisted of programming a small FUSE file system in C (step by step).
In the last third of the lecture series, the course treated advanced storage topics that are interesting for our current research projects, like long-term archiving, HPC I/O (MPI-IO), Continuous Data Protection (CDP), data deduplication, and P2P storage.
Last words:
I really liked studying and comparing the storage system lectures. These lectures provide a pretty good overview of the classical (I should call them "essential") research papers of our field, as well as an overview of related books, for as long as a real storage systems textbook is missing.
I am impressed that so many universities have "project" assignments where the students have to come up with a topic by themselves. These lectures show what is possible at good (mainly US) universities, with motivated students, and with the right foundations.
This blog is copied from: http://dirkmeister.blogspot.com/2009/12/storage-system-and-file-system-courses.html