This year I did a lot of research on the storage systems classes taught at good universities. There were two reasons for this: the first was this post by a researcher at NetApp about the lack of a good textbook for a storage or file system class, and the second was our own storage systems class, for which I was the TA.
In this post I want to give a short overview of the various courses, their focus, and other aspects. Please note that the following text might contain errors or misconceptions on my part. I might also have missed other storage courses at these universities.
University of California, Santa Cruz:
Let's begin with the course at the University of California, Santa Cruz. Storage is a huge topic at UCSC, home of the Storage Systems Research Center, which partners with nearly everyone. The Ceph file system and the CRUSH hash function are two outcomes of their research.
The course consists of a series of lectures (two per week), lots of reading material, and a project. The lectures are about file systems, beginning with uniprocessor file systems and performance analysis and moving (very quickly) to distributed file systems. They also cover fault tolerance and other advanced topics. The reading material consists of 37 papers, ranging from classics like "File System Design for an NFS File Server Appliance" to state-of-the-art research papers like "An Analysis of Data Corruption in the Storage Stack" (FAST 2008), which had appeared only about two weeks before.
I miss some basics that IMHO are important for understanding storage system design, such as the properties of modern hard disks, and I am not that into archival storage (my boss is), but it is a really well-designed course. Unfortunately, the lecture slides are not available online.
Columbia University, New York
I may have missed one, but the last storage-related course at Columbia University was given in 2004 by Kostas Magoutis. The course focuses on network storage and probably relies on basics from an operating systems class or a basic storage class. There was one lecture per week, with one to three papers as reading material each week.
What is really nice is that the lecturer posted notes on how to read the papers, with questions and annotations for some of the material. Interestingly, data deduplication is covered with the LBFS paper, the Venti paper, and Henson's compare-by-hash papers.
Three books are recommended for the course: "UNIX Internals" (1996), "The Design and Implementation of the 4.4BSD Operating System" (1996), and "NFS Illustrated" (1999).
Cornell University, New York
Advanced Distributed Storage Systems, Spring 2009:
At Cornell University, I found the course on advanced distributed storage systems by Hakim Weatherspoon (who took part in the OceanStore project). The lectures, two per week, cover "Cloud Computing", "Network File Systems", and the important topics of consistency, availability, replication, and scalability.
I think the major strength of this course is that, much more than the other courses, it seems to focus on the important concepts needed for storage system design, implementation, and research rather than on standards, products, and storage management issues. The major weakness is that the individual lectures are strongly focused on presenting the content of particular research papers, to the point that there is no common presentation scheme. I think this weakens the overall consistency of the lecture series.
One interesting aspect of the course is that the students have to write and hand in short reviews of the assigned papers, consisting of a summary (3-4 sentences), two or three major strengths, two or three weaknesses, and one question for future work that, in the student's opinion, should be pursued.
There are two projects as part of the course: in the first, the students have to develop a distributed file system based on the Amazon Web Services infrastructure; the second is a research project that the students have to come up with by themselves.
Johns Hopkins University
Storage Systems, Fall 2007:
At Johns Hopkins University, where our professor Christian Scheideler and my advisor Andre Brinkmann (as a visiting PhD student) had formerly been, I found the Storage Systems course by Randal Burns.
As usual, the course consists of a lecture series (two 50-minute lectures per week), homework, and a project. I like that the course covers some basics, like disk drive architecture, that are essential for understanding the design of storage systems. On the other hand, it is a bit short on distributed file systems.
University of Notre Dame:
In 2005, the University of Notre Dame offered the course "Distributed Storage" by Surendar Chandra.
As usual, the course consists of a series of lectures (two per week) and a project. The lecture topics are "Naming and location", "Consistency and Replication", "Distributed Storage Management", "Security", "Peer-to-Peer Storage and Sensors", and "Energy Management". The reading material consists of no fewer than 40 papers. My impression is that this collection differs considerably from the material of the other courses covered here; e.g., the well-known "classic" papers are not linked.
Technion:
The Technion is the "Israel Institute of Technology" in Haifa, and as I have said before, I am pretty envious of the students there. However, not especially because of the "Filesystems" course.
The lecture series consists of a short introduction to disk drive architecture, RAID, sequential data processing on tapes (hey, I am inferring this from the pictures in the slides only), disk-based sorting, B-trees, hashing, concurrency and transactions, as well as recovery.
The course recommends five books: "File Structures: An Analytic Approach", "Transactional Information Systems", "Principles of Database and Knowledge-Base Systems", "Database Management Systems", and "Database System Implementation". The books match the lectures exactly: they mostly cover the basics shared by databases and storage systems, but none of them is directly related to file systems.
The assignments seem to be pretty similar to ours: a series of assignments about a simple file system implementation. However, the assignments are also written in Hebrew, so I can't understand them. I expected more from a Technion course.
University of Wisconsin in Madison:
The advanced storage systems class given at the University of Wisconsin seems to be a nicely structured class with interesting topics: it begins with local storage systems but moves very quickly (third topic) to distributed and mobile systems. Then important concepts like reliability and fault tolerance, performance and scalability, as well as caching, replication, and consistency are discussed. The reading material is a nice list of now-classic papers like the WAFL paper, the AutoRAID paper, and the GoogleFS and MapReduce papers, but also the Row-Diagonal Parity paper and the "soft updates" paper.
What universities are missing:
The University of California, Berkeley is missing: the home of BSD (and therefore of the Fast File System), RAID, and a lot of early work on P2P storage seems to have no course focused on storage or file systems. I also could not find such classes at Stanford, Harvard, MIT, or Carnegie Mellon.
Summary
To sum these courses up a bit: most courses have large amounts of reading material. This is unusual in Germany (or at least at UPB); I have had plenty of courses (especially in the SE part) without any reading material. We followed this "US style" in our course, but with only 12 papers. Most courses also have a project assignment for which the students have to come up with their own topic. I really like this, too.
Our own course
Our own storage systems course consists of a series of 15 lectures of 90 minutes each and 6 assignments.
The lecture series starts very slowly, with "Magnetic Storage Systems" (week 1), disk scheduling (week 2), an introduction to MEMS and flash storage (week 3), and RAID (weeks 4 and 5). Next come file systems (weeks 6 and 7) and storage connection technologies, from SCSI (week 8) to SANs (week 9). Network and parallel file systems are treated in weeks 10 to 12.
The assignments consisted of programming a small FUSE file system in C, step by step.
In the last third of the lecture series, the course treated advanced storage topics that are relevant to our current research project, such as long-term archiving, HPC I/O (MPI-IO), Continuous Data Protection (CDP), data deduplication, and P2P storage.
Last words:
I really enjoyed studying and comparing these storage system lectures. They provide a pretty good overview of the classic (I should call them "essential") research papers of our field, and of related books, for as long as a real storage systems textbook is missing.
I am impressed that so many universities have "project" assignments where the students have to come up with a topic by themselves. These lectures show what is possible at good (mainly US) universities with motivated students and the right foundations.
This blog is copied from: http://dirkmeister.blogspot.com/2009/12/storage-system-and-file-system-courses.html