Understanding NFS Caching

Filesystem caching is a great tool for improving performance, but it is important to balance performance with data safety. Caching over NFS involves caches at several different levels, so it is not immediately obvious which combination of options ensures a good compromise between performance and safety.

Client-side caching

The NFS client has an async mount option, which caches writes in the client's RAM until certain conditions are met. From nfs(5):

The NFS client treats the sync mount option differently than some other file systems (refer to mount(8) for a description of the generic sync and async mount options). If neither sync nor async is specified (or if the async option is specified), the NFS client delays sending application writes to the server until any of these events occur:

  • Memory pressure forces reclamation of system memory resources.
  • An application flushes file data explicitly with sync(2), msync(2), or fsync(3).
  • An application closes a file with close(2).
  • The file is locked/unlocked via fcntl(2).

In other words, under normal circumstances, data written by an
application may not immediately appear on the server that hosts the
file.
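When an application cannot wait for close(2), the usual way to force buffered data out is to flush the user-space buffer and then call fsync. Below is a minimal Python sketch of that pattern. The helper name `write_durably` is hypothetical, and the comments describe the assumed behaviour on an async NFS mount rather than anything specific to a particular kernel.

```python
import os

def write_durably(path: str, data: bytes) -> None:
    """Write data and ask the kernel to make it durable before returning.

    On an NFS mount using the default async option, write() alone may only
    land in the client's page cache; fsync() makes the client send the
    buffered data to the server and wait for it to reach stable storage.
    """
    with open(path, "wb") as f:
        f.write(data)         # may only reach Python's buffer / the client cache
        f.flush()             # move Python's user-space buffer into the kernel
        os.fsync(f.fileno())  # block until the kernel reports the data as durable
    # leaving the with-block calls close(2), which also triggers a flush on NFS
```

Without the fsync call, the data would still be flushed at close(2), but the application would have no earlier point at which it knows the server has the data.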

If the sync option is specified on a mount point, any system call that
writes data to files on that mount point causes that data to
be flushed to the server before the system call returns control to user
space. This provides greater data cache coherence among
clients, but at a significant performance cost.
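If only a few files need synchronous behaviour, an application can request it per file rather than for the whole mount by opening with O_SYNC. This is a hedged sketch: the helper name `append_record_sync` is hypothetical, and it assumes the filesystem (including the Linux NFS client) honours O_SYNC by completing each write to stable storage before write() returns, much as the sync mount option does for every write on the mount. Verify this on your own kernel before relying on it.

```python
import os

def append_record_sync(path: str, record: bytes) -> int:
    """Append one record with O_SYNC so each write() returns only once durable.

    Assumption: the underlying filesystem honours O_SYNC.
    Returns the number of bytes written.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done
        while written < len(record):
            written += os.write(fd, record[written:])
        return written
    finally:
        os.close(fd)
```

Every call pays the synchronous-write cost, which is the same trade-off the sync mount option makes, but confined to the files that need it.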

See nfs(5) for more details. In other words, when an application writes data to a file or set of files, rather
than flushing to the server on each write(2) call, the client waits until the file is closed or the application
explicitly calls fsync(3) or another sync function. Since this relies on the application correctly requesting
that its data be synced, I was concerned about depending on this cache in general, when poorly written
applications might never sync their data. However, given that close(2) also causes the data to be flushed,
this seems like a non-issue, and asking on the linux-nfs mailing list clarified in more detail how this works:

In NFSv3, the close() will cause the client to flush all data to stable storage.
The client will also flush data to stable storage on a chmod, since
that could potentially affect its ability to write back the data. It
will not bother to do so for rename.
An application should normally be able to rely on the data being
safely on disk in both these situations provided that the server
honours the NFS protocol (with a caveat that an ill-timed 'kill -9'
could interrupt the process of flushing).

All metadata operations such as create, chmod, rename, etc. will cause
the server to flush the file metadata to disk assuming that you set
the (highly recommended) "sync" export option. If "sync" is set, the
server will also honour COMMIT requests by flushing the data to stable
storage.

If, OTOH, your server lists the "async" export option as being set,
then COMMIT is considered a no-op, and it will not bother to
explicitly flush metadata operations to stable storage. Performance
will scream, but be prepared to lose data if that server crashes. This
is all technically a violation of the NFS spec, however you have been
given rope...
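The interaction described in that reply can be captured in a toy model: the client sends unstable writes, the server holds them in RAM, and only a COMMIT honoured under the "sync" export option moves them to disk before a crash. This is a simplified illustration under stated assumptions, not an NFS implementation. The class and function names are hypothetical, and a real server also writes its cache back to disk on its own schedule, which the model ignores.

```python
from dataclasses import dataclass, field

@dataclass
class ToyNfsServer:
    """Toy model of one export; `sync` mirrors the /etc/exports option."""
    sync: bool
    ram: list = field(default_factory=list)   # acknowledged, not yet on disk
    disk: list = field(default_factory=list)  # reached stable storage

    def write(self, data: str) -> None:
        # An unstable WRITE: the server may acknowledge it from RAM.
        self.ram.append(data)

    def commit(self) -> None:
        # With "sync", COMMIT flushes RAM to disk; with "async" it is a no-op.
        if self.sync:
            self.disk.extend(self.ram)
            self.ram.clear()

    def crash(self) -> None:
        # Anything still held only in RAM is lost.
        self.ram.clear()

def surviving_data(sync_export: bool) -> list:
    """Write two blocks, commit them (as close() or fsync() would), then crash."""
    server = ToyNfsServer(sync=sync_export)
    server.write("block-1")
    server.write("block-2")
    server.commit()
    server.crash()
    return server.disk
```

In this model `surviving_data(True)` returns both blocks, while `surviving_data(False)` returns an empty list: the client did everything right, but the "async" export treated COMMIT as a no-op.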

Therefore, using async on the client is safe, provided the server honours the protocol and uses the "sync" export option, and it provides a significant performance boost.

It's also important to look at soft versus hard mounts. A soft mount will give up attempting to write to
a server that is unavailable after a specific timeout and number of retries. In my experience, this hasn't
worked well, and I often end up with processes stuck in uninterruptible sleep, blocking on an NFS mountpoint
anyway. As per nfs(5), hard is highly recommended to ensure data integrity:

Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or
if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the
NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an
error to the calling application.

NB: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only
when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the
retrans option may mitigate some of the risks of using the soft option.
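To get a feel for how long a soft mount can block before an application sees an error, the wait can be estimated from timeo and retrans. This is a rough sketch, assuming the NFS-over-TCP behaviour nfs(5) describes for timeo: the value is in tenths of a second, and after each retransmission the timeout grows by timeo, up to a maximum of 600 seconds. It combines that with the quoted rule that a soft mount fails after retrans retransmissions. The function name and its default arguments are assumptions; real kernels add further recovery steps (such as reconnecting), so treat the result as an approximation only.

```python
def soft_mount_give_up_seconds(timeo_ds: int = 600, retrans: int = 2) -> float:
    """Approximate seconds a soft NFS-over-TCP request waits before failing.

    Simplified model: the first attempt waits timeo, each retransmission
    waits timeo longer than the previous one (capped at 600 seconds), and the
    request fails after `retrans` retransmissions. timeo is in deciseconds,
    as on the mount command line.
    """
    cap_ds = 6000  # 600 seconds expressed in deciseconds
    total_ds = 0
    for attempt in range(retrans + 1):  # original send plus `retrans` retries
        total_ds += min(timeo_ds * (attempt + 1), cap_ds)
    return total_ds / 10
```

With timeo=600 and retrans=2 this model gives 360 seconds, so even a soft mount could leave a process blocked for several minutes before the error arrives.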

Note that the intr option allows you to interrupt a request waiting on a hard NFS mount by sending it
the SIGKILL signal. However, on kernels newer than 2.6.25 this is provided by default, and the intr
option is deprecated. You should still be aware of it, though, in case you are working with an older kernel.

Given my poor experience using soft (the timeouts don't seem to actually work) and the increased
risk of data loss, hard seems like the most appropriate option to use. The common problem mentioned
with using hard is that if the server goes away (e.g. a hardware failure keeps it down for an extended period
of time), there used to be no way to unmount that mountpoint or let processes blocking on it complete. There
are now a few ways to mitigate this:

  • bring up a fake NFS server on the same IP address as the offline server, which can then reject the requests that are waiting
    for a response. I've even seen this done with a local Ethernet alias interface, e.g. ifconfig eth0:nfstmp <server ip>/32 up
  • use fsid=<unique number> on the server side in /etc/exports. This creates a static unique identifier for the export,
    so you won't get a "Stale NFS File Handle" error on the client if the server is restarted or goes offline. These ID numbers
    must be unique and should not be 0, since fsid=0 (or fsid=root) designates the NFSv4 root export.
  • try "lazy" unmounting the mountpoint with umount -l /path/to/mountpoint

If the above fails to work, you will probably have to reboot the client in order to clear the stuck mountpoint.
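Keeping fsid values unique is easy to get wrong as /etc/exports grows. Below is a minimal sketch that scans exports-style text for fsid values shared by different export paths. It assumes the simple one-export-per-line format with options in parentheses, and it ignores comments but not continuation lines or every exports(5) syntax variant. It treats fsid=root and fsid=0 as the same value, since exports(5) defines both as the NFSv4 root. The function name is hypothetical.

```python
import re
from collections import defaultdict

def duplicate_fsids(exports_text: str) -> dict:
    """Return {fsid: [export paths]} for each fsid used by more than one path."""
    seen = defaultdict(list)
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path = line.split()[0]
        for match in re.finditer(r"fsid=([^,)\s]+)", line):
            value = match.group(1)
            if value == "root":
                value = "0"  # fsid=root and fsid=0 are equivalent
            # the same path may list several clients with the same fsid
            if path not in seen[value]:
                seen[value].append(path)
    return {fsid: paths for fsid, paths in seen.items() if len(paths) > 1}
```

An empty result means every export path has its own fsid.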

Server-side caching

Confusingly, the NFS server options (found in /etc/exports) are also called sync and async; see exports(5) for details:

async  This option allows the NFS server to violate the NFS protocol and reply to requests before any changes
       made by that request have been committed to stable storage (e.g. disc drive).

       Using this option usually improves performance, but at the cost that an unclean server restart (i.e. a
       crash) can cause data to be lost or corrupted.

sync   Reply to requests only after the changes have been committed to stable storage (see async above).

       In releases of nfs-utils up to and including 1.0.0, the async option was the default. In all releases
       after 1.0.0, sync is the default, and async must be explicitly requested if needed. To help make system
       administrators aware of this change, exportfs will issue a warning if neither sync nor async is specified.

Thus if you use async on the server side, the data will be confirmed to be written as soon as it hits the server's RAM. In the case of a power failure, this data would be lost. Conversely, sync waits for the data to be written to the disk or other stable storage (and confirmed) before returning a success. It is clear that sync is the appropriate option to use on the server side.
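Because exportfs only warns when neither option is given, it can be worth checking an export's option list explicitly. Below is a small sketch with a hypothetical helper name. It reports the write mode implied by one exports option string, assuming an nfs-utils release after 1.0.0 where sync is the default, and it flags a list containing both options as a conflict rather than guessing which one wins.

```python
def export_write_mode(options: str) -> str:
    """Classify an exports(5) option list such as "rw,sync,fsid=10".

    Returns "sync", "async", "default-sync" (neither given: sync is the
    default after nfs-utils 1.0.0, but exportfs warns), or "conflict".
    """
    opts = {opt.strip() for opt in options.strip("()").split(",")}
    has_sync, has_async = "sync" in opts, "async" in opts
    if has_sync and has_async:
        return "conflict"
    if has_async:
        return "async"
    if has_sync:
        return "sync"
    return "default-sync"
```

Anything other than "sync" is worth a second look given the trade-off described above.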

Recommended Options

In conclusion, these options seem to provide a good balance of stability and performance when using NFS:

  • Client Side:

    • hard - forces requests to retry indefinitely to avoid corruption
    • intr - allows hard mounts to be interrupted (unnecessary on kernels newer than 2.6.25)
    • async - queue up writes and flush them in logical groups for more efficient writing
    • tcp - using TCP is more reliable than UDP since it requires confirmation of receipt of packets
  • Server Side:
    • fsid - specifies a unique, static identifier for this export; see above for more details
    • sync - ensures that data is really flushed to stable storage when the server says it is
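These recommendations can be turned into a client mount option string and a server exports entry. This is a sketch under assumptions: the helper names, export path, client specification and fsid value are hypothetical, and rw on the export assumes the share should be writable. Following the text above, intr is only added for kernels up to 2.6.25.

```python
import re

def parse_kernel_version(release: str) -> tuple:
    """Turn a release string such as "5.15.0-91-generic" into (5, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())

def client_mount_options(kernel_release: str) -> str:
    """Comma-separated NFS client options following the recommendations."""
    options = ["hard", "tcp", "async"]
    # intr only matters on older kernels; newer ones ignore it
    if parse_kernel_version(kernel_release) <= (2, 6, 25):
        options.insert(1, "intr")
    return ",".join(options)

def export_line(path: str, clients: str, fsid: int) -> str:
    """One /etc/exports entry with the sync option and a static fsid."""
    return f"{path} {clients}(rw,sync,fsid={fsid})"
```

For example, `client_mount_options(platform.release())` (using Python's standard `platform` module) gives "hard,tcp,async" on a modern kernel, and `export_line("/srv/share", "192.168.1.0/24", 10)` gives "/srv/share 192.168.1.0/24(rw,sync,fsid=10)".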

https://avidandrew.com/understanding-nfs-caching.html
