rsync 文件校验及同步原理

参考:http://rsync.samba.org/how-rsync-works.html

我们关注的是其发送与接收校验文件的算法,这里附上原文和我老婆(^_^)的翻译:

The Sender

The sender process reads the file index numbers and associated block checksum sets one at a time from the generator.

发送进程一次从生成器读取一个文件索引号和关联的块校验集合

For each file id the generator sends it will store the block checksums and build a hash index of them for rapid lookup.

 对于生成器发送的每个文件ID,它会存储数据块校验和并生成它们的哈希索引,以进行快速查找 。

Then the local file is read and a checksum is generated for the block beginning with the first byte of the local file. This block checksum is looked for in the set that was sent by the generator, and if no match is found, the non-matching byte will be appended to the non-matching data and the block starting at the next byte will be compared. This is what is referred to as the “rolling checksum”

 然后会读取本地文件,并为以本地文件的第一个字节开头的数据块生成校验和。此数据块校验和在由生成器发送的集中查找,如果未找到匹配,则会将非匹配字节附加到非匹配数据,并且会比较以下一字节开头的数据块。 这称为“rolling checksum” .

If a block checksum match is found it is considered a matching block and any accumulated non-matching data will be sent to the receiver followed by the offset and length in the receiver's file of the matching block and the block checksum generator will be advanced to the next byte after the matching block.

 如果找到数据块校验和匹配,则会将它视为匹配块,所有累积的非匹配数据将被加上在接收端的文件中的匹配数据块的偏移量和长度之后发送到接收端,并且数据块校验和生成器将提前到匹配块之后的下一字节。

Matching blocks can be identified in this way even if the blocks are reordered or at different offsets. This process is the very heart of the rsync algorithm.

可以以这种方式标识匹配块,即使重新排列数据块的顺序或数据块的偏移量不同。此过程是 rsync 算法的核心。

In this way, the sender will give the receiver instructions for how to reconstruct the source file into a new destination file. These instructions detail all the matching data that can be copied from the basis file (if one exists for the transfe), and includes any raw data that was not available locally. At the end of each file's processing a whole-file checksum is sent and the sender proceeds with the next file.

Generating the rolling checksums and searching for matches in the checksum set sent by the generator require a good deal of CPU power. Of all the rsync processes it is the sender that is the most CPU intensive.

The Receiver

The receiver will read from the sender data for each file identified by the file index number. It will open the local file (called the basis) and will create a temporary file.

The receiver will expect to read non-matched data and/or to match records all in sequence for the final file contents. When non-matched data is read it will be written to the temp-file. When a block match record is received the receiver will seek to the block offset in the basis file and copy the block to the temp-file. In this way the temp-file is built from beginning to end.

The file's checksum is generated as the temp-file is built. At the end of the file, this checksum is compared with the file checksum from the sender. If the file checksums do not match the temp-file is deleted. If the file fails once it will be reprocessed in a second phase, and if it fails twice an error is reported.

After the temp-file has been completed, its ownership and permissions and modification time are set. It is then renamed to replace the basis file.

Copying data from the basis file to the temp-file make the receiver the most disk intensive of all the rsync processes. Small files may still be in disk cache mitigating this but for large files the cache may thrash as the generator has moved on to other files and there is further latency caused by the sender. As data is read possibly at random from one file and written to another, if the working set is larger than the disk cache, then what is called a seek storm can occur, further hurting performance.

将数据从基础文件复制到临时文件会使receiver在所有rsync进程中最耗磁盘。小文件可以仍处于缓解此作用的磁盘缓存中,但对于大型文件,由于生成器已移动到其他文件,并且存在sender引起的进一步延迟,缓存可能会"抖动"(thrash)。 数据可能从一个文件随机读取,写入另一文件,如果工作集大于磁盘缓存,则会发生"寻道风暴"(seek storm),进一步影响性能。

看到这儿可能还是一头雾水,好吧,刚bing一下,前面已经有人栽树了,有图有真相:

http://coolshell.cn/articles/7425.html#more-7425

本文为原创内容,转载请注明出自 梁小白博客(http://biangbiang.cnblogs.com)
 
分类: Linux开源软件
标签: rsync

rsync 文件校验及同步原理的更多相关文章

  1. rsync 文件校验及同步原理及rsync server配置

    参考:http://rsync.samba.org/how-rsync-works.html 我们关注的是其发送与接收校验文件的算法,这里附上原文和我老婆(^_^)的翻译: The Sender Th ...

  2. rsync 文件同步和备份

    rsync 是同步文件的利器,一般用于多个机器之间的文件同步与备份,同时也支持在本地的不同目录之间互相同步文件.在这种场景下,rsync 远比 cp 命令和 ftp 命令更加合适,它只会同步需要更新的 ...

  3. rsync + inotify-tools实现文件的实时同步

    文章摘自:http://lxw66.blog.51cto.com/5547576/1331048 rsync 帮助文档:http://man.linuxde.net/rsync 最近有个想法就是部署一 ...

  4. CentOS系统rsync文件同步 安装配置

    rsync是类unix系统下的数据镜像备份工具,从软件的命名上就可以看出来了——remote sync 它的特性如下: 可以镜像保存整个目录树和文件系统. 可以很容易做到保持原来文件的权限.时间.软硬 ...

  5. sersync + rsync 实现文件的实时同步

    这里有一点要特别注意了,就是在你完成备份之后,先不要把本地的文件都给删除了,先把服务停了之后再删除文件, 因为你已删除,检查到两边不一致,他又会把备份端给删除了.所以特别得注意了.这里吃过一次亏. 还 ...

  6. Rsync文件同步

    Rsync文件同步 本章结构 关于rsync 1.一款增量备份工具,remote sync,远程同步,支持本地复制或者与其他SSH.rsync主机同步,官方网站:http://rsync.samba. ...

  7. inotify 与 rsync文件同步实现与问题

    首先分别介绍inotify 与 rsync的使用,然后用两者实现实时文件同步,最后说一下这样的系统存在什么样的问题. 1. inotify 这个具体使用网上很多,参考 inotify-tools 命令 ...

  8. rsync文件同步详解

    一.  环境和测试说明 rsync(remote sync)是unix及类unix平台下的数据镜像备份软件,它不像FTP那样需要全备份,rsync可以根据数据的变化进行差异备份,从而减少数据流量,提高 ...

  9. Rsync文件同步服务

    Rsync简介 Rsync是一款开源的.快速的.多功能的.可实现全量及增量的本地或远程数据同步备份的优秀工具,适用于Unix/Linux/Windows等多种操作系统. Rsync的特性 支持拷贝特殊 ...

随机推荐

  1. 国籍控件(js源码)

    国籍控件(js源码) 一直苦于没有好的国籍控件可以用,于是抽空写了一个国籍控件,现分享给大家. 主要功能和界面介绍 国籍控件主要支持中文.英文过滤以及键盘上下事件. 源码介绍 国籍控件核心是两个文件, ...

  2. ASP.NET状态服务及session丢失问题解决方案总结

    原文:ASP.NET状态服务及session丢失问题解决方案总结[转载] asp.net Session的实现: asp.net的Session是基于HttpModule技术做的,HttpModule ...

  3. asp.net web api2.0 ajax跨域解决方案

    asp.net web api2.0 ajax跨域解决方案 Web Api的优缺点就不说了,直接说怎么跨域,我搜了一下,主要是有两种.  一,ASP.NET Web API支持JSONP,分两种 1, ...

  4. Kafka的常用管理命令

    1. 查看kafka都有那些topic a. list/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --list --zookeeper test ...

  5. PhpStorm创建Drupal模块项目开发教程(5)

    Drupal项目开发中,问题跟踪器的设置,可以保证信息的交互.是开发中,不可或缺的部分. 接下来,就PhpStorm IDE中,问题跟踪器集成的配置操作就行图文解说. Settings | Tasks ...

  6. Mac OSX系统安装和配置Zend Server 6教程(2)

    继上一节安装好Zend Server 6以后,我们需要修改配置文件.首先修改服务器监听端口.默认的情况下Zend Server 6安装以后的端口是10088.一般开发者使用的都是HTTP默认端口80. ...

  7. 【C#版本详情回顾】C#2.0主要功能列表

    泛型 优点:类型安全/重用代码/提升性能 应用:泛型接口.泛型类.泛型类型参数.泛型方法.泛型事件和泛型委托 命名空间:System.Collections.Generic 特性:泛型约束,defau ...

  8. Unit Of Work-工作单元

    Unit Of Work-工作单元 阅读目录: 概念中的理解 代码中的实现 后记 掀起了你的盖头来,让我看你的眼睛,你的眼睛明又亮呀,好像那水波一模样:掀起了你的盖头来,让我看你的脸儿,看看你的脸儿红 ...

  9. 使用shell+awk完成Hive查询结果格式化输出

    好久不写,一方面是工作原因,有些东西没发直接发,另外的也是习惯给丢了,内因所致.今天是个好日子,走起! btw,实际上这种格式化输出应该不只限于某一种需求,差不多是通用的. 需求: --基本的:当前H ...

  10. Linux系统编程:客户端-服务器用FIFO进行通信

    先放下代码  回来在解释 头文件: clientinfo.h struct CLIENTINFO{ ]; int leftarg; int rightarg; char op; }; typedef ...