HDFS Snapshots
Overview
HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.
The implementation of HDFS Snapshots is efficient:
- Snapshot creation is instantaneous: the cost is O(1) excluding the inode lookup time.
- Additional memory is used only when modifications are made relative to a snapshot: memory usage is O(M), where M is the number of modified files/directories.
- Blocks in datanodes are not copied: the snapshot files record the block list and the file size. There is no data copying.
- Snapshots do not adversely affect regular HDFS operations: modifications are recorded in reverse chronological order so that the current data can be accessed directly. The snapshot data is computed by subtracting the modifications from the current data.
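The bookkeeping described above can be illustrated with a toy model. The following is a minimal Python sketch, not the NameNode's actual data structures: the class `ToyNamespace` and its methods are hypothetical names, and files are reduced to a path-to-content dictionary. It shows why creating a snapshot copies nothing, why memory grows only with the number of modified paths, and how a snapshot view can be derived by undoing recorded modifications from the current data.

```python
_ABSENT = object()  # marker: the path did not exist when the snapshot was taken


class ToyNamespace:
    """Hypothetical toy model: current state plus reverse diffs per snapshot."""

    def __init__(self):
        self.current = {}    # path -> content; always readable directly
        self.snapshots = []  # (name, diff) pairs in creation order

    def create_snapshot(self, name):
        # Nothing is copied: only an empty diff is registered.
        self.snapshots.append((name, {}))

    def _record_old_value(self, path):
        # Only the newest snapshot stores the old value, and only once per path.
        if self.snapshots:
            diff = self.snapshots[-1][1]
            diff.setdefault(path, self.current.get(path, _ABSENT))

    def write(self, path, content):
        self._record_old_value(path)
        self.current[path] = content

    def delete(self, path):
        self._record_old_value(path)
        self.current.pop(path, None)

    def read_snapshot(self, name):
        # Start from the current data and undo diffs, newest snapshot first.
        view = dict(self.current)
        for snap_name, diff in reversed(self.snapshots):
            for path, old in diff.items():
                if old is _ABSENT:
                    view.pop(path, None)
                else:
                    view[path] = old
            if snap_name == name:
                return view
        raise KeyError(name)


ns = ToyNamespace()
ns.write("/foo/bar", "v1")
ns.create_snapshot("s0")
ns.write("/foo/bar", "v2")     # modified after s0
ns.write("/foo/new", "x")      # created after s0
print(ns.read_snapshot("s0"))  # the view as of s0
print(ns.current)              # current data, untouched by the lookup
```

In this model the diff attached to s0 holds two entries, one per path touched after the snapshot, regardless of how many unmodified paths exist.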
Snapshottable Directories
Snapshots can be taken on any directory once the directory has been set as snapshottable. A snapshottable directory is able to accommodate 65,536 simultaneous snapshots. There is no limit on the number of snapshottable directories. Administrators may set any directory to be snapshottable. If there are snapshots in a snapshottable directory, the directory can be neither deleted nor renamed before all the snapshots are deleted.
Nested snapshottable directories are currently not allowed. In other words, a directory cannot be set to snapshottable if one of its ancestors/descendants is a snapshottable directory.
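The nesting rule can be checked on the client side before asking an administrator to allow snapshots. The sketch below is only an illustration of the rule as stated above; `nesting_conflict` is a hypothetical helper, and the list of existing snapshottable directories is assumed to be known already.

```python
from pathlib import PurePosixPath


def nesting_conflict(candidate, snapshottable_dirs):
    """Return the existing snapshottable directory that is an ancestor or
    descendant of `candidate`, or None if there is no such directory."""
    cand = PurePosixPath(candidate)
    for existing in snapshottable_dirs:
        other = PurePosixPath(existing)
        if other in cand.parents or cand in other.parents:
            return existing
    return None


existing = ["/data/logs"]
print(nesting_conflict("/data/logs/2024", existing))  # descendant of /data/logs
print(nesting_conflict("/data", existing))            # ancestor of /data/logs
print(nesting_conflict("/data/metrics", existing))    # unrelated directory
```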
Snapshot Paths
For a snapshottable directory, the path component ".snapshot" is used for accessing its snapshots. Suppose /foo is a snapshottable directory, /foo/bar is a file/directory in /foo, and /foo has a snapshot s0. Then, the path
/foo/.snapshot/s0/bar
refers to the snapshot copy of /foo/bar. The usual API and CLI can work with the ".snapshot" paths. The following are some examples.
- Listing all the snapshots under a snapshottable directory:
hdfs dfs -ls /foo/.snapshot
- Listing the files in snapshot s0:
hdfs dfs -ls /foo/.snapshot/s0
- Copying a file from snapshot s0:
hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp
Note that this example uses the preserve option to preserve timestamps, ownership, permission, ACLs and XAttrs.
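Because snapshot paths follow a fixed layout, they are easy to build in scripts. The following Python sketch assembles a ".snapshot" path and the argument list for the copy command shown above. The helper names `snapshot_path` and `restore_command` are hypothetical; the sketch only builds the command and does not run it.

```python
import posixpath


def snapshot_path(snapshottable_dir, snapshot_name, relative=""):
    """Build the ".snapshot" path for an entry inside a snapshot."""
    parts = [snapshottable_dir, ".snapshot", snapshot_name]
    if relative:
        parts.append(relative.lstrip("/"))
    return posixpath.join(*parts)


def restore_command(snapshottable_dir, snapshot_name, relative, destination):
    """Return the argv for copying an entry out of a snapshot.

    Mirrors the `hdfs dfs -cp -ptopax` example from the text.
    """
    source = snapshot_path(snapshottable_dir, snapshot_name, relative)
    return ["hdfs", "dfs", "-cp", "-ptopax", source, destination]


print(snapshot_path("/foo", "s0", "bar"))
print(" ".join(restore_command("/foo", "s0", "bar", "/tmp")))
```

The returned list can be handed to `subprocess.run` on a machine where the `hdfs` client is installed.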
Upgrading to a version of HDFS with snapshots
The HDFS snapshot feature introduces a new reserved path name used to interact with snapshots: .snapshot. When upgrading from an older version of HDFS, existing paths named .snapshot need to first be renamed or deleted to avoid conflicting with the reserved path. See the upgrade section in the HDFS user guide for more information.
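Before such an upgrade it can help to scan a list of existing paths for the reserved name. The sketch below assumes the paths have already been collected, for example from a recursive listing saved to a file; `paths_using_reserved_name` is a hypothetical helper, not part of HDFS.

```python
from pathlib import PurePosixPath

RESERVED_NAME = ".snapshot"


def paths_using_reserved_name(paths):
    """Return the paths that contain ".snapshot" as a full path component."""
    return [p for p in paths if RESERVED_NAME in PurePosixPath(p).parts]


listing = [
    "/user/a/.snapshot",
    "/user/a/.snapshot/old",
    "/user/b/.snapshots",   # similar name, but not the reserved one
    "/user/c/data",
]
for path in paths_using_reserved_name(listing):
    print("rename or delete before upgrading:", path)
```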
Snapshot Operations
Administrator Operations
The operations described in this section require superuser privilege.
Allow Snapshots
Allow snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable.
- Command:
hdfs dfsadmin -allowSnapshot <path>
- Arguments:
| path | The path of the snapshottable directory. |
See also the corresponding Java API void allowSnapshot(Path path) in HdfsAdmin.
Disallow Snapshots
Disallow snapshots of a directory from being created. All snapshots of the directory must be deleted before disallowing snapshots.
- Command:
hdfs dfsadmin -disallowSnapshot <path>
- Arguments:
| path | The path of the snapshottable directory. |
See also the corresponding Java API void disallowSnapshot(Path path) in HdfsAdmin.
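The ordering constraint (delete every snapshot first, then disallow) can be captured in a small script. This Python sketch only builds the command lines; `disallow_sequence` is a hypothetical helper, and the snapshot names are assumed to have been obtained beforehand, for instance by listing the ".snapshot" directory.

```python
def disallow_sequence(snapshottable_dir, snapshot_names):
    """Return the commands needed to make a directory non-snapshottable.

    Every existing snapshot is deleted first; disallowing comes last.
    """
    commands = [
        ["hdfs", "dfs", "-deleteSnapshot", snapshottable_dir, name]
        for name in snapshot_names
    ]
    commands.append(["hdfs", "dfsadmin", "-disallowSnapshot", snapshottable_dir])
    return commands


for argv in disallow_sequence("/foo", ["s0", "s1"]):
    print(" ".join(argv))
```

The final command requires superuser privilege, as noted at the start of this section.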
User Operations
This section describes user operations. Note that the HDFS superuser can perform all of these operations without satisfying the permission requirements of the individual operations.
Create Snapshots
Create a snapshot of a snapshottable directory. This operation requires owner privilege of the snapshottable directory.
- Command:
hdfs dfs -createSnapshot <path> [<snapshotName>]
- Arguments:
| path | The path of the snapshottable directory. |
| snapshotName | The snapshot name, which is an optional argument. When it is omitted, a default name is generated using a timestamp with the format "'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033". |
See also the corresponding Java API Path createSnapshot(Path path) and Path createSnapshot(Path path, String snapshotName) in FileSystem. The snapshot path is returned in these methods.
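The default name format can be reproduced in a script, for example to recognize or predict generated names. The Python sketch below mimics the "'s'yyyyMMdd-HHmmss.SSS" pattern; HDFS itself generates the real name when the argument is omitted, so `default_snapshot_name` and `create_snapshot_command` are hypothetical helpers for illustration only.

```python
from datetime import datetime


def default_snapshot_name(moment):
    """Format a datetime like the default pattern 's'yyyyMMdd-HHmmss.SSS."""
    millis = moment.microsecond // 1000
    return "s" + moment.strftime("%Y%m%d-%H%M%S") + f".{millis:03d}"


def create_snapshot_command(path, snapshot_name=None):
    """Return the argv for createSnapshot; the name is optional."""
    argv = ["hdfs", "dfs", "-createSnapshot", path]
    if snapshot_name is not None:
        argv.append(snapshot_name)
    return argv


moment = datetime(2013, 4, 12, 15, 10, 29, 33000)
print(default_snapshot_name(moment))  # s20130412-151029.033
print(" ".join(create_snapshot_command("/foo", "before-cleanup")))
```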
Delete Snapshots
Delete a snapshot from a snapshottable directory. This operation requires owner privilege of the snapshottable directory.
- Command:
hdfs dfs -deleteSnapshot <path> <snapshotName>
- Arguments:
| path | The path of the snapshottable directory. |
| snapshotName | The snapshot name. |
See also the corresponding Java API void deleteSnapshot(Path path, String snapshotName) in FileSystem.
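A common use of deleteSnapshot is pruning old snapshots. The sketch below assumes that all snapshots use the default timestamp-based names, which sort chronologically as plain strings. The retention policy and the helper names are hypothetical; HDFS does not prune snapshots on its own.

```python
def snapshots_to_delete(snapshot_names, keep):
    """Pick the oldest default-named snapshots, keeping the `keep` newest."""
    if keep < 0:
        raise ValueError("keep must not be negative")
    ordered = sorted(snapshot_names)  # default names sort chronologically
    return ordered[: max(len(ordered) - keep, 0)]


def delete_commands(snapshottable_dir, snapshot_names):
    """Return one deleteSnapshot argv per snapshot name."""
    return [
        ["hdfs", "dfs", "-deleteSnapshot", snapshottable_dir, name]
        for name in snapshot_names
    ]


names = [
    "s20130412-151029.033",
    "s20130410-090000.000",
    "s20130411-120000.500",
]
for argv in delete_commands("/foo", snapshots_to_delete(names, keep=2)):
    print(" ".join(argv))
```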
Rename Snapshots
Rename a snapshot. This operation requires owner privilege of the snapshottable directory.
- Command:
hdfs dfs -renameSnapshot <path> <oldName> <newName>
- Arguments:
| path | The path of the snapshottable directory. |
| oldName | The old snapshot name. |
| newName | The new snapshot name. |
See also the corresponding Java API void renameSnapshot(Path path, String oldName, String newName) in FileSystem.
Get Snapshottable Directory Listing
Get all the snapshottable directories where the current user has permission to take snapshots.
- Command:
hdfs lsSnapshottableDir
- Arguments: none
See also the corresponding Java API SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing() in DistributedFileSystem.
Get Snapshots Difference Report
Get the differences between two snapshots. This operation requires read access privilege for all files/directories in both snapshots.
- Command:
hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
- Arguments:
| path | The path of the snapshottable directory. |
| fromSnapshot | The name of the starting snapshot. |
| toSnapshot | The name of the ending snapshot. |
- Results:
| + | The file/directory has been created. |
| - | The file/directory has been deleted. |
| M | The file/directory has been modified. |
| R | The file/directory has been renamed. |
A RENAME entry indicates a file/directory has been renamed but is still under the same snapshottable directory. A file/directory is reported as deleted if it was renamed to outside of the snapshottable directory. A file/directory renamed from outside of the snapshottable directory is reported as newly created.
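The three rename cases can be written down as a small decision function. This Python sketch encodes only the rule stated above; `rename_report_type` is a hypothetical helper and does not call HDFS.

```python
from pathlib import PurePosixPath


def is_under(path, directory):
    """True if `path` lies strictly inside `directory`."""
    return PurePosixPath(directory) in PurePosixPath(path).parents


def rename_report_type(source, target, snapshottable_dir):
    """Return how a rename shows up in the diff report: "R", "-", "+" or None."""
    source_inside = is_under(source, snapshottable_dir)
    target_inside = is_under(target, snapshottable_dir)
    if source_inside and target_inside:
        return "R"   # renamed within the snapshottable directory
    if source_inside:
        return "-"   # moved out: reported as deleted
    if target_inside:
        return "+"   # moved in: reported as newly created
    return None      # rename does not touch the snapshottable directory


print(rename_report_type("/foo/a", "/foo/b", "/foo"))
print(rename_report_type("/foo/a", "/tmp/a", "/foo"))
print(rename_report_type("/tmp/a", "/foo/a", "/foo"))
```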
The snapshot difference report does not guarantee the same operation sequence. For example, if we rename the directory "/foo" to "/foo2", and then append new data to the file "/foo2/bar", the difference report will be:
R. /foo -> /foo2
M. /foo/bar
I.e., the changes on the files/directories under a renamed directory are reported using the original path before the rename ("/foo/bar" in the above example).
See also the corresponding Java API SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot) in DistributedFileSystem.
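Since modifications under a renamed directory are reported with the pre-rename path, a consumer of the report may want to map such entries to their current location. The Python sketch below parses entries laid out like the two-line example above and applies the reported renames. The line layout is an assumption taken from that example, and real command output may differ (header lines, relative paths, paths containing spaces); `parse_entries` and `path_after_renames` are hypothetical helpers.

```python
import re

# Entry type, optional dot, path, optional " -> target" (assumed layout).
ENTRY = re.compile(r"^([+\-MR])\.?\s+(\S+)(?:\s+->\s+(\S+))?$")


def parse_entries(lines):
    """Parse report lines into (type, path, target) tuples; skip other lines."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            entries.append(match.groups())
    return entries


def path_after_renames(path, entries):
    """Map a pre-rename path to its location after the reported renames."""
    for kind, source, target in entries:
        if kind == "R" and (path == source or path.startswith(source + "/")):
            return target + path[len(source):]
    return path


report = ["R. /foo -> /foo2", "M. /foo/bar"]
entries = parse_entries(report)
for kind, path, _ in entries:
    if kind == "M":
        print(path, "is now at", path_after_renames(path, entries))
```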