为什么想用英文写了?我获取知识、技术的大部分途径都是通过英文,所以按照自己的理解用英文写下来也比较容易,另外,很多term都是不能翻译的,如果要持续学习技术和知识,那就不但要习惯去阅读,听,还要写,说。可惜从IBM出来后,很少有机会和人去说了,只能写了。就当提高自己英文水平吧

I am going to explain how Secondary Name Node in this article. It is not called SNN any more but Checkpoint node which is more precise.

FS Image & Edit log

Edit log is an append-only file which stores the metadata changes for durability. When client issues a command like  'create file','chmod' and 'rename' , NN will write this operation ( the command itself and the data) to Edit log as an edit entry / log which is uniquely identified by an ID , we called transaction ID as each operation is considered a transaction. Transacttion ID is an auto-increment variable.  NN will also apply this to the in-memory medatadata structure. The form of the entry looks like below. During run time, NN does not write directly to fsimage as writing to fsimage is a time-consuming operation.

tranID: 7

put

'/xx/xx/log.txt',rwx------, ricky, ricky, 1024, $timestamp.....


tranID: 8

chmod 777

'/xx/xx/log.txt'

FS Image is the metadata file of HDFS namespace in NN . It is always outdated because NN does not write it frequently. Actually, when NN starts up, it will read FS image and merge Edit log to generate a new FS Image, the old one and Edit logs will be deleted. Imaging if there are many edit entries accumulated along the time(months), it will take much time to starup NN next time as there are too many entries to merge with the fsimage.  Also considering this situaion, NN is down due to power outage, it tried to restore / catch up the newest state during startup by merging fsimage and too many edit logs and user is expecting it is up ASAP~~~.

Why SNN / Checkpoint and how it works

I already pointed one of the reason above. Another reason is merging fs image with edit logs is expensive operation(I/O, CPU intensive) which may restrict the client accessing to the NN. So, we will use a seperate machine as Checkpoint node to hand over the load to. So, NN remains active all the time.

How it works?

The fsimage file name contains the transaction id. For example, fsimage_7, means this image file has all the transaction up to transaction id 7. Similarly,  The current in-use edit log file is edits_in_progress_$tranID(edits_in_progress_21) which means transaction whose ID aboves 21 will be stored in this file. Old edit log file will be renamed to edits_8-20 and no write to it any more. So, what happens when checkpointing?

  1. CPN gets the known most recent transaction id from current in-use edit log which is 21.
  2. CPN gets the fs image transaction id which is 7.
  3. CPN  will download the edit logs whose name is edit_8-20.
  4. CPN will check if it has the fs image file whose transaction is 7(it usually has).
  5. CPN will do the checkpointing by merging the fsimage_7 and edits_8-20,  generating new fsimage_20.
  6. CPN will send the fsimage_20 to NN . so, both of them has the newly generated fsimage_20. Next time, CNP will not download it from NN.

When the checkpoint get triggered? according to official doc, either of the 2 meets will trigger checkpointing.

  • dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
  • dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.

Please note that, CPN does not keep in-memory metadata and does not have block locations which are from block reports sent by DNs.

Backup Node

See here. It looks to me that Backup Node is better than CPN.

HDFS essay 2 - Clarify Name Node / Checkpoint Node/ Backup Node的更多相关文章

  1. Node.js入门:Node.js&NPM的安装与配置

    Node.js安装与配置      Node.js已经诞生两年有余,由于一直处于快速开发中,过去的一些安装配置介绍多数针对0.4.x版本而言的,并非适合最新的0.6.x的版本情况了,对此,我们将在0. ...

  2. 【Node.js】利用node.js搭建服务器并访问静态网页

    node.js是一门服务端的语言,下面讲讲如何利用node.js提供给我们的api来搭建服务器,并且访问静态网页 项目结构如下 ------------------------------------ ...

  3. 【总文档】rac增加新节点的方法步骤 How to Add Node/Instance or Remove Node/Instance in 10gR2, 11gR1, 11gR2 and 12c Oracle Clusterware and RAC

    [总文档]How to Add Node/Instance or Remove Node/Instance in 10gR2, 11gR1, 11gR2 and 12c Oracle Clusterw ...

  4. 安装nvm之后node不可用,“node”不是内部或外部命令,也不是可运行的程序或批处理文件(ng)

    安装nvm: 1.下载nvm压缩包地址:https://github.com/coreybutler/nvm-windows/releases 2.下载后解压在目标文件夹中,我这里是H:\applic ...

  5. node.js&pm2搭建node生产环境

    node.js下载地址https://nodejs.org/en/download/stable/ 下载截图 建议采用稳定编译过的版本,source code稍麻烦,编译过的直接可用,安装超级简单,红 ...

  6. Node.js-sublime text3 配置node.js(ERROR: The process "node.exe" not found.)

    默认已经安装好sublime.node和npm 1.sublime的node.js插件下载 由于在package control上经常下载失败,所以这里直接从GitHub上进行下载! GitHub下载 ...

  7. AutoIt with XML: Add a child/grandchild node or remove any node

    Sometimes, we have to use AutoIt script to edit an xml, add a node or remove a node, to make some de ...

  8. 树的遍历 迭代算法——思路:初始化stack,pop stack利用pop的node,push new node to stack,可以考虑迭代一颗树 因为后序遍历最后还要要访问根结点一次,所以要访问根结点两次是难点

    144. Binary Tree Preorder Traversal Given a binary tree, return the preorder traversal of its nodes' ...

  9. 初学node node开发环境搭建 node模块化 commonJS原理

    由于Node.js平台是在后端运行JavaScript代码,所以,必须首先在本机安装Node环境. 学习node,首先要装node,和它的包管理工具,这两个都是傻瓜式安装,百度一下就安装了. 安装完之 ...

随机推荐

  1. easy-im:一款基于netty的即时通讯系统

    介绍 easy-im是面向开发者的一款轻量级.开箱即用的即时通讯系统,帮助开发者快速搭建消息推送等功能. 基于easy-im,你可以快速实现以下功能: + 聊天软件 + IoT消息推送 基本用法 项目 ...

  2. 修改本机默认的jdk版本

    因为开发需要使用多个jdk,在修改jdk版本时遇到了一些问题 在系统变量的%JAVA_HOME%中修改了jdk的路径,但是重启后java -version版本并没有改变. 在网上找到一篇文章,修改了注 ...

  3. 10.31课程.this指向

    作用域: 浏览器给js的生存环境(栈). 作用域链: js中的关键字例如var.function...都可以提前声明,然后js由上到下逐级执行,有就使用,没有就在它的父级元素中查找.这就叫做作用域链. ...

  4. wso2 ei 6.4.0安装笔记

    目的:将最新版(6.4.0)部署在linux服务器,与Api Manager部署在同一环境 环境: Centos 7.3 Jdk 8 Mysql 5.7 问题一: 将H2替换为Mysql5.7数据库时 ...

  5. Python入门 —— 04字符串解析

    字符串 -字符串是 Python 中最常用的数据类型.(可以说是大多数语言都常用) 1. 创建字符串 ( '' 或 "" 和 '''''')(单,双和三引号)(字符串可以为空) - ...

  6. ASP.NET MVC4.0 后台获取不大前台传来的file

    <td>选择图片</td> <td> <input type="file" id="uploadImg" name=& ...

  7. laravel5.5源码阅读草稿——入口

    laravel的启动需要通过路由.中间件.控制器.模型.视图最后出现在浏览器.而路由.中间件.模型,这些功能都有自己的类,比如Route::any().DB::table().$this->mi ...

  8. python3 练习题100例 (九)

    题目九:题目:古典问题:有一对兔子,从出生后第3个月起每个月都生一对兔子,小兔子长到第三个月后每个月又生一对兔子,假如兔子都不死,问每个月的兔子总数为多少?程序分析:兔子的规律为数列1,1,2,3,5 ...

  9. 爬虫-爬虫介绍及Scrapy简介

    在编写案例之前首先理解几个问题,1:什么是爬虫2:为什么说python是门友好的爬虫语言?3:选用哪种框架编写爬虫程序 一:什么是爬虫? 爬虫 webSpider 也称之为网络蜘蛛,是使用一段编写好的 ...

  10. Codevs1332_上白泽慧音_KEY

    题目传送门 裸的模板题,Tarjan求联通量.  code: #include <cstdio> #include <vector> #define min(a,b) a< ...