原文链接:http://blog.madhukaraphatak.com/secondary-namenode---what-it-really-do/

Secondary Namenode is one of the poorly named component in Hadoop. By its name, it gives a sense that its a backup for the Namenode.But in reality its not. Lot of beginners in Hadoop get confused about what exactly SecondaryNamenode does and why its present in HDFS.So in this blog post I try to explain the role of secondary namenode in HDFS.

By its name, you may assume that it has something to do with Namenode and you are right. So before we dig into Secondary Namenode lets see what exactly Namenode does.

Namenode

Namenode holds the meta data for the HDFS like Namespace information, block information etc. When in use, all this information is stored in main memory. But these information also stored in disk for persistence storage.

The above image shows how Name Node stores information in disk.
Two different files are

  1. fsimage - Its the snapshot of the filesystem when namenode started
  2. Edit logs - Its the sequence of changes made to the filesystem after namenode started

Only in the restart of namenode , edit logs are applied to fsimage to get the latest snapshot of the file system. But namenode restart are rare in production clusters which means edit logs can grow very large for the clusters where namenode runs for a long period of time. The following issues we will encounter in this situation.

  1. Editlog become very large , which will be challenging to manage it
  2. Namenode restart takes long time because lot of changes has to be merged
  3. In the case of crash, we will lost huge amount of metadata since fsimage is very old

So to overcome this issues we need a mechanism which will help us reduce the edit log size which is manageable and have up to date fsimage ,so that load on namenode reduces . It’s very similar to Windows Restore point, which will allow us to take snapshot of the OS so that if something goes wrong , we can fallback to the last restore point.

So now we understood NameNode functionality and challenges to keep the meta data up to date.So what is this all have to with Seconadary Namenode?

Secondary Namenode

Secondary Namenode helps to overcome the above issues by taking over responsibility of merging editlogs with fsimage from the namenode.

The above figure shows the working of Secondary Namenode

  1. It gets the edit logs from the namenode in regular intervals and applies to fsimage
  2. Once it has new fsimage, it copies back to namenode
  3. Namenode will use this fsimage for the next restart,which will reduce the startup time

Secondary Namenode whole purpose is to have a checkpoint in HDFS. Its just a helper node for namenode.That’s why it also known as checkpoint node inside the community.

So we now understood all Secondary Namenode does puts a checkpoint in filesystem which will help Namenode to function better. Its not the replacement or backup for the Namenode. So from now on make a habit of calling it as a checkpoint node.

Secondary Namenode - What it really do?的更多相关文章

  1. Secondary NameNode:的作用?

    前言 最近刚接触Hadoop, 一直没有弄明白NameNode和Secondary NameNode的区别和关系.很多人都认为,Secondary NameNode是NameNode的备份,是为了防止 ...

  2. Hadoop之Secondary NameNode

    NameNode存储文件系统的变化作为log追加在本地的一个文件里:这个文件是edits.当一个NameNode启动时,它从一个映像文件:FsImage,读取HDFS的状态,使用来自edits日志文件 ...

  3. 解读Secondary NameNode的功能

    1.概述 最近有朋友问我Secondary NameNode的作用,是不是NameNode的备份?是不是为了防止NameNode的单点问题?确实,刚接触Hadoop,从字面上看,很容易会把Second ...

  4. Secondary NameNode 的作用

    https://blog.csdn.net/xh16319/article/details/31375197 很多人都认为,Secondary NameNode是NameNode的备份,是为了防止Na ...

  5. (转)Secondary NameNode的作用

    在Hadoop中,有一些命名不好的模块,Secondary NameNode是其中之一.从它的名字上看,它给人的感觉就像是NameNode的备份.但它实际上却不是.很多Hadoop的初学者都很疑惑,S ...

  6. 010 secondary namenode(同步元数据和日志)

    1.格式化 首先格式化之后只剩下一个根目录. 格式化后会出现元数据 集群启动之后,元数据放在内存中的(消耗内存中) 格式化后会产生镜像文件fsimage,元数据存储 启动的时候namenode会读取镜 ...

  7. Secondary NameNode究竟是做什么的

    Secondary NameNode:它究竟有什么作用? 在hadoop中,有一些命名不好的模块,Secondary NameNode是其中之一.从它的名字上看,它给人的感觉就像是NameNode的备 ...

  8. Hadoop- NameNode和Secondary NameNode元数据管理机制

    元数据的存储机制 A.内存中有一份完整的元数据(内存meta data) B.磁盘有一个“准完整”的元数据镜像(fsimage)文件(在namenode的工作目录中) C.用于衔接内存metadata ...

  9. NameNode && Secondary NameNode工作机制

    NameNode && Secondary NameNode工作机制 1)工作流程 2)  fsimage和edits NameNode是HDFS的大脑,它维护着整个文件系统的目录树, ...

随机推荐

  1. Remoting接口测试工具

    动手写一个Remoting接口测试工具 基于.NET开发分布式系统,经常用到Remoting技术.在测试驱动开发流行的今天,如果针对分布式系统中的每个Remoting接口的每个方法都要写详细的测试脚本 ...

  2. SQL计算年代差

    1.用datediff函数 select datediff(yyyy,StuBirthday,getdate())>17 2.用year函数 select (year(getdate()-yea ...

  3. 关于Dictionary字典和List列表

    命名空间System.Collections.Generic中有两个非常重要,而且常用的泛型集合类,它们分别是Dictionary<TKey,TValue>字典和List<T> ...

  4. python int异常 python isdigit

    python int是python把任何类型转换成int类型的方法,但是你如果运用不好的话,会引发异常,但是python的str字符串转换方法运用起来倒是比较安全,它把任何对象转换成字符串类型都不会报 ...

  5. 企业架构研究总结(34)——TOGAF架构内容框架之架构制品(下)

    4.2.31 数据生命周期图(Data Lifecycle Diagram) 数据生命周期图是在业务流程的约束之下对业务数据在其整个生命周期(从概念阶段到最终退出)中对其进行管理的核心部分.数据从本质 ...

  6. golang环境搭建

    golang环境搭建 好久没写博客了,最近加班好厉害,加到自己都觉得不太适合这个行业了,每天头都是沉甸甸的,可惜今年注定不是收获的季节. 最近忙里偷闲在学习nodejs,赶巧看到golang的文章,一 ...

  7. IceMx.Mvc 我的js MVC 框架四、试水植物大战僵尸(雏形版)

    有图有真相 开始 最近老婆在家迷上了植物大战僵尸,每天回去躺床上就玩,有一天居然跟我说冰箱后边爬着好几只僵尸,当时我就惊呆了,后来才知道她是在说蟑螂,我去. 闲言少叙,书归正传,这是一个雏形,没有在界 ...

  8. MySQL能够承受上亿万条的数据量的架构

    MySQL能够承受上亿万条的数据量的架构 最近做的搜索引擎的数据量是越来越大估计了下在中国可能涉及到的1Kw的数据量,就全球来说也就是1K亿而已,最初是用的数据库是MySQL现在来说要做些优化,最终使 ...

  9. 在路由器上搭建SVN服务器

    在路由器上搭建SVN服务器 SVN托管服务大家都不陌生了,我最早开始用的是谷歌提供的SVN,因为在上面托管的项目都是开源的,所以当有些项目不方便在网上公开的时候,就需要自己搭建SVN服务器了.wind ...

  10. JIT动态编译器的原理与实现之Interpreter3

    JIT动态编译器的原理与实现之Interpreter(解释器)的实现(三) 接下来,就是要实现一个虚拟机了.记得编码高质量的代码中有一条:不要过早地优化你的代码.所以,也本着循序渐进的原则,我将从实现 ...