Formatting HDFS

Working on Hadoop, especially on test clusters, I have managed to break my HDFS layer more than once, sometimes beyond repair, or at least beyond any repair I wanted to invest time in. Whatever the reason, sometimes you just want to scrap your HDFS and start anew.

Without going into too much detail, which is beyond the scope of this blog post, HDFS is mainly composed of two types of elements:

Namenode: at a high level, the namenode stores the HDFS namespace; think of it as your file system tree.
Datanode: this is where your data is actually stored.

The Namenode: /hadoop/hdfs/namenode/current

All new edits are written to the edit log and regularly merged into an fsimage file, for more concise management. An fsimage file represents the file system state after all modifications up to a specific transaction ID. The seen_txid file holds the last seen transaction ID. The VERSION file contains the cluster and HDFS IDs.

For a more detailed explanation: HDFS metadata
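
As a rough illustration of the files described above, the following shell sketch reads seen_txid and picks the newest fsimage in a namenode metadata directory. The function name summarise_nn_dir is made up for this example, and it assumes fsimage files are named fsimage_<transaction ID> with zero-padded IDs, so treat it as a sketch rather than an official tool.

```shell
#!/usr/bin/env bash
# Summarise a namenode metadata directory: last seen transaction and newest fsimage.
# summarise_nn_dir is a hypothetical helper, not part of Hadoop.
set -euo pipefail

summarise_nn_dir() {
  local dir="$1"
  local txid="unknown"
  # seen_txid holds a single number: the last transaction ID seen.
  if [ -f "$dir/seen_txid" ]; then
    txid="$(tr -d '[:space:]' < "$dir/seen_txid")"
  fi
  echo "seen_txid: $txid"
  # Assumption: the zero-padded transaction ID suffix sorts lexically,
  # so the last entry after sort is the most recent image.
  local newest
  newest="$(find "$dir" -maxdepth 1 -type f -name 'fsimage_*' ! -name '*.md5' | sort | tail -n 1)"
  echo "newest fsimage: ${newest##*/}"
}

# Usage: ./summarise.sh /hadoop/hdfs/namenode/current
if [ "$#" -ge 1 ]; then
  summarise_nn_dir "$1"
fi
```

Run it as a user that can read the metadata directory (typically hdfs). It only reads files and changes nothing.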

The Datanode: /hadoop/hdfs/data/current

In our example we will only focus on the VERSION file, which is very close to the namenode VERSION.

HDFS non-HA formatting

In non-HA mode everything is simple enough.

Stop the HDFS service
Run hadoop namenode -format as user hdfs (on Hadoop 2 and later, hdfs namenode -format is the non-deprecated form)
Clear the data directory on all datanodes
Restart HDFS
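
The steps above can be sketched as a small script. This is only an illustration under a few assumptions: the helper names run and format_non_ha, and the DRY_RUN and DATA_DIR variables, are invented here. The data directory is the path used in this post, and stopping and restarting HDFS is left to whatever manages your cluster. By default the script only prints what it would do.

```shell
#!/usr/bin/env bash
# Sketch of the non-HA format sequence. With DRY_RUN=1 (the default) the
# commands are only printed; set DRY_RUN=0 to really run them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
DATA_DIR="${DATA_DIR:-/hadoop/hdfs/data}"   # datanode directory used in this post

# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

format_non_ha() {
  # Stop HDFS first (not shown: depends on your cluster manager).
  # On the namenode host: wipe the namespace metadata.
  run sudo -u hdfs hdfs namenode -format
  # On every datanode host: empty the data directory but keep the directory itself.
  run find "$DATA_DIR" -mindepth 1 -delete
  # Restart HDFS afterwards (again left to your cluster manager).
}

format_non_ha
```

Both commands are destructive, so it is worth reading the dry-run output before setting DRY_RUN=0. In practice the two commands run on different hosts; they are shown together only to mirror the list.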

At this point your HDFS layer is empty, and if you check the VERSION files of the namenode and the datanodes, their cluster IDs should match.
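
One way to check this is to compare the cluster ID recorded in the two VERSION files. The sketch below assumes VERSION is a plain key=value file with a clusterID line, and that you have copied the datanode VERSION file next to the namenode one (they live on different hosts). The function names cluster_id and same_cluster are made up for this example.

```shell
#!/usr/bin/env bash
# Compare the clusterID of a namenode VERSION file and a datanode VERSION file.
set -euo pipefail

# Print the clusterID value from a VERSION file.
# Assumption: the file contains a line of the form clusterID=<value>.
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Succeeds only when both files carry the same, non-empty clusterID.
same_cluster() {
  local a b
  a="$(cluster_id "$1")"
  b="$(cluster_id "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage: ./check.sh <namenode VERSION> <datanode VERSION>
if [ "$#" -eq 2 ]; then
  if same_cluster "$1" "$2"; then
    echo "clusterID matches: $(cluster_id "$1")"
  else
    echo "clusterID mismatch" >&2
    exit 1
  fi
fi
```

A mismatch usually means a datanode still holds data from before the format, in which case its data directory was probably not cleared.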

HDFS HA formatting

In HA, things get a little more complicated. The Standby and Active namenodes share edit log storage managed by the journal node service. HA relies on a failover mechanism to swap between the Standby and Active namenodes and, like many other systems in Hadoop, this uses ZooKeeper. As you can see, a couple more pieces need to be made aware of a formatting action.

The initial steps are very close to the non-HA case:

Stop the HDFS service
Start only the journal nodes (they need to be made aware of the formatting)
On the first namenode (as user hdfs):
1. hadoop namenode -format
2. hdfs namenode -initializeSharedEdits -force (for the journal nodes)
3. hdfs zkfc -formatZK -force (to force ZooKeeper to reinitialise)
4. restart that first namenode
On the second namenode:
1. hdfs namenode -bootstrapStandby -force (force a sync with the first namenode)
On every datanode, clear the data directory
Restart the HDFS service
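
The namenode part of this sequence can be sketched as a script with one entry point per host. The names as_hdfs, first_namenode and second_namenode, and the DRY_RUN and HDFS_USER variables, are invented for this illustration. Starting the journal nodes, restarting the first namenode and clearing the datanodes are not covered, since how you do that depends on your cluster manager. By default nothing is executed; the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of the HA format commands, split by namenode host.
# With DRY_RUN=1 (the default) the commands are only printed.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
HDFS_USER="${HDFS_USER:-hdfs}"

# Hypothetical helper: run a command as the hdfs user, or print it in dry-run mode.
as_hdfs() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ sudo -u $HDFS_USER $*"
  else
    sudo -u "$HDFS_USER" "$@"
  fi
}

# Run on the first namenode, with HDFS stopped and only the journal nodes up.
first_namenode() {
  as_hdfs hdfs namenode -format                        # new, empty namespace
  as_hdfs hdfs namenode -initializeSharedEdits -force  # reset the journal nodes
  as_hdfs hdfs zkfc -formatZK -force                   # reset failover state in ZooKeeper
  # Restart this namenode before moving on to the second one.
}

# Run on the second namenode once the first one is back up.
second_namenode() {
  as_hdfs hdfs namenode -bootstrapStandby -force       # copy metadata from the first namenode
}

case "${1:-}" in
  first)  first_namenode ;;
  second) second_namenode ;;
  *)      echo "usage: $0 first|second" >&2 ;;
esac
```

Called with first on one namenode host and second on the other, it prints the commands in the same order as the list above. The order matters: the standby can only bootstrap from a namenode that has already been formatted and restarted.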

This was a very simple step-by-step guide to formatting. In a later article we will cover actually repairing common errors in HDFS.
