Forcing a DataNode to report its blocks to the NameNode

By 一见 · 2024-10-22 15:49:36 · original post

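Normally, when a DataNode sends its block report is controlled through heartbeats: the NameNode triggers it in the heartbeat response it returns to the DataNode.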

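During a data center migration, the old data center ran Hadoop 2.7.2 and the new one ran 2.8.0. We migrated by first scaling out into the new data center and then scaling in the old one. Because the two data centers used different machine models with different numbers of disks, hdfs-site.xml could not easily be kept identical on both, and at some point during the operation the two versions of hdfs-site.xml got mixed up. As a result, the NameNode reported a large number of "missing blocks".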

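However, going by the information the NameNode reported, the blocks marked as "missing" could be found on the DataNodes. Yet after the configuration problem was fixed, the "missing blocks" did not go away. Reading the DataNode source code suggested the likely cause: the DataNodes had not reported their blocks to the NameNode again.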

Digging further into the DataNode source code led to triggerBlockReport, a tool that ships with HDFS and forces a given DataNode to send a block report to the NameNode. Usage:

hdfs dfsadmin -triggerBlockReport datanode_host:ipc_port

For example: hdfs dfsadmin -triggerBlockReport 192.168.31.35:50020
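When many DataNodes are affected, as in a migration like this one, it can be convenient to generate the command per node instead of typing it by hand. The Java sketch below only builds the argument list for `hdfs dfsadmin -triggerBlockReport`. The class and method names are hypothetical; it assumes the Hadoop 2.x default DataNode IPC port 50020 when a host is given without a port, and it only handles hostnames and IPv4 addresses (not IPv6). The optional `-incremental` flag is part of the real dfsadmin usage (`-triggerBlockReport [-incremental] <datanode_host:ipc_port>`); without it a full block report is requested.

```java
import java.util.ArrayList;
import java.util.List;

public class TriggerBlockReportArgs {
    // Assumption: Hadoop 2.x default DataNode IPC port (dfs.datanode.ipc.address = 0.0.0.0:50020).
    static final int DEFAULT_IPC_PORT = 50020;

    /** Builds argv for "hdfs dfsadmin -triggerBlockReport [-incremental] host:port" (hypothetical helper). */
    static List<String> buildCommand(String datanode, boolean incremental) {
        // Append the default IPC port only when none was given (hostnames/IPv4 only).
        String target = datanode.contains(":") ? datanode : datanode + ":" + DEFAULT_IPC_PORT;
        List<String> argv = new ArrayList<>(List.of("hdfs", "dfsadmin", "-triggerBlockReport"));
        if (incremental) {
            argv.add("-incremental"); // request an incremental instead of a full report
        }
        argv.add(target);
        return argv;
    }

    public static void main(String[] args) {
        // 192.168.31.36 is a hypothetical second DataNode.
        for (String dn : List.of("192.168.31.35", "192.168.31.36:50020")) {
            System.out.println(String.join(" ", buildCommand(dn, false)));
        }
    }
}
```

To actually run one of these commands, the list can be passed to `new ProcessBuilder(argv).inheritIO().start()` on a host where the `hdfs` client is configured.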

Normally, when the NameNode starts, it asks each DataNode to report its blocks once (this is controlled through the fullBlockReportLeaseId value). The relevant source code:

DataNode side (BPServiceActor.java):

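```java
// Simplified excerpt: in the real method this runs inside a while (shouldRun()) loop.
private void offerService() throws Exception {
    // ...
    HeartbeatResponse resp = sendHeartBeat(requestBlockReportLease); // send a heartbeat to the NameNode
    long fullBlockReportLeaseId = resp.getFullBlockReportLeaseId();  // lease ID carried in the heartbeat response
    // ...
    List<DatanodeCommand> cmds = null;
    // A report forced by triggerBlockReport takes effect only once: getAndSet(false) clears the flag.
    boolean forceFullBr = scheduler.forceFullBlockReport.getAndSet(false);
    if (forceFullBr) {
        LOG.info("Forcing a full block report to " + nnAddr);
    }
    if ((fullBlockReportLeaseId != 0) || forceFullBr) {
        cmds = blockReport(fullBlockReportLeaseId);
        fullBlockReportLeaseId = 0;
    }
    // ...
}
```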
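The DataNode-side condition above boils down to a small piece of logic. The sketch below is a toy model, not Hadoop code: `FullReportDecision`, `trigger()` and `shouldSendFullReport()` are hypothetical names, and `trigger()` only mimics what a full (non-incremental) triggerBlockReport request does to `scheduler.forceFullBlockReport`. It shows why each trigger forces exactly one report: `getAndSet(false)` reads the flag and clears it in one atomic step.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model (not Hadoop code) of the full-block-report decision in offerService(). */
public class FullReportDecision {
    // Stands in for scheduler.forceFullBlockReport in BPServiceActor.
    private final AtomicBoolean forceFullBlockReport = new AtomicBoolean(false);

    /** Models a triggerBlockReport request arriving at the DataNode: it just sets the flag. */
    void trigger() {
        forceFullBlockReport.set(true);
    }

    /** Returns true if this loop iteration would call blockReport(). */
    boolean shouldSendFullReport(long fullBlockReportLeaseId) {
        // Read and clear atomically, so one trigger forces at most one report.
        boolean forceFullBr = forceFullBlockReport.getAndSet(false);
        return fullBlockReportLeaseId != 0 || forceFullBr;
    }

    public static void main(String[] args) {
        FullReportDecision d = new FullReportDecision();
        System.out.println(d.shouldSendFullReport(0));  // false: no lease, not forced
        d.trigger();
        System.out.println(d.shouldSendFullReport(0));  // true: forced once
        System.out.println(d.shouldSendFullReport(0));  // false: flag already consumed
        System.out.println(d.shouldSendFullReport(42)); // true: the NameNode granted a lease
    }
}
```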

NameNode side (FSNamesystem.java):

```java
/**
 * The given node has reported in. This method should:
 * 1) Record the heartbeat, so the datanode isn't timed out
 * 2) Adjust usage stats for future block allocation
 *
 * If a substantial amount of time passed since the last datanode
 * heartbeat then request an immediate block report.
 *
 * @return an array of datanode commands
 * @throws IOException
 */
HeartbeatResponse handleHeartbeat(DatanodeRegistration nodeReg,
    StorageReport[] reports, long cacheCapacity, long cacheUsed,
    int xceiverCount, int xmitsInProgress, int failedVolumes,
    VolumeFailureSummary volumeFailureSummary,
    boolean requestFullBlockReportLease) throws IOException {
  readLock();
  try {
    //get datanode commands
    final int maxTransfer = blockManager.getMaxReplicationStreams() - xmitsInProgress;
    DatanodeCommand[] cmds = blockManager.getDatanodeManager().handleHeartbeat(
        nodeReg, reports, blockPoolId, cacheCapacity, cacheUsed,
        xceiverCount, maxTransfer, failedVolumes, volumeFailureSummary);
    long fullBlockReportLeaseId = 0;
    if (requestFullBlockReportLease) {
      fullBlockReportLeaseId = blockManager.requestBlockReportLeaseId(nodeReg);
    }
    //create ha status
    final NNHAStatusHeartbeat haState = new NNHAStatusHeartbeat(
        haContext.getState().getServiceState(),
        getFSImage().getCorrectLastAppliedOrWrittenTxId());
    return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo, fullBlockReportLeaseId);
  } finally {
    readUnlock("handleHeartbeat");
  }
}
```
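On the NameNode side, `fullBlockReportLeaseId` stays 0 unless the DataNode set `requestFullBlockReportLease` and `blockManager.requestBlockReportLeaseId(nodeReg)` hands out a lease. The sketch below is a simplified toy model, not Hadoop's actual lease manager: it caps the number of outstanding leases (the real limit is configurable, to my knowledge via `dfs.namenode.max.full.block.report.leases`), replaces a node's existing lease when it asks again, and ignores lease expiry entirely. `LeaseGrantModel` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Hadoop code) of how a heartbeat response gets its lease ID. */
public class LeaseGrantModel {
    private final int maxOutstandingLeases;              // assumed cap on concurrent leases
    private final Map<String, Long> outstanding = new HashMap<>();
    private long nextId = 1;                             // 0 means "no lease", so IDs start at 1

    LeaseGrantModel(int maxOutstandingLeases) {
        this.maxOutstandingLeases = maxOutstandingLeases;
    }

    /** Returns the lease ID to put into the heartbeat response (0 = no lease). */
    long handleHeartbeat(String datanode, boolean requestFullBlockReportLease) {
        if (!requestFullBlockReportLease) {
            return 0;                                    // same default as in handleHeartbeat()
        }
        outstanding.remove(datanode);                    // drop an older lease before issuing a new one
        if (outstanding.size() >= maxOutstandingLeases) {
            return 0;                                    // too many reports in flight: ask again later
        }
        long id = nextId++;
        outstanding.put(datanode, id);
        return id;
    }

    /** Called once the full block report carrying this node's lease has been processed. */
    void reportProcessed(String datanode) {
        outstanding.remove(datanode);
    }

    public static void main(String[] args) {
        LeaseGrantModel nn = new LeaseGrantModel(1);
        System.out.println(nn.handleHeartbeat("dn1", true));  // 1: lease granted
        System.out.println(nn.handleHeartbeat("dn2", true));  // 0: cap reached
        System.out.println(nn.handleHeartbeat("dn1", false)); // 0: no lease requested
        nn.reportProcessed("dn1");
        System.out.println(nn.handleHeartbeat("dn2", true));  // 2: slot freed
    }
}
```

In this model a DataNode that never asks for a lease, or keeps being refused, never sends a full report on its own, which is the gap a forced triggerBlockReport fills.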
