[Original] Big Data Fundamentals: HDFS (1) How Datanodes Are Chosen for a Newly Created File
A file in HDFS is made up of blocks: every file contains one or more blocks. When a file is created, a block is allocated, and datanodes are requested for it according to the configured replication factor (3 by default), so three datanodes are asked to store that block.
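For context, here is a minimal sketch of creating such a file through the public FileSystem API; the fs.defaultFS address, buffer size, block size, and file contents below are placeholder values, not taken from the cluster discussed in this article:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateReplicatedFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://name_node:8020"); // placeholder namenode address

        try (FileSystem fs = FileSystem.get(conf)) {
            // create(path, overwrite, bufferSize, replication, blockSize):
            // the namenode will pick 3 datanodes for each block of this file.
            try (FSDataOutputStream out = fs.create(
                    new Path("/tmp/test.sql"), true, 4096, (short) 3, 128 * 1024 * 1024L)) {
                out.writeBytes("select 1;\n");
            }
        }
    }
}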
The blocks, datanodes, and racks of a specific file can be inspected with the hdfs fsck command, for example:
hdfs fsck /tmp/test.sql -files -blocks -locations -racks
Connecting to namenode via http://name_node:50070
FSCK started by hadoop (auth:SIMPLE) from /client for path /tmp/test.sql at Thu Dec 13 15:44:12 CST 2018
/tmp/test.sql 16 bytes, 1 block(s): OK
0. BP-436366437-name_node-1493982655699:blk_1449692331_378721485 len=16 repl=3 [/DEFAULT/server111:50010, /DEFAULT/server121:50010, /DEFAULT/server43:50010]
Status: HEALTHY
Total size: 16 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 16 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 193
Number of racks: 1
FSCK ended at Thu Dec 13 15:44:12 CST 2018 in 1 milliseconds

The filesystem under path '/tmp/test.sql' is HEALTHY
How are those 3 datanodes chosen? They are picked in priority order (a simplified sketch follows the list):
1 the local rack (relative to the HDFS client)
2 a remote rack (relative to the HDFS client)
3 another rack
4 fully random (any node)
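As a rough, self-contained illustration of that fallback order; this is NOT the real placement policy (BlockPlacementPolicyDefault, excerpted further down), and the Node class and host/rack names are invented for the example:

import java.util.ArrayList;
import java.util.List;

public class PlacementOrderSketch {

    static class Node {
        final String host;
        final String rack;
        Node(String host, String rack) { this.host = host; this.rack = rack; }
        public String toString() { return host + "(" + rack + ")"; }
    }

    static List<Node> chooseTargets(String clientRack, List<Node> cluster, int replicas) {
        List<Node> chosen = new ArrayList<>();
        // 1. prefer a node on the client's own rack
        for (Node n : cluster) {
            if (chosen.size() < 1 && n.rack.equals(clientRack)) {
                chosen.add(n);
            }
        }
        // 2./3. then nodes on other racks
        for (Node n : cluster) {
            if (chosen.size() < replicas && !n.rack.equals(clientRack)) {
                chosen.add(n);
            }
        }
        // 4. finally fall back to whatever is left
        for (Node n : cluster) {
            if (chosen.size() < replicas && !chosen.contains(n)) {
                chosen.add(n);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("server111", "/rack1"),
                new Node("server121", "/rack2"),
                new Node("server43", "/rack2"),
                new Node("server82", "/rack3"));
        // prints [server111(/rack1), server121(/rack2), server43(/rack2)]
        System.out.println(chooseTargets("/rack1", cluster, 3));
    }
}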
The number of datanodes that may be placed on each rack (maxNodesPerRack) is computed by a formula; see the code below.
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer
private int findNewDatanode(final DatanodeInfo[] original
) throws IOException {
if (nodes.length != original.length + 1) {
throw new IOException(
new StringBuilder()
.append("Failed to replace a bad datanode on the existing pipeline ")
.append("due to no more good datanodes being available to try. ")
.append("(Nodes: current=").append(Arrays.asList(nodes))
.append(", original=").append(Arrays.asList(original)).append("). ")
.append("The current failed datanode replacement policy is ")
.append(dfsClient.dtpReplaceDatanodeOnFailure).append(", and ")
.append("a client may configure this via '")
.append(DFSConfigKeys.DFS_CLIENT_WRITE_REPLACE_DATANODE_ON_FAILURE_POLICY_KEY)
.append("' in its configuration.")
.toString());
}
Note: when no new datanode can be found, an exception is thrown, like this:
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[server82:50010], original=[server.82:50010]).
The current failed datanode replacement policy is ALWAYS, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
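Below is a minimal sketch of how a client might adjust this policy, using the configuration keys quoted in the message above; the values chosen here (enable=true, policy=NEVER) are only an example sometimes used on very small clusters, not a general recommendation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RelaxedPipelineClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Keep datanode replacement enabled, but never try to replace a failed node
        // in an existing write pipeline; this avoids the exception above when no
        // spare datanode is available, at the cost of temporarily fewer replicas.
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");

        try (FileSystem fs = FileSystem.get(conf)) {
            fs.create(new Path("/tmp/test.sql")).close();
        }
    }
}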
private void addDatanode2ExistingPipeline() throws IOException {
...
final DatanodeInfo[] original = nodes;
final LocatedBlock lb = dfsClient.namenode.getAdditionalDatanode(
src, fileId, block, nodes, storageIDs,
failed.toArray(new DatanodeInfo[failed.size()]),
1, dfsClient.clientName);
setPipeline(lb);
//find the new datanode
final int d = findNewDatanode(original);
Note: getAdditionalDatanode is called to obtain one new datanode; a long chain of intermediate calls is omitted here.
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault
private DatanodeStorageInfo[] chooseTarget(int numOfReplicas,
Node writer,
List<DatanodeStorageInfo> chosenStorage,
boolean returnChosenNodes,
Set<Node> excludedNodes,
long blocksize,
final BlockStoragePolicy storagePolicy) {
...
int[] result = getMaxNodesPerRack(chosenStorage.size(), numOfReplicas);
numOfReplicas = result[0];
int maxNodesPerRack = result[1];
...
final Node localNode = chooseTarget(numOfReplicas, writer, excludedNodes,
blocksize, maxNodesPerRack, results, avoidStaleNodes, storagePolicy,
EnumSet.noneOf(StorageType.class), results.isEmpty());
Note: maxNodesPerRack here is the maximum number of datanodes that may be allocated on any single rack.
private Node chooseTarget(int numOfReplicas,
Node writer,
final Set<Node> excludedNodes,
final long blocksize,
final int maxNodesPerRack,
final List<DatanodeStorageInfo> results,
final boolean avoidStaleNodes,
final BlockStoragePolicy storagePolicy,
final EnumSet<StorageType> unavailableStorages,
final boolean newBlock) {
...
if (numOfResults <= 1) {
chooseRemoteRack(1, dn0, excludedNodes, blocksize, maxNodesPerRack,
results, avoidStaleNodes, storageTypes);
if (--numOfReplicas == 0) {
return writer;
}
}
Note: here it tries to obtain a new datanode from a remote rack, i.e. a rack different from the ones already holding datanodes.
protected void chooseRemoteRack(int numOfReplicas,
DatanodeDescriptor localMachine,
Set<Node> excludedNodes,
long blocksize,
int maxReplicasPerRack,
List<DatanodeStorageInfo> results,
boolean avoidStaleNodes,
EnumMap<StorageType, Integer> storageTypes)
throws NotEnoughReplicasException {
...
chooseRandom(numOfReplicas, "~" + localMachine.getNetworkLocation(),
excludedNodes, blocksize, maxReplicasPerRack, results,
avoidStaleNodes, storageTypes);
Note: here one datanode is picked at random from all candidate datanodes.
protected DatanodeStorageInfo chooseRandom(int numOfReplicas,
String scope,
Set<Node> excludedNodes,
long blocksize,
int maxNodesPerRack,
List<DatanodeStorageInfo> results,
boolean avoidStaleNodes,
EnumMap<StorageType, Integer> storageTypes)
throws NotEnoughReplicasException {
...
int numOfAvailableNodes = clusterMap.countNumOfAvailableNodes(
scope, excludedNodes);
...
if (numOfReplicas>0) {
String detail = enableDebugLogging;
if (LOG.isDebugEnabled()) {
if (badTarget && builder != null) {
detail = builder.toString();
builder.setLength(0);
} else {
detail = "";
}
}
throw new NotEnoughReplicasException(detail);
}
Note: if for some reason (for example, datanode disks are full or nodes have been decommissioned) numOfAvailableNodes comes out as 0, a NotEnoughReplicasException is thrown.
The maxNodesPerRack calculation logic is as follows:
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault
/**
* Calculate the maximum number of replicas to allocate per rack. It also
* limits the total number of replicas to the total number of nodes in the
* cluster. Caller should adjust the replica count to the return value.
*
* @param numOfChosen The number of already chosen nodes.
* @param numOfReplicas The number of additional nodes to allocate.
* @return integer array. Index 0: The number of nodes allowed to allocate
* in addition to already chosen nodes.
* Index 1: The maximum allowed number of nodes per rack. This
* is independent of the number of chosen nodes, as it is calculated
* using the target number of replicas.
*/
private int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas) {
int clusterSize = clusterMap.getNumOfLeaves();
int totalNumOfReplicas = numOfChosen + numOfReplicas;
if (totalNumOfReplicas > clusterSize) {
numOfReplicas -= (totalNumOfReplicas-clusterSize);
totalNumOfReplicas = clusterSize;
}
// No calculation needed when there is only one rack or picking one node.
int numOfRacks = clusterMap.getNumOfRacks();
if (numOfRacks == 1 || totalNumOfReplicas <= 1) {
return new int[] {numOfReplicas, totalNumOfReplicas};
}
int maxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2;
// At this point, there are more than one racks and more than one replicas
// to store. Avoid all replicas being in the same rack.
//
// maxNodesPerRack has the following properties at this stage.
// 1) maxNodesPerRack >= 2
// 2) (maxNodesPerRack-1) * numOfRacks > totalNumOfReplicas
// when numOfRacks > 1
//
// Thus, the following adjustment will still result in a value that forces
// multi-rack allocation and gives enough number of total nodes.
if (maxNodesPerRack == totalNumOfReplicas) {
maxNodesPerRack--;
}
return new int[] {numOfReplicas, maxNodesPerRack};
}
Note: the key lines are
int maxNodesPerRack = (totalNumOfReplicas-1)/numOfRacks + 2;
if (maxNodesPerRack == totalNumOfReplicas) {
maxNodesPerRack--;
}
The final decrement guarantees that, whenever there is more than one rack, a single rack can never hold all of the replicas.
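To see what the formula does for a few cluster shapes, here is a standalone copy of the arithmetic above; the cluster sizes and rack counts passed in main are hypothetical, chosen only to mirror the fsck output earlier:

import java.util.Arrays;

public class MaxNodesPerRackDemo {

    static int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas,
                                    int clusterSize, int numOfRacks) {
        int totalNumOfReplicas = numOfChosen + numOfReplicas;
        if (totalNumOfReplicas > clusterSize) {
            numOfReplicas -= (totalNumOfReplicas - clusterSize);
            totalNumOfReplicas = clusterSize;
        }
        if (numOfRacks == 1 || totalNumOfReplicas <= 1) {
            return new int[] {numOfReplicas, totalNumOfReplicas};
        }
        int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 2;
        if (maxNodesPerRack == totalNumOfReplicas) {
            maxNodesPerRack--;
        }
        return new int[] {numOfReplicas, maxNodesPerRack};
    }

    public static void main(String[] args) {
        // 3 replicas, 193 datanodes, 1 rack (as in the fsck output above):
        // single-rack shortcut, maxNodesPerRack = 3 -> prints [3, 3]
        System.out.println(Arrays.toString(getMaxNodesPerRack(0, 3, 193, 1)));
        // 3 replicas, 2 racks: (3-1)/2 + 2 = 3 equals totalNumOfReplicas, so it is
        // decremented to 2, forcing the replicas onto at least two racks -> prints [3, 2]
        System.out.println(Arrays.toString(getMaxNodesPerRack(0, 3, 193, 2)));
    }
}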