一:HDFS 用户指导
1.hdfs的牛逼特性
- Hadoop, including HDFS, is well suited for distributed storage and distributed processing using commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. MapReduce, well known for its simplicity and applicability for large set of distributed applications, is an integral part of Hadoop. 分布式存储
- HDFS is highly configurable with a default configuration well suited for many installations. Most of the time, configuration needs to be tuned only for very large clusters. 适当的配置
- Hadoop is written in Java and is supported on all major platforms. 平台适应性
- Hadoop supports shell-like commands to interact with HDFS directly. shell-like的操作方式
- The NameNode and Datanodes have built in web servers that makes it easy to check current status of the cluster. 内置web服务,方便检查集群
- New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
- File permissions and authentication. 文件权限验证
- Rack awareness: to take a node's physical location into account while scheduling tasks and allocating storage.
- Safemode: an administrative mode for maintenance. 安全模式,用于运维
- fsck: a utility to diagnose health of the file system, to find missing files or blocks. 检查文件系统的工具,发现丢失的文件或者块
- fetchdt: a utility to fetch DelegationToken and store it in a file on the local system.
- Balancer: tool to balance the cluster when the data is unevenly distributed among DataNodes.
- Upgrade and rollback: after a software upgrade, it is possible to rollback to HDFS' state before the upgrade in case of unexpected problems.
- Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of file containing log of HDFS modifications within certain limits at the NameNode.
- Checkpoint node: performs periodic checkpoints of the namespace and helps minimize the size of the log stored at the NameNode containing changes to the HDFS. Replaces the role previously filled by the Secondary NameNode, though is not yet battle hardened. The NameNode allows multiple Checkpoint nodes simultaneously, as long as there are no Backup nodes registered with the system.
- Backup node: An extension to the Checkpoint node. In addition to checkpointing it also receives a stream of edits from the NameNode and maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode namespace state. Only one Backup node may be registered with the NameNode at once.
- -report: reports basic statistics of HDFS. Some of this information is also available on the NameNode front page. 报告状态
- -safemode: though usually not required, an administrator can manually enter or leave Safemode. 开启安全模式
- -finalizeUpgrade: removes previous backup of the cluster made during last upgrade. 删除上次集群更新时的备份
- -refreshNodes: Updates the namenode with the set of datanodes allowed to connect to the namenode. Namenodes re-read datanode hostnames in the file defined bydfs.hosts, dfs.hosts.exclude. Hosts defined in dfs.hosts are the datanodes that are part of the cluster. If there are entries in dfs.hosts, only the hosts in it are allowed to register with the namenode. Entries in dfs.hosts.exclude are datanodes that need to be decommissioned. Datanodes complete decommissioning when all the replicas from them are replicated to other datanodes. Decommissioned nodes are not automatically shutdown and are not chosen for writing for new replicas.
- -printTopology : Print the topology of the cluster. Display a tree of racks and datanodes attached to the tracks as viewed by the NameNode. 打印拓扑
- dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
- dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
- Policy to keep one of the replicas of a block on the same node as the node that is writing the block. 在当前读写的节点中保存一个数据备份。
- Need to spread different replicas of a block across the racks so that cluster can survive loss of whole rack. 保存数据分布到各个机架,可以允许整个机架的丢失
- One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.
- Spread HDFS data uniformly across the DataNodes in the cluster.
<wiz_tmp_tag id="wiz-table-range-border" contenteditable="false" style="display: none;">
一:HDFS 用户指导的更多相关文章
- [技术翻译]Guava-libraries(一): 用户指导
用户指导 本文翻译自http://code.google.com/p/guava-libraries/wiki/GuavaExplained,由十八子将翻译,发表于博客园 http://www.cnb ...
- iOS-王云鹤 APP首次启动显示用户指导
这个功能的重点就是在如何判断应用是第一次启动的. 其实很简单 我们只需要在一个类里面写好用户引导页面 基本上都是使用UIScrollView 来实现, 新建一个继承于UIViewController ...
- Hadoop HDFS 用户指南
This document is a starting point for users working with Hadoop Distributed File System (HDFS) eithe ...
- HDFS用户指南
https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html hdfs的一些特征: 1.hadoop 包含hdfs 很适合分布式存储以及分布式处 ...
- hdfs 创建一个新用户
需要给第三方提供hdfs用户,和上传文件的权限 1.需要先在linux 上创建一个普通用户: hn,并修改密码 sudo -u hdfs hadoop fs -mkdir /user/用户名 sudo ...
- Hive Tutorial(上)(Hive 入门指导)
用户指导 Hive 指导 Hive指导 概念 Hive是什么 Hive不是什么 获得和开始 数据单元 类型系统 内置操作符和方法 语言性能 用法和例子(在<下>里面) 概念 Hive是什么 ...
- 相同版本的CDH集群间迁移hdfs以及hbase
前言 由于项目数据安全的需要,这段时间看了下hadoop的distcp的命令使用,不断的纠结的问度娘,度娘告诉我的结果也让我很纠结,都是抄来抄去, 还好在牺牲大量的时间的基础上还终于搞出来了,顺便写这 ...
- 【Hadoop学习】HDFS 短路本地读
Hadoop版本:2.6.0 本文系从官方文档翻译而来,转载请尊重译者的工作,注明以下链接: http://www.cnblogs.com/zhangningbo/p/4146296.html 背景 ...
- 解决从linux本地文件系统上传文件到HDFS时的权限问题
当使用 hadoop fs -put localfile /user/xxx 时提示: put: Permission denied: user=root, access=WRITE, inode=& ...
随机推荐
- C++笔记006:关于类的补充
原创笔记,转载请注明出处! 点击[关注],关注也是一种美德~ 关于类的补充: 类是一个数据类型(固定大小内存块的别名),定义一个类,是一个抽象的概念,不会给你分配内存,用数据类型定义变量的时候,才会分 ...
- acm--1006
Problem Description The three hands of the clock are rotating every second and meeting each other ma ...
- 继续深入更新shell脚本容易出错的地方
一.在shell中用到如果需要输入某些值,需要用到read -p命令 这是我写的猜数字游戏,一开始在输出的时候,屏幕上总会打印输出 "INT" 经过反复的练习才发现 双引号后面应 ...
- MySQL慢日志实践
慢日志查询作用 慢日志查询的主要功能就是,记录sql语句中超过设定的时间阈值的查询语句.例如,一条查询sql语句,我们设置的阈值为1s,当这条查询语句的执行时间超过了1s,则将被写入到慢查询配置的日志 ...
- PTA基础编程题目集7-4 BCD解密
BCD数是用一个字节来表达两位十进制的数,每四个比特表示一位.所以如果一个BCD数的十六进制是0x12,它表达的就是十进制的12.但是小明没学过BCD,把所有的BCD数都当作二进制数转换成十进制输出了 ...
- Qt——QScrollArea
1.QScrollArea是否显示滚动条是由一个主要的子控件决定.检查滚动条未显示(1)是否只有一个子控件(2)是否设置 setWidgetResizable(true);,因为这个的本质是QWidg ...
- Matlab R2018a版离线使用帮助文档方法
转载自:Matlab R2018a版离线使用帮助文档方法 问题 Matlab R2018a版本安装后,帮助文档默认为在线方式,需要使用账号登录,如果没有激活密钥或许可证编号,就无法使用帮助文档了. 方 ...
- ssm中需要注意的问题
1.在controller中需要加注解 @Controller @RequestMapping("url") @Autowired private CardService card ...
- Making AJAX Applications Crawlable
https://developers.google.com/webmasters/ajax-crawling/
- SpringBoot-04:SpringBoot在idea中的俩种创建方式
------------吾亦无他,唯手熟尔,谦卑若愚,好学若饥------------- 创建SpringBoot工程有很多种方式,我只讲俩种最为常见的 一,依托springboot官网提供的模板.( ...