HDFS Command Overview

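HDFS commands come through two entry points: the hadoop command and the hdfs command. By function they also fall into two groups: HDFS file operation commands and HDFS administration commands.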

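Both are shell commands. The only real executables are hadoop and hdfs; there are no standalone ls/mv/cp/cat/mkdir… or dfs/setQuota/fsck… commands. All of these are passed to hadoop and hdfs as arguments.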

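For the implementation, see bin/hadoop and bin/hdfs. Other commands in the Hadoop family, such as yarn, work in a similar way. The official documentation describes the FS shell as follows: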

The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, WebHDFS, S3 FS, and others. The FS shell is invoked by:
bin/hadoop fs <args>
All FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the Local FS the scheme is file.
The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost).
Most of the commands in FS shell behave like corresponding Unix commands. Differences are described with each of the commands. Error information is sent to stderr and the output is sent to stdout.
If HDFS is being used, hdfs dfs is a synonym.
Relative paths can be used. For HDFS, the current working directory is the HDFS home directory /user/<username> that often has to be created manually. The HDFS home directory can also be implicitly accessed, e.g., when using the HDFS trash folder, the .Trash directory in the home directory.
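
The URI rules quoted above can be illustrated with plain java.net.URI. This is a minimal sketch, not Hadoop's own path resolution code: the class PathUriDemo, the helper qualify, and the default file system hdfs://namenodehost/ are assumptions made for illustration only.

```java
import java.net.URI;

class PathUriDemo {
    // Hypothetical helper: when the path omits scheme and authority,
    // take them from the configured default file system URI.
    static URI qualify(URI defaultFs, String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // already fully qualified, e.g. file:///tmp/a.txt
        }
        return defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        // Assumed default file system, standing in for the configured value.
        URI defaultFs = URI.create("hdfs://namenodehost/");
        System.out.println(qualify(defaultFs, "/parent/child"));
        System.out.println(qualify(defaultFs, "file:///tmp/a.txt"));
    }
}
```

With the assumed default, /parent/child and hdfs://namenodehost/parent/child name the same location, which is the equivalence the quoted documentation describes.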

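How HDFS commands are implemented:
The user enters hadoop or hdfs followed by arguments. After the shell script parses them, it launches a Java program through java or jsvc. That program retrieves HDFS file information or passes on the requested operation, which is how the file system is managed.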

Figure 1. Overview of the HDFS command execution flow
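
The real launchers are bash scripts, but the mapping step they perform can be sketched in Java for readability. The sketch below is an illustration under assumptions: LauncherSketch and resolve are hypothetical names, and the table covers only three subcommands; consult bin/hdfs for the authoritative list.

```java
import java.util.Map;

class LauncherSketch {
    // Subcommand-to-main-class pairs; a small subset chosen for illustration.
    static final Map<String, String> HDFS_CLASSES = Map.of(
        "dfs", "org.apache.hadoop.fs.FsShell",
        "dfsadmin", "org.apache.hadoop.hdfs.tools.DFSAdmin",
        "fsck", "org.apache.hadoop.hdfs.tools.DFSck");

    // Build what would follow "java" on the command line:
    // the main class first, then the untouched remaining arguments.
    static String[] resolve(String[] argv) {
        if (argv.length == 0 || !HDFS_CLASSES.containsKey(argv[0])) {
            throw new IllegalArgumentException("unknown subcommand");
        }
        String[] out = new String[argv.length];
        out[0] = HDFS_CLASSES.get(argv[0]);
        System.arraycopy(argv, 1, out, 1, argv.length - 1);
        return out;
    }

    public static void main(String[] args) {
        // Corresponds to typing: hdfs dfs -ls /
        System.out.println(String.join(" ", resolve(new String[] {"dfs", "-ls", "/"})));
    }
}
```

The point of the sketch is that -ls and / are never interpreted by the launcher; they are handed unchanged to the Java class.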

For the hadoop fs and hdfs dfs/dfsadmin commands, the implementation in hadoop-3.1.1 is as follows.

The relevant shell scripts are bin/hadoop and bin/hdfs.

The relevant Java classes are org/apache/hadoop/fs/FsShell.java, org/apache/hadoop/fs/shell/CommandFactory.java, org/apache/hadoop/util/ToolRunner.java, org/apache/hadoop/fs/shell/FsCommand.java, org/apache/hadoop/util/GenericOptionsParser.java, and org/apache/hadoop/hdfs/tools/DFSAdmin.java.

Of these, DFSAdmin extends the FsShell class.

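FsShell: the core of the HDFS command implementation. It registers the classes that implement each operation (via reflection) and dispatches commands for execution.
FsCommand: registers the individual operation commands.
CommandFactory: implements the command registration mechanism, mainly through the factory pattern combined with reflection.
GenericOptionsParser: parses the command-line input arguments.
ToolRunner: an intermediate step that runs the command; the actual work is still done by FsShell.run().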

1. Command registration

// org/apache/hadoop/fs/FsShell.java
protected void init() throws IOException {
  getConf().setQuietMode(true);
  UserGroupInformation.setConfiguration(getConf());
  if (commandFactory == null) {
    commandFactory = new CommandFactory(getConf());
    commandFactory.addObject(new Help(), "-help");
    commandFactory.addObject(new Usage(), "-usage");
    registerCommands(commandFactory);
  }
}

protected void registerCommands(CommandFactory factory) {
  // TODO: DFSAdmin subclasses FsShell so need to protect the command
  // registration. This class should morph into a base class for
  // commands, and then this method can be abstract
  if (this.getClass().equals(FsShell.class)) {
    factory.registerCommands(FsCommand.class);
  }
}

// org/apache/hadoop/fs/shell/CommandFactory.java
/**
 * Invokes "static void registerCommands(CommandFactory)" on the given class.
 * This method abstracts the contract between the factory and the command
 * class. Do not assume that directly invoking registerCommands on the
 * given class will have the same effect.
 * @param registrarClass class to allow an opportunity to register
 */
public void registerCommands(Class<?> registrarClass) {
  try {
    registrarClass.getMethod(
        "registerCommands", CommandFactory.class
    ).invoke(null, this);
  } catch (Exception e) {
    throw new RuntimeException(StringUtils.stringifyException(e));
  }
}

// org/apache/hadoop/fs/shell/FsCommand.java
/**
 * Register the command classes used by the fs subcommand
 * @param factory where to register the class
 */
public static void registerCommands(CommandFactory factory) {
  factory.registerCommands(AclCommands.class);
  factory.registerCommands(CopyCommands.class);
  factory.registerCommands(Count.class);
  factory.registerCommands(Delete.class);
  factory.registerCommands(Display.class);
  factory.registerCommands(Find.class);
  factory.registerCommands(FsShellPermissions.class);
  factory.registerCommands(FsUsage.class);
  factory.registerCommands(Ls.class);
  factory.registerCommands(Mkdir.class);
  factory.registerCommands(MoveCommands.class);
  factory.registerCommands(SetReplication.class);
  factory.registerCommands(Stat.class);
  factory.registerCommands(Tail.class);
  factory.registerCommands(Head.class);
  factory.registerCommands(Test.class);
  factory.registerCommands(Touch.class);
  factory.registerCommands(Truncate.class);
  factory.registerCommands(SnapshotCommands.class);
  factory.registerCommands(XAttrCommands.class);
}
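
To make the registration chain easier to follow without a Hadoop checkout, here is a self-contained sketch of the same pattern: a factory that reflectively invokes a static registerCommands method on a registrar class. Factory, LsLike, and AllCommands are hypothetical stand-ins for CommandFactory, a command class, and FsCommand; they are not Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

class RegistrySketch {
    // Stand-in for CommandFactory: maps a command name to its implementing class.
    public static class Factory {
        final Map<String, Class<?>> classes = new HashMap<>();

        void addClass(Class<?> cls, String... names) {
            for (String name : names) {
                classes.put(name, cls);
            }
        }

        // Reflectively call "static void registerCommands(Factory)" on the registrar.
        void registerCommands(Class<?> registrar) {
            try {
                registrar.getMethod("registerCommands", Factory.class).invoke(null, this);
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // Stand-in for a single command class; it registers itself under its name.
    public static class LsLike {
        public static void registerCommands(Factory factory) {
            factory.addClass(LsLike.class, "-ls");
        }
    }

    // Stand-in for FsCommand: an aggregate registrar that fans out to command classes.
    public static class AllCommands {
        public static void registerCommands(Factory factory) {
            factory.registerCommands(LsLike.class);
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        factory.registerCommands(AllCommands.class);
        System.out.println(factory.classes.keySet());
    }
}
```

The design choice this illustrates is that the factory never names a concrete command class: adding a command only requires one more line in the aggregate registrar.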

2. Command parsing

// org/apache/hadoop/util/GenericOptionsParser.java
/**
 * Parse the user-specified options, get the generic options, and modify
 * configuration accordingly.
 *
 * @param opts Options to use for parsing args.
 * @param args User-specified arguments
 * @return true if the parse was successful
 */
private boolean parseGeneralOptions(Options opts, String[] args)
    throws IOException {
  opts = buildGeneralOptions(opts);
  CommandLineParser parser = new GnuParser();
  boolean parsed = false;
  try {
    commandLine = parser.parse(opts, preProcessForWindows(args), true);
    processGeneralOptions(commandLine);
    parsed = true;
  } catch(ParseException e) {
    LOG.warn("options parsing failed: "+e.getMessage());

    HelpFormatter formatter = new HelpFormatter();
    formatter.printHelp("general options are: ", opts);
  }
  return parsed;
}

// org/apache/commons/cli/CommandLine.java
/**
 * Retrieve any left-over non-recognized options and arguments
 *
 * @return remaining items passed in but not parsed as an array
 */
public String[] getArgs()
{
  String[] answer = new String[args.size()];
  args.toArray(answer);
  return answer;
}
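
The split between generic options and leftover arguments can be shown with a much simplified parser. This sketch handles only leading "-D key=value" pairs and does not use commons-cli; GenericOptionsSketch and its members are hypothetical names, and the real parser recognizes more options than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GenericOptionsSketch {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    GenericOptionsSketch(String[] argv) {
        int i = 0;
        // Consume leading "-D key=value" pairs; stop at the first other token.
        while (i + 1 < argv.length && argv[i].equals("-D")) {
            String[] kv = argv[i + 1].split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("expected key=value: " + argv[i + 1]);
            }
            properties.put(kv[0], kv[1]);
            i += 2;
        }
        // Everything that was not consumed is left for the command itself.
        remaining.addAll(Arrays.asList(argv).subList(i, argv.length));
    }

    String[] getRemainingArgs() {
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(
            new String[] {"-D", "fs.defaultFS=hdfs://namenodehost", "-ls", "/"});
        System.out.println(parsed.properties);
        System.out.println(Arrays.toString(parsed.getRemainingArgs()));
    }
}
```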

3. Command execution

// org/apache/hadoop/fs/FsShell.java
/**
 * run
 */
@Override
public int run(String argv[]) throws Exception {
  // initialize FsShell
  init();
  Tracer tracer = new Tracer.Builder("FsShell").
      conf(TraceUtils.wrapHadoopConf(SHELL_HTRACE_PREFIX, getConf())).
      build();
  int exitCode = -1;
  if (argv.length < 1) {
    printUsage(System.err);
  } else {
    // the first argument selects the command, e.g. "-ls"
    String cmd = argv[0];
    Command instance = null;
    try {
      instance = commandFactory.getInstance(cmd);
      if (instance == null) {
        throw new UnknownCommandException();
      }
      TraceScope scope = tracer.newScope(instance.getCommandName());
      if (scope.getSpan() != null) {
        String args = StringUtils.join(" ", argv);
        if (args.length() > 2048) {
          args = args.substring(0, 2048);
        }
        scope.getSpan().addKVAnnotation("args", args);
      }
      try {
        // the command receives everything after its own name
        exitCode = instance.run(Arrays.copyOfRange(argv, 1, argv.length));
      } finally {
        scope.close();
      }
    } catch (IllegalArgumentException e) {
      if (e.getMessage() == null) {
        displayError(cmd, "Null exception message");
        e.printStackTrace(System.err);
      } else {
        displayError(cmd, e.getLocalizedMessage());
      }
      printUsage(System.err);
      if (instance != null) {
        printInstanceUsage(System.err, instance);
      }
    } catch (Exception e) {
      // instance.run catches IOE, so something is REALLY wrong if here
      LOG.debug("Error", e);
      displayError(cmd, "Fatal internal error");
      e.printStackTrace(System.err);
    }
  }
  tracer.close();
  return exitCode;
}
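
Stripped of tracing and usage printing, the control flow of run() comes down to "look up the command by its first argument, hand it the rest, return its exit code". The sketch below shows only that skeleton. DispatchSketch and its commands map are hypothetical, and the error handling is deliberately simpler than in FsShell.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class DispatchSketch {
    // Hypothetical registry: command name -> handler returning an exit code.
    final Map<String, Function<String[], Integer>> commands = new HashMap<>();

    int run(String[] argv) {
        int exitCode = -1; // stays -1 unless a command actually runs
        if (argv.length < 1) {
            System.err.println("Usage: <command> [args...]");
            return exitCode;
        }
        String cmd = argv[0];
        Function<String[], Integer> instance = commands.get(cmd);
        if (instance == null) {
            System.err.println(cmd + ": Unknown command");
            return exitCode;
        }
        try {
            // The command sees only the arguments after its own name.
            exitCode = instance.apply(Arrays.copyOfRange(argv, 1, argv.length));
        } catch (RuntimeException e) {
            System.err.println(cmd + ": " + e.getMessage());
        }
        return exitCode;
    }

    public static void main(String[] args) {
        DispatchSketch shell = new DispatchSketch();
        // Toy command: its exit code is the number of arguments it received.
        shell.commands.put("-count", rest -> rest.length);
        System.out.println(shell.run(new String[] {"-count", "a", "b"}));
        System.out.println(shell.run(new String[] {"-nope"}));
    }
}
```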

4. Formatted output

  /** allows stdout to be captured if necessary */
public PrintStream out = System.out;
/** allows stderr to be captured if necessary */
public PrintStream err = System.err;
/** allows the command factory to be used if necessary */
private CommandFactory commandFactory = null;
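
Because the output streams are ordinary public fields rather than hard-wired references to System.out and System.err, a caller can swap them to capture what a command prints. The following sketch demonstrates the idea with a toy command; CaptureSketch, EchoCommand, and capture are hypothetical names and not part of Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class CaptureSketch {
    // Toy command that writes through replaceable stream fields.
    public static class EchoCommand {
        public PrintStream out = System.out;
        public PrintStream err = System.err;

        int run(String... args) {
            if (args.length == 0) {
                err.println("echo: missing operand");
                return 1;
            }
            out.println(String.join(" ", args));
            return 0;
        }
    }

    // Redirect the command's stdout into a buffer and return what it printed.
    static String capture(EchoCommand cmd, String... args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        cmd.out = new PrintStream(buffer, true);
        cmd.run(args);
        return buffer.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(capture(new EchoCommand(), "hello", "hdfs"));
    }
}
```

This is mainly useful in tests, where the printed text can be compared against an expected string without touching the process-wide System.out.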
