HDFS命令概述

HDFS命令涉及两类,一类是hadoop命令,一类是hdfs命令,功能也分为两类,第一类是HDFS文件操作命令,第二类是HDFS管理命令。

二者都是shell命令,真正的命令只有hadoop和hdfs,而无所谓的ls/mv/cp/cat/mkdir…dfs/setQuota/fsck…等命令,后者都是以入参传递给hadoop和hdfs的。

具体实现参考bin/hadoop和bin/hdfs。hadoop族其他命令如yarn,实现机制类似。官方介绍如下:

The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that 
Hadoop supports, such as Local FS, WebHDFS, S3 FS, and others. The FS shell is invoked by:
bin/hadoop fs <args>
All FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the Local FS the scheme is file.
The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost).
Most of the commands in FS shell behave like corresponding Unix commands. Differences are described with each of the commands. Error information is sent to stderr and the output is sent to stdout.
If HDFS is being used, hdfs dfs is a synonym.
Relative paths can be used. For HDFS, the current working directory is the HDFS home directory /user/<username> that often has to be created manually. The HDFS home directory can also be implicitly accessed, e.g., when using the HDFS trash folder, the .Trash directory in the home directory.

HDFS命令实现机制:
输入命令hadoop/hdfs和参数,经过解析之后,通过执行java或者jsvc启动java程序,获取hdfs文件信息或者传入操作指令,实现文件系统管理。

                                              图1. HDFS命令执行交互过程简介

针对hadoop fs和hdsf dfs/dfsadmin命令,hadoop-3.1.1版本具体实现如下:

相关的shell脚本为bin/hadoopbin/hdfs

相关的java类为 org/apache/hadoop/fs/FsShell.java org/apache/hadoop/fs/shell/CommandFactory.java org/apache/hadoop/util/ToolRunner.java  org/apache/hadoop/fs/shell/FsCommand.java  org/apache/hadoop/util/GenericOptionsParser.java 和 org/apache/hadoop/hdfs/tools/DFSAdmin.java 。

其中DFSAdmin.java继承自FsShell类。

FsShell: 是整个hdfs命令的核心,负责各种操作实现类的注册(通过反射实现)和命令的分发执行。
FsCommand: 各种操作命令注册实现。
CommandFactoryL: 命令注册机制的实现。主要通过工厂模式利用反射实现。
GenericOptionsParser.java: 中间转换操作,实际执行命令的仍是FsShell.run()函数。
GenericOptionsParser.java: 负责命令输入参数的解析。
ToolRunner.java: 执行命令操作,实际上是FsShell.run()执行。

1、命令注册

 //org/apache/hadoop/fs/FsShell.java
protected void init() throws IOException {
getConf().setQuietMode(true);
UserGroupInformation.setConfiguration(getConf());
if (commandFactory == null) {
commandFactory = new CommandFactory(getConf());
commandFactory.addObject(new Help(), "-help");
commandFactory.addObject(new Usage(), "-usage");
registerCommands(commandFactory);
}
} protected void registerCommands(CommandFactory factory) {
// TODO: DFSAdmin subclasses FsShell so need to protect the command
// registration. This class should morph into a base class for
// commands, and then this method can be abstract
if (this.getClass().equals(FsShell.class)) {
factory.registerCommands(FsCommand.class);
}
} //org/apache/hadoop/fs/shell/CommandFactory.java
/**
* Invokes "static void registerCommands(CommandFactory)" on the given class.
* This method abstracts the contract between the factory and the command
* class. Do not assume that directly invoking registerCommands on the
* given class will have the same effect.
* @param registrarClass class to allow an opportunity to register
*/
public void registerCommands(Class<?> registrarClass) {
try {
registrarClass.getMethod(
"registerCommands", CommandFactory.class
).invoke(null, this);
} catch (Exception e) {
throw new RuntimeException(StringUtils.stringifyException(e));
}
} //org/apache/hadoop/fs/shell/FsCommand.java
/**
* Register the command classes used by the fs subcommand
* @param factory where to register the class
*/
public static void registerCommands(CommandFactory factory) {
factory.registerCommands(AclCommands.class);
factory.registerCommands(CopyCommands.class);
factory.registerCommands(Count.class);
factory.registerCommands(Delete.class);
factory.registerCommands(Display.class);
factory.registerCommands(Find.class);
factory.registerCommands(FsShellPermissions.class);
factory.registerCommands(FsUsage.class);
factory.registerCommands(Ls.class);
factory.registerCommands(Mkdir.class);
factory.registerCommands(MoveCommands.class);
factory.registerCommands(SetReplication.class);
factory.registerCommands(Stat.class);
factory.registerCommands(Tail.class);
factory.registerCommands(Head.class);
factory.registerCommands(Test.class);
factory.registerCommands(Touch.class);
factory.registerCommands(Truncate.class);
factory.registerCommands(SnapshotCommands.class);
factory.registerCommands(XAttrCommands.class);
}

2、命令解析

  /**
* Parse the user-specified options, get the generic options, and modify
* configuration accordingly.
*
* @param opts Options to use for parsing args.
* @param args User-specified arguments
* @return true if the parse was successful
*/
private boolean parseGeneralOptions(Options opts, String[] args)
throws IOException {
opts = buildGeneralOptions(opts);
CommandLineParser parser = new GnuParser();
boolean parsed = false;
try {
commandLine = parser.parse(opts, preProcessForWindows(args), true);
processGeneralOptions(commandLine);
parsed = true;
} catch(ParseException e) {
LOG.warn("options parsing failed: "+e.getMessage()); HelpFormatter formatter = new HelpFormatter();
formatter.printHelp("general options are: ", opts);
}
return parsed;
}
/**
* Retrieve any left-over non-recognized options and arguments
*
* @return remaining items passed in but not parsed as an array
*/
public String[] getArgs()
{
String[] answer = new String[args.size()]; args.toArray(answer); return answer;
}

3、命令执行

/**
* run
*/
@Override
public int run(String argv[]) throws Exception {
// initialize FsShell
init();
Tracer tracer = new Tracer.Builder("FsShell").
conf(TraceUtils.wrapHadoopConf(SHELL_HTRACE_PREFIX, getConf())).
build();
int exitCode = -1;
if (argv.length < 1) {
printUsage(System.err);
} else {
String cmd = argv[0];
Command instance = null;
try {
instance = commandFactory.getInstance(cmd);
if (instance == null) {
throw new UnknownCommandException();
}
TraceScope scope = tracer.newScope(instance.getCommandName());
if (scope.getSpan() != null) {
String args = StringUtils.join(" ", argv);
if (args.length() > 2048) {
args = args.substring(0, 2048);
}
scope.getSpan().addKVAnnotation("args", args);
}
try {
exitCode = instance.run(Arrays.copyOfRange(argv, 1, argv.length));
} finally {
scope.close();
}
} catch (IllegalArgumentException e) {
if (e.getMessage() == null) {
displayError(cmd, "Null exception message");
e.printStackTrace(System.err);
} else {
displayError(cmd, e.getLocalizedMessage());
}
printUsage(System.err);
if (instance != null) {
printInstanceUsage(System.err, instance);
}
} catch (Exception e) {
// instance.run catches IOE, so something is REALLY wrong if here
LOG.debug("Error", e);
displayError(cmd, "Fatal internal error");
e.printStackTrace(System.err);
}
}
tracer.close();
return exitCode;
}

4、格式化输出

  /** allows stdout to be captured if necessary */
public PrintStream out = System.out;
/** allows stderr to be captured if necessary */
public PrintStream err = System.err;
/** allows the command factory to be used if necessary */
private CommandFactory commandFactory = null;

HDFS命令实现分析的更多相关文章

  1. HDFS源码分析数据块校验之DataBlockScanner

    DataBlockScanner是运行在数据节点DataNode上的一个后台线程.它为所有的块池管理块扫描.针对每个块池,一个BlockPoolSliceScanner对象将会被创建,其运行在一个单独 ...

  2. HDFS源码分析心跳汇报之数据块汇报

    在<HDFS源码分析心跳汇报之数据块增量汇报>一文中,我们详细介绍了数据块增量汇报的内容,了解到它是时间间隔更长的正常数据块汇报周期内一个smaller的数据块汇报,它负责将DataNod ...

  3. HDFS源码分析心跳汇报之BPServiceActor工作线程运行流程

    在<HDFS源码分析心跳汇报之数据结构初始化>一文中,我们了解到HDFS心跳相关的BlockPoolManager.BPOfferService.BPServiceActor三者之间的关系 ...

  4. HDFS源码分析心跳汇报之数据块增量汇报

    在<HDFS源码分析心跳汇报之BPServiceActor工作线程运行流程>一文中,我们详细了解了数据节点DataNode周期性发送心跳给名字节点NameNode的BPServiceAct ...

  5. HDfs命令

    HDFS命令分为用户命令(dfs,fsck等),管理员命令(dfsadmn,namenode,datanode等) hdfs -ls -lsr 执行lsr 是递归显示 drwxr-xr-x -hado ...

  6. sodu 命令场景分析

    摘自:http://www.cnblogs.com/hazir/p/sudo_command.html sudo 命令情景分析   Linux 下使用 sudo 命令,可以让普通用户也能执行一些或者全 ...

  7. 4-linux、hdfs命令

    定义: linux:Linux是一套免费使用和自由传播的类Unix操作系统,是一个基于POSIX和UNIX的多用户.多任务.支持多线程和多CPU的 操作系统.它能运行主要的UNIX工具软件.应用程序和 ...

  8. hdfs命令get或者put提示找不到目录或文件

    今天用hdfs命令出现个诡异情况: hadoop fs -put a.txt /user/root/ put: `a.txt': No such file or directory 用get命令存在相 ...

  9. HDFS 命令大全

    目录 概要 用户命令 dfs 命令 追加文件内容 查看文件内容 得到文件的校验信息 修改用户组 修改文件权限 修改文件所属用户 本地拷贝到 hdfs hdfs 拷贝到本地 获取目录,文件数量及大小 h ...

随机推荐

  1. html单元格导出excel图形环境问题

     现象:报表页面端展现正常,点击导出excel,选择完是否分页后页面没有反应,后台润乾日志中错误信息: runqianReportLogger : [ERROR]  - Error: at com ...

  2. Hive是读时模式

    Hive处理的数据是大数据,在保存表数据时不对数据进行校验,而是在读数据时校验,不符合格式的数据设置为NULL: 读时模式的优点是,加载数据库快. 传统的数据库如mysql.oracle是写时模式,不 ...

  3. zookeeper - java操作

    ZKUtils.java package test; import java.io.IOException; import java.util.concurrent.CountDownLatch; i ...

  4. 单机安装hive和presto

    问题: 公司最近在搞presto,主要是分析一下presto和hive的查询大数据量的性能对比: 我先把我的对比图拿出来(50条数据左右)针对同一条sql(select * from employee ...

  5. [转载]python——事件驱动的简明讲解

    本文转载自http://www.cnblogs.com/thinkroom/p/6729480.html 作者:码匠信龙 方便自己今后查阅存档 关键词:编程范式,事件驱动,回调函数,观察者模式 --- ...

  6. ExpressRoute 先决条件和清单

    若要使用 ExpressRoute 连接到 Azure 服务,需确认是否符合以下部分中所列的要求. 帐户要求 使用中的有效 Azure 帐户.需有此帐户才能设置 ExpressRoute 线路. 连接 ...

  7. UIButton的resizableImageWithCapInsets使用解析

    UIButton的resizableImageWithCapInsets使用解析 效果: 使用的源文件: 源码: // // ViewController.m // SpecialButton // ...

  8. Linux seq命令详解

    seq: squeue  是一个序列的缩写,主要用来输出序列化的东西 seq常见命令参数 用法:seq [选项]... 尾数 或:seq [选项]... 首数 尾数 或:seq [选项]... 首数 ...

  9. (转)光照模型及cg实现

    经典光照模型(illumination model) 物体表面光照颜色由入射光.物体材质,以及材质和光的交互规律共同决定. 由于环境光给予物体各个点的光照强度相同,且没有方向之分,所以在只有环境光的情 ...

  10. ASP.NET MVC 5 开发环境配置

    Install-Package Ninject -Version 3.2.2 -ProjectName SportsStore.WebUIInstall-Package Ninject.Web.Com ...