[HDFS API Programming] Listing All Files in a Target Directory, and Listing Them Recursively

Using the hadoop command: hadoop fs -ls /hdfsapi/test we can view information about all files under the /hdfsapi/test directory of the HDFS file system.

So how do we do the same thing in code? Here is the code first:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class HDFSApp {

    public static final String HDFS_PATH = "hdfs://hadoop000:8020";
    FileSystem fileSystem = null;
    Configuration configuration = null;

    @Before
    public void setUp() throws Exception {
        System.out.println("setUp-----------");
        configuration = new Configuration();
        configuration.set("dfs.replication", "1");

        /**
         * Build a client object for accessing the specified HDFS file system.
         * First argument: the URI of HDFS
         * Second argument: the configuration specified by the client
         * Third argument: the client's identity, i.e. the user name
         */
        fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "hadoop");
    }

    /**
     * List all files in the target directory.
     * @throws Exception
     */
    @Test
    public void listFiles() throws Exception {
        // listStatus returns the entries directly under the given path (not recursive)
        FileStatus[] statuses = fileSystem.listStatus(new Path("/hdfsapi/test"));
        for (FileStatus file : statuses) {
            String isDir = file.isDirectory() ? "directory" : "file";
            String permission = file.getPermission().toString();
            short replication = file.getReplication();
            long length = file.getLen();
            String path = file.getPath().toString();
            System.out.println(isDir + "\t" + permission + "\t" + replication + "\t" + length + "\t" + path);
        }
    }

    @After
    public void tearDown() {
        configuration = null;
        fileSystem = null;
        System.out.println("----------tearDown------");
    }
}

Running the test class:
setUp-----------
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
file    rw-r--r--    3    14    hdfs://hadoop000:8020/hdfsapi/test/a.txt
file    rw-r--r--    1    28    hdfs://hadoop000:8020/hdfsapi/test/c.txt
file    rw-r--r--    1    181367942    hdfs://hadoop000:8020/hdfsapi/test/jdk.zip
file    rw-r--r--    1    2732    hdfs://hadoop000:8020/hdfsapi/test/t.txt
directory    rwxr-xr-x    0    0    hdfs://hadoop000:8020/hdfsapi/test/testdir
----------tearDown------

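First we find the listStatus method of fileSystem. How is it used? As always: when in doubt, Ctrl+click it. Jumping in shows the method's source, which tells us that it returns a FileStatus[] array and takes the target directory as a Path argument: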

  /**
   * List the statuses of the files/directories in the given path if the path is
   * a directory.
   * <p>
   * Does not guarantee to return the List of files/directories status in a
   * sorted order.
   * @param f given path
   * @return the statuses of the files/directories in the given patch
   * @throws FileNotFoundException when the path does not exist;
   *         IOException see specific implementation
   */
  public abstract FileStatus[] listStatus(Path f) throws FileNotFoundException,
                                                         IOException;

Since the return value is an array, a loop is the natural way to traverse it. But what is FileStatus? Ctrl+click into it:

/** Interface that represents the client side information for a file.
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class FileStatus implements Writable, Comparable {
  private Path path;
  private long length;
  private boolean isdir;
  private short block_replication;
  private long blocksize;
  private long modification_time;
  private long access_time;
  private FsPermission permission;
  private String owner;
  private String group;
  private Path symlink;
  // ... (rest of the class not shown)
}

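FileStatus represents the client-side information for a file. (Its Javadoc says "Interface", but as the declaration shows, it is a class.) The member variables pasted above show how much information about a file it holds, and the class's accessor methods, such as isDirectory(), getPermission(), getReplication(), getLen() and getPath() used in the test, let us read that information.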
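To make the mapping from FileStatus fields to the printed columns explicit, here is a minimal plain-Java sketch. The class StatusLine and its format method are hypothetical helpers introduced only for illustration; they take the values that the test reads from FileStatus as ordinary parameters, so the sketch runs without a Hadoop dependency.

```java
class StatusLine {

    /**
     * Hypothetical helper: joins the values read from a FileStatus
     * (isDirectory, permission, replication, length, path) into one
     * tab-separated line, in the same column order as the test above.
     */
    static String format(boolean isDirectory, String permission,
                         short replication, long length, String path) {
        String kind = isDirectory ? "directory" : "file";
        return String.join("\t", kind, permission,
                String.valueOf(replication), String.valueOf(length), path);
    }

    public static void main(String[] args) {
        // Values copied from one line of the test output shown earlier
        System.out.println(format(false, "rw-r--r--", (short) 1, 28L,
                "hdfs://hadoop000:8020/hdfsapi/test/c.txt"));
    }
}
```

In the real test the arguments would come from file.isDirectory(), file.getPermission().toString(), file.getReplication(), file.getLen() and file.getPath().toString().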

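The test succeeds, but there is a limitation: just like hadoop fs -ls /hdfsapi/test, this method only shows the entries directly under the given directory. If test contains a subdirectory testdir, and testdir itself contains files, those files are not shown. So let's look at how to list all files under the target directory recursively.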
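The difference between a direct listing and a recursive one can be tried out on the local file system first. The following is only an analogy, assuming a small temporary directory tree: it uses java.nio.file (Files.list and Files.walk), not the HDFS API, and the class and method names are made up for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalListingDemo {

    /** Names of the entries directly inside dir (files and directories), sorted. */
    static List<String> listDirect(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> dir.relativize(p).toString().replace('\\', '/'))
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    /** Relative paths of all regular files anywhere below dir, sorted. */
    static List<String> listFilesRecursive(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.filter(Files::isRegularFile)
                          .map(p -> dir.relativize(p).toString().replace('\\', '/'))
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small temporary tree: a.txt and testdir/h.txt
        Path root = Files.createTempDirectory("hdfsapi-test");
        Files.write(root.resolve("a.txt"), "hello".getBytes());
        Path sub = Files.createDirectory(root.resolve("testdir"));
        Files.write(sub.resolve("h.txt"), "hello world".getBytes());

        System.out.println(listDirect(root));          // direct children only
        System.out.println(listFilesRecursive(root));  // files in the whole subtree
    }
}
```

The direct listing stops at testdir, while the recursive one descends into it and reports the file inside; this is the same gap the HDFS test above runs into.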

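First, the recursive listing with the hadoop command: hadoop fs -ls -R /hdfsapi/test (-R stands for recursive; go and try it)

So how do we do this in code?

So far we have used the listStatus method of fileSystem. Looking through the API for something that fits, we find a method called listFiles: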

  /**
   * List the statuses and block locations of the files in the given path.
   * Does not guarantee to return the iterator that traverses statuses
   * of the files in a sorted order.
   *
   * If the path is a directory,
   *   if recursive is false, returns files in the directory;
   *   if recursive is true, return files in the subtree rooted at the path.
   * If the path is a file, return the file's status and block locations.
   *
   * @param f is the path
   * @param recursive if the subdirectories need to be traversed recursively
   *
   * @return an iterator that traverses statuses of the files
   *
   * @throws FileNotFoundException when the path does not exist;
   *         IOException see specific implementation
   */
  public RemoteIterator<LocatedFileStatus> listFiles(
      final Path f, final boolean recursive)
  throws FileNotFoundException, IOException {}

So we need to be good not only at reading the API but also at searching it.
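One detail worth noticing in the signature is the return type: RemoteIterator is not a java.util.Iterator, and its hasNext() and next() methods can throw IOException, so it is consumed with an explicit while loop rather than a for-each. The sketch below illustrates that pattern in plain Java. RemoteIter is a stand-in interface declared here with the same two methods, not the real Hadoop class, and fromList is a hypothetical in-memory source used only so the example runs without a cluster.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoteIteratorDemo {

    /** Stand-in with the same two methods as Hadoop's RemoteIterator (not the real class). */
    interface RemoteIter<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Drains the iterator into a list; an IOException from the source propagates to the caller. */
    static <E> List<E> toList(RemoteIter<E> it) throws IOException {
        List<E> result = new ArrayList<>();
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }

    /** Hypothetical in-memory source, used only to exercise toList. */
    static <E> RemoteIter<E> fromList(List<E> items) {
        Iterator<E> inner = items.iterator();
        return new RemoteIter<E>() {
            @Override public boolean hasNext() { return inner.hasNext(); }
            @Override public E next() { return inner.next(); }
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> paths = toList(fromList(Arrays.asList(
                "/hdfsapi/test/a.txt", "/hdfsapi/test/testdir/h.txt")));
        System.out.println(paths);
    }
}
```

Collecting into a list is convenient for small directories; for very large trees, processing each element inside the loop avoids holding every status in memory.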

The code for recursively listing all files under the target directory then looks like this:

    // Add to the imports of HDFSApp:
    // import org.apache.hadoop.fs.LocatedFileStatus;
    // import org.apache.hadoop.fs.RemoteIterator;

    /**
     * Recursively list all files under the target directory.
     * @throws Exception
     */
    @Test
    public void listFilesRecursive() throws Exception {
        // The second argument (true) asks listFiles to descend into subdirectories
        RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(new Path("/hdfsapi/test"), true);
        while (files.hasNext()) {
            LocatedFileStatus file = files.next();
            String isDir = file.isDirectory() ? "directory" : "file";
            String permission = file.getPermission().toString();
            short replication = file.getReplication();
            long length = file.getLen();
            String path = file.getPath().toString();
            System.out.println(isDir + "\t" + permission + "\t" + replication + "\t" + length + "\t" + path);
        }
    }

Running the test class:

setUp-----------
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
file    rw-r--r--    3    14    hdfs://hadoop000:8020/hdfsapi/test/a.txt
file    rw-r--r--    1    28    hdfs://hadoop000:8020/hdfsapi/test/c.txt
file    rw-r--r--    1    181367942    hdfs://hadoop000:8020/hdfsapi/test/jdk.zip
file    rw-r--r--    1    2732    hdfs://hadoop000:8020/hdfsapi/test/t.txt
file    rw-r--r--    1    11    hdfs://hadoop000:8020/hdfsapi/test/testdir/h.txt
----------tearDown------

Note that testdir itself no longer appears: as its Javadoc says, listFiles returns files only, so the directory is skipped and the file inside it, testdir/h.txt, is listed instead.
