HDFS in depth, part 10: operating HDFS through the Java API

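// Imports used by the test methods in this post (JUnit 4; IOUtils is Apache Commons IO)
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.junit.Test;

    /**
     * Recursively list the paths of all files in HDFS
     */
    @Test
    public void getAllHdfsFilePath() throws URISyntaxException, IOException {
        // Get the FileSystem client
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());

        // Method 1: list the root directory and recurse into subdirectories ourselves
        Path path = new Path("/");
        FileStatus[] fileStatuses = fileSystem.listStatus(path);

        // Loop over fileStatuses: print the path of each file, recurse into each directory
        for (FileStatus fileStatus : fileStatuses){
            if (fileStatus.isDirectory()){ // directory
                getDirectoryFiles(fileSystem,fileStatus);
            }else{ // file
                System.out.println(fileStatus.getPath().toString());
            }
        }

        // Method 2: the official API; listFiles(path, true) returns files only and walks subdirectories
        System.out.println("Method 2: using the official API");
        RemoteIterator<LocatedFileStatus> locatedFileStatusRemoteIterator = fileSystem.listFiles(new Path("/"), true);

        while (locatedFileStatusRemoteIterator.hasNext()){
            LocatedFileStatus next = locatedFileStatusRemoteIterator.next();
            System.out.println(next.getPath());
        }

        // Close the FileSystem client
        fileSystem.close();
    }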

    /**
     * Recursively print the paths of the files under a directory
     */
    public void getDirectoryFiles(FileSystem fileSystem,FileStatus fileStatus) throws IOException {
        // fileStatus is always a directory here; get its path
        Path path = fileStatus.getPath();
        FileStatus[] fileStatuses = fileSystem.listStatus(path);
        for (FileStatus status:fileStatuses){
            // Test the child entry (status), not the parent directory (fileStatus)
            if (status.isDirectory()){
                getDirectoryFiles(fileSystem,status);
            }else{
                System.out.println(status.getPath().toString());
            }
        }
    }
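The two listing strategies above can be compared without a cluster. The sketch below is a hypothetical, stdlib-only stand-in: `Entry` plays the role of `FileStatus`, `collectRecursive` mirrors method 1 (recursing per directory and testing each child), and `collectIterative` walks the tree with an explicit stack of iterators, which is roughly the idea behind a lazy recursive listing such as `listFiles(path, true)`. It illustrates the technique under those assumptions and is not Hadoop's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class TreeWalkSketch {

    /** Hypothetical stand-in for FileStatus: a path plus child entries (empty for files). */
    static final class Entry {
        final String path;
        final boolean directory;
        final List<Entry> children = new ArrayList<>();

        Entry(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }

        Entry add(Entry child) {
            children.add(child);
            return this;
        }
    }

    /** Method 1 style: explicit recursion, testing each child for isDirectory. */
    static void collectRecursive(Entry dir, List<String> out) {
        for (Entry status : dir.children) {
            if (status.directory) {
                collectRecursive(status, out);
            } else {
                out.add(status.path);
            }
        }
    }

    /** Method 2 style: iterative walk with a stack of iterators, descending lazily. */
    static List<String> collectIterative(Entry root) {
        List<String> out = new ArrayList<>();
        Deque<Iterator<Entry>> stack = new ArrayDeque<>();
        stack.push(root.children.iterator());
        while (!stack.isEmpty()) {
            Iterator<Entry> it = stack.peek();
            if (!it.hasNext()) {
                stack.pop();                           // this directory is exhausted
                continue;
            }
            Entry next = it.next();
            if (next.directory) {
                stack.push(next.children.iterator());  // descend into the subdirectory
            } else {
                out.add(next.path);                    // files are emitted, directories are not
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Entry root = new Entry("/", true)
            .add(new Entry("/a.txt", false))
            .add(new Entry("/aa", true)
                .add(new Entry("/aa/haha2.txt", false))
                .add(new Entry("/aa/bb", true)
                    .add(new Entry("/aa/bb/c.txt", false))));
        List<String> recursive = new ArrayList<>();
        collectRecursive(root, recursive);
        System.out.println(recursive);
        System.out.println(collectIterative(root));
    }
}
```

Both lines print `[/a.txt, /aa/haha2.txt, /aa/bb/c.txt]`. The iterative version never holds more than one iterator per directory level, which is why a lazy iterator is a reasonable choice for very large trees.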

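    /**
     * Download an HDFS file to the local disk
     */
    @Test
    public void copyHdfsToLocal() throws Exception {

        FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());

        // Method 1: copy the streams manually
        FSDataInputStream inputStream = fileSystem.open(new Path("hdfs://node01:8020/aa/haha2.txt"));

        FileOutputStream outputStream = new FileOutputStream(new File("d:\\install-log.txt"));

        IOUtils.copy(inputStream,outputStream);
        IOUtils.closeQuietly(inputStream);
        IOUtils.closeQuietly(outputStream);

        // Method 2: use the official API
        // On Windows the two-argument copyToLocalFile(src, dst) failed with:
        // java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;
        // This usually means hadoop.dll / winutils.exe for this Hadoop version are missing.
        // A commonly used workaround is the four-argument overload with useRawLocalFileSystem = true,
        // which writes through the raw local file system (and therefore writes no .crc checksum file).
        fileSystem.copyToLocalFile(false, new Path("hdfs://node01:8020/aa/haha2.txt"), new Path("file:///d:\\install-log2.txt"), true);

        fileSystem.close();
    }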
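`IOUtils.closeQuietly` has two weaknesses in these examples: if `copy` throws, the close calls are never reached, and an exception raised while closing is swallowed. For an HDFS output stream the second point matters, because `close()` is where the remaining data is flushed and the file is completed, so a swallowed exception can hide an incomplete write. Below is a minimal stdlib-only sketch of the try-with-resources alternative; `copyAndClose` is a hypothetical helper name, and since it takes any `InputStream`/`OutputStream`, an `FSDataInputStream` from `fileSystem.open(...)` could be passed in directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopySketch {

    /**
     * Copies everything from in to out and closes both streams, even if the copy fails.
     * Exceptions from close() are propagated instead of being swallowed.
     * Returns the number of bytes copied.
     */
    static long copyAndClose(InputStream in, OutputStream out) throws IOException {
        try (InputStream src = in; OutputStream dst = out) {
            byte[] buffer = new byte[4096];
            long total = 0;
            int n;
            while ((n = src.read(buffer)) != -1) {
                dst.write(buffer, 0, n);
                total += n;
            }
            dst.flush();
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyAndClose(new ByteArrayInputStream(data), sink);
        System.out.println(copied + " bytes: " + sink.toString("UTF-8"));
    }
}
```

Running `main` prints `10 bytes: hello hdfs`. With Hadoop on the classpath, the download in method 1 would become `copyAndClose(fileSystem.open(src), new FileOutputStream(localFile))`.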

    /**
     * Create a directory on HDFS (mkdirs also creates any missing parent directories, like mkdir -p)
     */
    @Test
    public void createHdfsDir() throws  Exception{
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
        fileSystem.mkdirs(new Path("/aa/bb/cc/"));
        fileSystem.close();
    }

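    /**
     * Upload a file to HDFS
     */
    @Test
    public void uploadFileToHdfs() throws  Exception{
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
        // Note: a Path string without a scheme (file:/// or hdfs://) is resolved against the file system
        // it is used with. For calls on this HDFS client, "/aa/bb/cc" therefore means hdfs://node01:8020/aa/bb/cc.
        // Writing file:/// explicitly for local paths makes the intent unambiguous.
        // Method 1: copyFromLocalFile(delSrc = false keeps the local file, src, dst)
        fileSystem.copyFromLocalFile(false,new Path("file:///d:\\output.txt"),new Path("/aa/bb/cc"));

        // Method 2: via streams
        // Output stream that writes the data to the HDFS path
        FSDataOutputStream outputStream = fileSystem.create(new Path("/aa/bb/cc/empSel.hdfs"));
        // Input stream that reads the file from the local file system
        InputStream inputStream = new FileInputStream(new File("d:\\empSel.txt"));
        IOUtils.copy(inputStream,outputStream);
        IOUtils.closeQuietly(inputStream);
        IOUtils.closeQuietly(outputStream);
        fileSystem.close();
    }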
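The Path note in `uploadFileToHdfs` can be illustrated with plain `java.net.URI` resolution. This is a rough analogy, assuming an absolute, scheme-less path is simply qualified with the file system's URI; Hadoop's own `Path` logic does more (for example, relative paths are resolved against a working directory), which the sketch ignores. `qualify` is a hypothetical helper, and the example uses forward slashes because `java.net.URI` rejects backslashes.

```java
import java.net.URI;

public class PathQualifySketch {

    /**
     * Resolves a path string against the file system URI.
     * URI.resolve returns an already absolute URI (one with a scheme) unchanged.
     */
    static URI qualify(URI fsUri, String path) {
        return fsUri.resolve(URI.create(path));
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://node01:8020/");
        System.out.println(qualify(hdfs, "/aa/bb/cc"));              // hdfs://node01:8020/aa/bb/cc
        System.out.println(qualify(hdfs, "file:///d:/output.txt"));  // file:///d:/output.txt
    }
}
```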

    /**
     * HDFS permission checking
     */
    @Test
    public  void hdfsPermission() throws  Exception{
        /*
            Enable permission checking in hdfs-site.xml on all nodes
            (dfs.permissions is the old name; since Hadoop 2 the key is dfs.permissions.enabled):
            <property>
                <name>dfs.permissions</name>
                <value>true</value>
            </property>
            A plain FileSystem client then fails with: org.apache.hadoop.security.AccessControlException:
            Permission denied: user=Administrator, access=READ, inode="/config/core-site.xml":root:supergroup:-rw-------
            FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
            The client identifies itself as the local OS user (Administrator), but the file is owned
            by root with mode -rw-------, so only root may read it.
         */
        // Get the client as a specified user ("root"). With simple authentication (no Kerberos)
        // the NameNode trusts this name, so the client effectively impersonates root.
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration(), "root");
        // Download a file from HDFS to the local disk
        FSDataInputStream inputStream = fileSystem.open(new Path("/config/core-site.xml"));
        FileOutputStream outputStream = new FileOutputStream(new File("d:\\core-site.txt"));
        IOUtils.copy(inputStream,outputStream);
        IOUtils.closeQuietly(inputStream);
        IOUtils.closeQuietly(outputStream);
//        fileSystem.copyFromLocalFile(new Path("file:///d:\\transferIndex.txt"),new Path("/aa/bb/cc/"));
//        fileSystem.delete(new Path("/aa/bb/cc/"),false);
        fileSystem.close();
    }
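The `AccessControlException` above follows the owner/group/other model HDFS borrows from POSIX: the owner bits apply if the caller is the owner, otherwise the group bits apply if the caller is in the file's group, otherwise the other bits apply. The stdlib-only sketch below applies that rule to the `-rw-------` file from the error message. It is a simplification under stated assumptions: the superuser bypass, ACLs and the execute checks on ancestor directories are left out, and `canAccess` is a hypothetical helper.

```java
import java.util.Collections;
import java.util.Set;

public class PermissionCheckSketch {

    /**
     * Simplified owner/group/other check. perm is a mode string such as "-rw-------"
     * (a type character, then the owner, group and other triplets); action is 'r', 'w' or 'x'.
     * Assumption: superuser, ACLs and ancestor-directory checks are ignored.
     */
    static boolean canAccess(String user, Set<String> userGroups,
                             String owner, String group, String perm, char action) {
        int offset;
        if (user.equals(owner)) {
            offset = 1;                 // owner triplet: chars 1..3
        } else if (userGroups.contains(group)) {
            offset = 4;                 // group triplet: chars 4..6
        } else {
            offset = 7;                 // other triplet: chars 7..9
        }
        int bit = "rwx".indexOf(action);
        if (bit < 0) {
            throw new IllegalArgumentException("action must be r, w or x: " + action);
        }
        return perm.charAt(offset + bit) == action;
    }

    public static void main(String[] args) {
        String mode = "-rw-------";     // mode of /config/core-site.xml in the error above
        Set<String> noGroups = Collections.emptySet();
        // Administrator is not the owner and not in supergroup, so the empty "other" bits apply
        System.out.println(canAccess("Administrator", noGroups, "root", "supergroup", mode, 'r')); // false
        // root is the owner, so the owner bits rw- apply
        System.out.println(canAccess("root", noGroups, "root", "supergroup", mode, 'r'));          // true
    }
}
```

Only one triplet is consulted per caller: an owner is judged by the owner bits even if the other bits would be more permissive.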

    /**
     * Merge small files while uploading them to HDFS
     * From the HDFS shell, many HDFS files can be merged into one large file and downloaded locally:
     *      hdfs dfs -getmerge /config/*.xml  ./hello.xml
     * The reverse also works: small local files can be merged into one large file during upload.
     */
    @Test
    public void mergeFile()throws  Exception{
        // Get the distributed file system
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://192.168.8.100:8020"), new Configuration(),"root");
        FSDataOutputStream outputStream = fileSystem.create(new Path("/bigFile.xml"));

        // Get an input stream for every small local file,
        // starting from the local file system
        LocalFileSystem localFileSystem = FileSystem.getLocal(new Configuration());
        // Local folder "上传小文件合并" ("small files to merge on upload")
        FileStatus[] fileStatuses = localFileSystem.listStatus(new Path("file:///D:\\上传小文件合并"));
        for (FileStatus fileStatus:fileStatuses){
            if (!fileStatus.isFile()) {
                continue; // skip subdirectories, which cannot be opened as streams
            }
            Path path = fileStatus.getPath();
            FSDataInputStream fsDataInputStream = localFileSystem.open(path);
            IOUtils.copy(fsDataInputStream,outputStream);
            IOUtils.closeQuietly(fsDataInputStream);
        }
        IOUtils.closeQuietly(outputStream);
        fileSystem.close();
        localFileSystem.close();
    }
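The merge loop in `mergeFile` concatenates files in whatever order `listStatus` returns them, and the listing order of a local directory is not something to rely on. A minimal stdlib-only sketch that makes the order explicit by sorting on file name is below; `mergeInto` is a hypothetical helper, and the byte-array streams stand in for the local `FSDataInputStream`s and the HDFS output stream used above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MergeSketch {

    /**
     * Appends every source stream to one output in ascending name order,
     * closing each source after it is copied. The output stays open for the caller.
     */
    static void mergeInto(Map<String, InputStream> sourcesByName, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        // TreeMap iterates its keys (file names) in sorted order
        for (Map.Entry<String, InputStream> e : new TreeMap<>(sourcesByName).entrySet()) {
            try (InputStream in = e.getValue()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        Map<String, InputStream> files = new LinkedHashMap<>();   // deliberately unsorted input
        files.put("b.xml", new ByteArrayInputStream("<b/>".getBytes(StandardCharsets.UTF_8)));
        files.put("a.xml", new ByteArrayInputStream("<a/>".getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        mergeInto(files, merged);
        System.out.println(merged.toString("UTF-8"));              // <a/><b/>
    }
}
```

Leaving the output open lets the caller close it once, after all files are written, which matches how `mergeFile` closes `outputStream` after the loop.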
