Hadoop Source Code Analysis: RawLocalFileSystem

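RawLocalFileSystem is Hadoop's implementation of the local file system. Its file-metadata and directory operations are all adapted onto the corresponding java.io.File APIs; the adaptation is simple and the code is clear.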

1. Metadata and directory operations

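The following uses the mkdirs() method as the main example to look at how the class is implemented and at some of its distinctive choices.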

/****************************************************************
 * Implement the FileSystem API for the raw local filesystem.
 *
 * Local file system implementation: metadata and directory operations
 * are adapted to the corresponding java.io.File APIs.
 *****************************************************************/
public class RawLocalFileSystem extends FileSystem {
  static final URI NAME = URI.create("file:///"); // URI scheme of the local file system
  private Path workingDir;

  /**
   * Creates the specified directory hierarchy. Does not
   * treat existence as an error.
   */
  // Recursively creates the directory; the operation is idempotent.
  public boolean mkdirs(Path f) throws IOException {
    Path parent = f.getParent();
    File p2f = pathToFile(f);
    // If there is a parent directory, create it first (recursively).
    // Then create this directory through File; an already existing
    // directory also counts as success.
    return (parent == null || mkdirs(parent)) &&
      (p2f.mkdir() || p2f.isDirectory());
  }

  /** {@inheritDoc} */
  // Recursively creates the directory, then sets its access permissions
  // (done by invoking the shell's "chmod" command).
  // Q&A: doesn't Java's file API offer a chmod? java.io.File does provide
  // setReadOnly, setWritable, setReadable and setExecutable, but they are
  // too coarse: they only distinguish the owner from everybody else, which
  // is far less precise than "chmod".
  @Override
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
    boolean b = mkdirs(f);
    setPermission(f, permission);
    return b;
  }
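The recursion in mkdirs() can be tried out with nothing but java.io.File. The following is a minimal sketch, not Hadoop code: the class name MkdirsSketch and the temporary directory it uses are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-alone sketch; not part of Hadoop.
class MkdirsSketch {

  // Creates dir and any missing ancestors; an existing directory is not an error.
  static boolean mkdirs(File dir) {
    File parent = dir.getParentFile();
    // Create the parent first (if there is one), then this level.
    // mkdir() returns false when the directory already exists,
    // so the isDirectory() check is what makes the call idempotent.
    return (parent == null || mkdirs(parent))
        && (dir.mkdir() || dir.isDirectory());
  }

  public static void main(String[] args) throws IOException {
    File root = Files.createTempDirectory("mkdirs-sketch").toFile();
    File nested = new File(root, "a/b/c");
    System.out.println(mkdirs(nested)); // first call creates a, a/b and a/b/c
    System.out.println(mkdirs(nested)); // second call finds them already present
  }
}
```

Calling the method twice on the same path returns true both times, which is the "does not treat existence as an error" behavior described in the Javadoc above.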

Next, look at RawLocalFileStatus, an inner class of RawLocalFileSystem:

  static class RawLocalFileStatus extends FileStatus {
    /* We can add extra fields here. It breaks at least CopyFiles.FilePair().
     * We recognize if the information is already loaded by check if
     * owner.equals("").
     */
    private boolean isPermissionLoaded() {
      return !super.getOwner().equals("");
    }

    RawLocalFileStatus(File f, long defaultBlockSize, FileSystem fs) {
      super(f.length(), f.isDirectory(), 1, defaultBlockSize,
            f.lastModified(), new Path(f.getPath()).makeQualified(fs));
    }

    @Override
    public FsPermission getPermission() {
      if (!isPermissionLoaded()) {
        loadPermissionInfo();
      }
      return super.getPermission();
    }

    // Uses the 'ls -ld' command to obtain the permission information.
    private void loadPermissionInfo() {
      IOException e = null;
      try {
        StringTokenizer t = new StringTokenizer(
            FileUtil.execCommand(new File(getPath().toUri()),
                                 Shell.getGET_PERMISSION_COMMAND()));
        //expected format
        //-rw-------    1 username groupname ...
        String permission = t.nextToken();
        if (permission.length() > 10) { //files with ACLs might have a '+'
          permission = permission.substring(0, 10);
        }
        setPermission(FsPermission.valueOf(permission));
        t.nextToken();
        setOwner(t.nextToken());
        setGroup(t.nextToken());
      } catch (Shell.ExitCodeException ioe) {
        if (ioe.getExitCode() != 1) {
          e = ioe;
        } else {
          setPermission(null);
          setOwner(null);
          setGroup(null);
        }
      } catch (IOException ioe) {
        e = ioe;
      } finally {
        if (e != null) {
          throw new RuntimeException("Error while running command to get " +
                                     "file permissions : " +
                                     StringUtils.stringifyException(e));
        }
      }
    }
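The token handling in loadPermissionInfo() can be exercised without running a shell at all. The sketch below feeds a made-up `ls -ld` line through the same steps; the class name LsLineParser and the sample line are assumptions for illustration, not Hadoop code.

```java
import java.util.StringTokenizer;

// Hypothetical sketch of the parsing step in loadPermissionInfo(); not Hadoop code.
class LsLineParser {

  // Returns {permission, owner, group} parsed from one line of `ls -ld` output.
  static String[] parse(String lsLine) {
    StringTokenizer t = new StringTokenizer(lsLine);
    String permission = t.nextToken();
    if (permission.length() > 10) {   // files with ACLs may carry a trailing '+'
      permission = permission.substring(0, 10);
    }
    t.nextToken();                    // skip the link count
    String owner = t.nextToken();
    String group = t.nextToken();
    return new String[] {permission, owner, group};
  }

  public static void main(String[] args) {
    // Sample line invented for this example.
    String[] r = parse("-rw-r--r--+ 1 alice staff 42 Jan  1 00:00 /tmp/demo.txt");
    System.out.println(String.join(" ", r)); // prints "-rw-r--r-- alice staff"
  }
}
```

The parser relies on the column order of `ls -ld` (mode, link count, owner, group), which is why the second token is read and discarded.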

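The two Hadoop excerpts above show how the local file system builds on Java's File class and adds what it needs on top. The price is that permission lookups and changes run Linux shell commands: each call makes the JVM fork and exec a child process (not a new JVM), which costs far more than an in-process API call.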
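For comparison only: this is not what the RawLocalFileSystem code shown here does. The java.nio.file API (Java 7 and later) can read and set POSIX permissions in-process, without launching `ls` or `chmod`. The sketch below assumes a POSIX file system such as Linux or macOS; the class name NioPermissionSketch is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical sketch; assumes a POSIX file system (Linux, macOS).
class NioPermissionSketch {

  // Applies an "rwxr-x---"-style string and returns the permissions read back.
  static String setAndGet(Path p, String perms) throws IOException {
    Set<PosixFilePermission> wanted = PosixFilePermissions.fromString(perms);
    Files.setPosixFilePermissions(p, wanted);   // in-process, no chmod child process
    return PosixFilePermissions.toString(
        Files.getPosixFilePermissions(p));      // in-process, no ls child process
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("perm-sketch");
    System.out.println(setAndGet(dir, "rwxr-x---"));
    System.out.println(Files.getOwner(dir).getName());
  }
}
```

On a file system without POSIX attributes, setPosixFilePermissions throws UnsupportedOperationException, so a portable implementation would still need a fallback.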

2. Reading files

RawLocalFileSystem reads and writes through LocalFSFileInputStream and LocalFSFileOutputStream.

  /*******************************************************
   * For open()'s FSInputStream
   *******************************************************/
  // Input stream of the local file system
  class LocalFSFileInputStream extends FSInputStream {
    FileInputStream fis;   // underlying file input stream
    private long position; // offset in the file of the data currently being read

    public LocalFSFileInputStream(Path f) throws IOException {
      // The stream actually used is a TrackingFileInputStream.
      this.fis = new TrackingFileInputStream(pathToFile(f));
    }

    // Sets the current position in the file.
    public void seek(long pos) throws IOException {
      fis.getChannel().position(pos);
      this.position = pos;
    }

    // Returns the current position.
    public long getPos() throws IOException {
      return this.position;
    }

    // Seeks to a different copy of the data (the local file system has no
    // such notion, so this simply returns false).
    public boolean seekToNewSource(long targetPos) throws IOException {
      return false;
    }

    /*
     * Just forward to the fis
     */
    // Returns the number of remaining bytes that can be read or skipped.
    public int available() throws IOException { return fis.available(); }

    // Closes the input stream and releases the system resources tied to it.
    public void close() throws IOException { fis.close(); }

    public boolean markSupport() { return false; }

    // read() must keep position up to date so that getPos() returns the right value.
    public int read() throws IOException {
      try {
        int value = fis.read();
        if (value >= 0) {
          this.position++; // advance the current file position
        }
        return value;
      } catch (IOException e) {                 // unexpected exception
        throw new FSError(e);                   // assume native fs error
      }
    }

    public int read(byte[] b, int off, int len) throws IOException {
      try {
        int value = fis.read(b, off, len);
        if (value > 0) {
          this.position += value;
        }
        return value;
      } catch (IOException e) {                 // unexpected exception
        throw new FSError(e);                   // assume native fs error
      }
    }

    public int read(long position, byte[] b, int off, int len)
      throws IOException {
      ByteBuffer bb = ByteBuffer.wrap(b, off, len);
      try {
        return fis.getChannel().read(bb, position);
      } catch (IOException e) {
        throw new FSError(e);
      }
    }

    public long skip(long n) throws IOException {
      long value = fis.skip(n);
      if (value > 0) {
        this.position += value;
      }
      return value;
    }
  }
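The position bookkeeping above is easy to reproduce on a plain FileInputStream. The following is a minimal sketch under stated assumptions: the class name PositionedFileReader and the temporary file are hypothetical, and only seek, single-byte read and positioned read are covered.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch of the position bookkeeping; not Hadoop code.
class PositionedFileReader implements AutoCloseable {
  private final FileInputStream fis;
  private long position;               // offset of the next byte read() will return

  PositionedFileReader(File f) throws IOException {
    this.fis = new FileInputStream(f);
  }

  void seek(long pos) throws IOException {
    fis.getChannel().position(pos);    // move the underlying channel
    this.position = pos;               // keep the cached offset in sync
  }

  long getPos() { return position; }

  int read() throws IOException {
    int value = fis.read();
    if (value >= 0) {
      position++;                      // advance only when a byte was actually read
    }
    return value;
  }

  // Positioned read: reads at an absolute offset and leaves getPos() unchanged.
  int read(long pos, byte[] b, int off, int len) throws IOException {
    return fis.getChannel().read(ByteBuffer.wrap(b, off, len), pos);
  }

  @Override
  public void close() throws IOException { fis.close(); }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("pos-sketch", ".txt");
    Files.write(f.toPath(), "0123456789".getBytes(StandardCharsets.US_ASCII));
    try (PositionedFileReader in = new PositionedFileReader(f)) {
      in.seek(5);
      System.out.println((char) in.read() + " " + in.getPos()); // prints "5 6"
      byte[] buf = new byte[3];
      in.read(0, buf, 0, 3);
      System.out.println(in.getPos());                          // prints "6"
    }
  }
}
```

The positioned read goes through FileChannel.read(ByteBuffer, long), which does not move the channel's position. That is why it needs no bookkeeping, while the sequential read methods do.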

As shown, the stream that LocalFSFileInputStream actually reads from is a TrackingFileInputStream:

  // Overrides all read methods of FileInputStream to record how many bytes have been read.
  // TrackingFileInputStream plays a decorator-like role, implemented here by subclassing FileInputStream.
  class TrackingFileInputStream extends FileInputStream {
    public TrackingFileInputStream(File f) throws IOException {
      super(f);
    }

    public int read() throws IOException {
      int result = super.read();
      if (result != -1) {
        statistics.incrementBytesRead(1);
      }
      return result;
    }

    public int read(byte[] data) throws IOException {
      int result = super.read(data);
      if (result != -1) {
        statistics.incrementBytesRead(result);
      }
      return result;
    }

    public int read(byte[] data, int offset, int length) throws IOException {
      int result = super.read(data, offset, length);
      if (result != -1) {
        statistics.incrementBytesRead(result);
      }
      return result;
    }
  }
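The same byte-counting idea can be sketched outside Hadoop. In the example below an AtomicLong stands in for Hadoop's statistics object; the class name CountingFileInputStream and the getBytesRead() accessor are assumptions made for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; the AtomicLong stands in for Hadoop's statistics object.
class CountingFileInputStream extends FileInputStream {
  private final AtomicLong bytesRead = new AtomicLong();

  CountingFileInputStream(File f) throws IOException {
    super(f);
  }

  @Override
  public int read() throws IOException {
    int result = super.read();
    if (result != -1) {                 // -1 means end of file, nothing to count
      bytesRead.incrementAndGet();
    }
    return result;
  }

  @Override
  public int read(byte[] data) throws IOException {
    // Delegate to the three-argument overload so each byte is counted exactly once.
    return read(data, 0, data.length);
  }

  @Override
  public int read(byte[] data, int offset, int length) throws IOException {
    int result = super.read(data, offset, length);
    if (result != -1) {
      bytesRead.addAndGet(result);
    }
    return result;
  }

  long getBytesRead() { return bytesRead.get(); }
}
```

Every read overload has to be covered because FileInputStream does not route them all through a single method; overriding only one of them would leave some reads uncounted.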

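So how do RawLocalFileSystem and LocalFSFileInputStream come together for a read? As in the Java API, streams are obtained from the file system: open() creates a LocalFSFileInputStream, and create() creates the corresponding output stream. The excerpt below shows RawLocalFileSystem's open() method, followed by the FSDataInputStream class that wraps the stream it returns: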

  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    if (!exists(f)) {
      throw new FileNotFoundException(f.toString());
    }
    // Wrap the LocalFSFileInputStream in a buffer, then in FSDataInputStream.
    return new FSDataInputStream(new BufferedFSInputStream(
        new LocalFSFileInputStream(f), bufferSize));
  }

public class FSDataInputStream extends DataInputStream
    implements Seekable, PositionedReadable, Closeable {

  public FSDataInputStream(InputStream in)
    throws IOException {
    super(in);
    if( !(in instanceof Seekable) || !(in instanceof PositionedReadable) ) {
      throw new IllegalArgumentException(
          "In is not an instance of Seekable or PositionedReadable");
    }
  }

  public synchronized void seek(long desired) throws IOException {
    ((Seekable)in).seek(desired);
  }

  public long getPos() throws IOException {
    return ((Seekable)in).getPos();
  }

  public int read(long position, byte[] buffer, int offset, int length)
    throws IOException {
    return ((PositionedReadable)in).read(position, buffer, offset, length);
  }

  public void readFully(long position, byte[] buffer, int offset, int length)
    throws IOException {
    ((PositionedReadable)in).readFully(position, buffer, offset, length);
  }

  public void readFully(long position, byte[] buffer)
    throws IOException {
    ((PositionedReadable)in).readFully(position, buffer, 0, buffer.length);
  }

  public boolean seekToNewSource(long targetPos) throws IOException {
    return ((Seekable)in).seekToNewSource(targetPos);
  }
}

Once the input stream has been obtained, its read methods can be called to read the data.
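The way FSDataInputStream checks its argument and then forwards seek calls to it can be shown in miniature. This is a sketch with invented names (SeekableSketch, SeekableBytes, DataStreamSketch), not Hadoop's interfaces, and it uses an in-memory stream in place of a file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical miniature of the FSDataInputStream idea; all names are invented.
interface SeekableSketch {
  void seek(long pos) throws IOException;
  long getPos() throws IOException;
}

// An in-memory stream that can seek, standing in for LocalFSFileInputStream.
class SeekableBytes extends ByteArrayInputStream implements SeekableSketch {
  SeekableBytes(byte[] data) { super(data); }
  @Override public void seek(long pos) { this.pos = (int) pos; }
  @Override public long getPos() { return this.pos; }
}

// Adds DataInputStream's typed reads on top of any seekable stream.
class DataStreamSketch extends DataInputStream implements SeekableSketch {
  DataStreamSketch(InputStream in) {
    super(in);
    // Reject streams that cannot honor the seek contract, as FSDataInputStream does.
    if (!(in instanceof SeekableSketch)) {
      throw new IllegalArgumentException("in is not seekable");
    }
  }

  @Override public void seek(long pos) throws IOException {
    ((SeekableSketch) in).seek(pos);     // forward to the wrapped stream
  }

  @Override public long getPos() throws IOException {
    return ((SeekableSketch) in).getPos();
  }
}
```

A caller can then combine both capabilities, for example `s.seek(4)` followed by `s.readInt()`, without knowing which concrete stream sits underneath.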

3. Writing files

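Writing likewise stays close to plain Java I/O: LocalFSFileOutputStream writes through a FileOutputStream, so a file can either be created (overwriting any existing content) or appended to. As with FileOutputStream, writes are sequential; random writes are not available through this stream.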
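The append mode mentioned above corresponds to the append flag of java.io.FileOutputStream. The following is a minimal sketch, assuming the output stream is backed by FileOutputStream; the class name WriteModesSketch and its helper methods are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch; plain java.io, not Hadoop code.
class WriteModesSketch {

  // append=false truncates the file first; append=true writes at the end.
  static void write(File f, String text, boolean append) throws IOException {
    try (FileOutputStream out = new FileOutputStream(f, append)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
  }

  static String read(File f) throws IOException {
    return new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("write-sketch", ".txt");
    write(f, "hello", false);
    write(f, " world", true);
    System.out.println(read(f));   // prints "hello world"
    write(f, "reset", false);
    System.out.println(read(f));   // prints "reset"
  }
}
```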
