一、引言

  Hadoop版本提供了对多种文件系统的支持,但是这些文件系统是以何种方式实现的,其实现原理是什么以前并没有深究过。今天正好有人咨询我这个问题:Hadoop对S3的支持原理是什么?特此总结一下。Hadoop支持的文件系统包括:  

  文件系统                 URI前缀       hadoop的具体实现类

  Local                     file               fs.LocalFileSystem

  HDFS                     hdfs            hdfs.DistributedFileSystem

  HFTP                      hftp            hdfs.HftpFileSystem

  HSFTP                    hsftp          hdfs.HsftpFileSystem

  HAR                        har            fs.HarFileSystem

  KFS                         kfs            fs.kfs.KosmosFileSystem

  FTP                          ftp             fs.ftp.FTPFileSystem

  S3 (native)              s3n            fs.s3native.NativeS3FileSystem

  S3 (blockbased)      s3      fs.s3.S3FileSystem

二、争议观点

  1.Hadoop对S3文件系统的支持是通过自己实现S3文件系统来做的吗?

   2.Hadoop对S3文件系统的支持是通过S3文件系统接口,实现的对S3文件系统的整合?

三、源码解析

 package org.apache.hadoop.fs.s3;

 import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3.INode.FileType;
import org.jets3t.service.S3Service;
import org.jets3t.service.S3ServiceException;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.model.S3Bucket;
import org.jets3t.service.model.S3Object;
import org.jets3t.service.security.AWSCredentials; class Jets3tFileSystemStore implements FileSystemStore { private static final String FILE_SYSTEM_NAME = "fs";
private static final String FILE_SYSTEM_VALUE = "Hadoop"; private static final String FILE_SYSTEM_TYPE_NAME = "fs-type";
private static final String FILE_SYSTEM_TYPE_VALUE = "block"; private static final String FILE_SYSTEM_VERSION_NAME = "fs-version";
private static final String FILE_SYSTEM_VERSION_VALUE = "1"; private static final Map<String, String> METADATA =
new HashMap<String, String>(); static {
METADATA.put(FILE_SYSTEM_NAME, FILE_SYSTEM_VALUE);
METADATA.put(FILE_SYSTEM_TYPE_NAME, FILE_SYSTEM_TYPE_VALUE);
METADATA.put(FILE_SYSTEM_VERSION_NAME, FILE_SYSTEM_VERSION_VALUE);
} private static final String PATH_DELIMITER = Path.SEPARATOR;
private static final String BLOCK_PREFIX = "block_"; private Configuration conf; private S3Service s3Service; private S3Bucket bucket; private int bufferSize; public void initialize(URI uri, Configuration conf) throws IOException { this.conf = conf; S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(uri, conf);
try {
AWSCredentials awsCredentials =
new AWSCredentials(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey());
this.s3Service = new RestS3Service(awsCredentials);
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
bucket = new S3Bucket(uri.getHost()); this.bufferSize = conf.getInt("io.file.buffer.size", 4096);
} public String getVersion() throws IOException {
return FILE_SYSTEM_VERSION_VALUE;
} private void delete(String key) throws IOException {
try {
s3Service.deleteObject(bucket, key);
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} public void deleteINode(Path path) throws IOException {
delete(pathToKey(path));
} public void deleteBlock(Block block) throws IOException {
delete(blockToKey(block));
} public boolean inodeExists(Path path) throws IOException {
InputStream in = get(pathToKey(path), true);
if (in == null) {
return false;
}
in.close();
return true;
} public boolean blockExists(long blockId) throws IOException {
InputStream in = get(blockToKey(blockId), false);
if (in == null) {
return false;
}
in.close();
return true;
} private InputStream get(String key, boolean checkMetadata)
throws IOException { try {
S3Object object = s3Service.getObject(bucket, key);
if (checkMetadata) {
checkMetadata(object);
}
return object.getDataInputStream();
} catch (S3ServiceException e) {
if ("NoSuchKey".equals(e.getS3ErrorCode())) {
return null;
}
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} private InputStream get(String key, long byteRangeStart) throws IOException {
try {
S3Object object = s3Service.getObject(bucket, key, null, null, null,
null, byteRangeStart, null);
return object.getDataInputStream();
} catch (S3ServiceException e) {
if ("NoSuchKey".equals(e.getS3ErrorCode())) {
return null;
}
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} private void checkMetadata(S3Object object) throws S3FileSystemException,
S3ServiceException { String name = (String) object.getMetadata(FILE_SYSTEM_NAME);
if (!FILE_SYSTEM_VALUE.equals(name)) {
throw new S3FileSystemException("Not a Hadoop S3 file.");
}
String type = (String) object.getMetadata(FILE_SYSTEM_TYPE_NAME);
if (!FILE_SYSTEM_TYPE_VALUE.equals(type)) {
throw new S3FileSystemException("Not a block file.");
}
String dataVersion = (String) object.getMetadata(FILE_SYSTEM_VERSION_NAME);
if (!FILE_SYSTEM_VERSION_VALUE.equals(dataVersion)) {
throw new VersionMismatchException(FILE_SYSTEM_VERSION_VALUE,
dataVersion);
}
} public INode retrieveINode(Path path) throws IOException {
return INode.deserialize(get(pathToKey(path), true));
} public File retrieveBlock(Block block, long byteRangeStart)
throws IOException {
File fileBlock = null;
InputStream in = null;
OutputStream out = null;
try {
fileBlock = newBackupFile();
in = get(blockToKey(block), byteRangeStart);
out = new BufferedOutputStream(new FileOutputStream(fileBlock));
byte[] buf = new byte[bufferSize];
int numRead;
while ((numRead = in.read(buf)) >= 0) {
out.write(buf, 0, numRead);
}
return fileBlock;
} catch (IOException e) {
// close output stream to file then delete file
closeQuietly(out);
out = null; // to prevent a second close
if (fileBlock != null) {
fileBlock.delete();
}
throw e;
} finally {
closeQuietly(out);
closeQuietly(in);
}
} private File newBackupFile() throws IOException {
File dir = new File(conf.get("fs.s3.buffer.dir"));
if (!dir.exists() && !dir.mkdirs()) {
throw new IOException("Cannot create S3 buffer directory: " + dir);
}
File result = File.createTempFile("input-", ".tmp", dir);
result.deleteOnExit();
return result;
} public Set<Path> listSubPaths(Path path) throws IOException {
try {
String prefix = pathToKey(path);
if (!prefix.endsWith(PATH_DELIMITER)) {
prefix += PATH_DELIMITER;
}
S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
Set<Path> prefixes = new TreeSet<Path>();
for (int i = 0; i < objects.length; i++) {
prefixes.add(keyToPath(objects[i].getKey()));
}
prefixes.remove(path);
return prefixes;
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} public Set<Path> listDeepSubPaths(Path path) throws IOException {
try {
String prefix = pathToKey(path);
if (!prefix.endsWith(PATH_DELIMITER)) {
prefix += PATH_DELIMITER;
}
S3Object[] objects = s3Service.listObjects(bucket, prefix, null);
Set<Path> prefixes = new TreeSet<Path>();
for (int i = 0; i < objects.length; i++) {
prefixes.add(keyToPath(objects[i].getKey()));
}
prefixes.remove(path);
return prefixes;
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} private void put(String key, InputStream in, long length, boolean storeMetadata)
throws IOException { try {
S3Object object = new S3Object(key);
object.setDataInputStream(in);
object.setContentType("binary/octet-stream");
object.setContentLength(length);
if (storeMetadata) {
object.addAllMetadata(METADATA);
}
s3Service.putObject(bucket, object);
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} public void storeINode(Path path, INode inode) throws IOException {
put(pathToKey(path), inode.serialize(), inode.getSerializedLength(), true);
} public void storeBlock(Block block, File file) throws IOException {
BufferedInputStream in = null;
try {
in = new BufferedInputStream(new FileInputStream(file));
put(blockToKey(block), in, block.getLength(), false);
} finally {
closeQuietly(in);
}
} private void closeQuietly(Closeable closeable) {
if (closeable != null) {
try {
closeable.close();
} catch (IOException e) {
// ignore
}
}
} private String pathToKey(Path path) {
if (!path.isAbsolute()) {
throw new IllegalArgumentException("Path must be absolute: " + path);
}
return path.toUri().getPath();
} private Path keyToPath(String key) {
return new Path(key);
} private String blockToKey(long blockId) {
return BLOCK_PREFIX + blockId;
} private String blockToKey(Block block) {
return blockToKey(block.getId());
} public void purge() throws IOException {
try {
S3Object[] objects = s3Service.listObjects(bucket);
for (int i = 0; i < objects.length; i++) {
s3Service.deleteObject(bucket, objects[i].getKey());
}
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} public void dump() throws IOException {
StringBuilder sb = new StringBuilder("S3 Filesystem, ");
sb.append(bucket.getName()).append("\n");
try {
S3Object[] objects = s3Service.listObjects(bucket, PATH_DELIMITER, null);
for (int i = 0; i < objects.length; i++) {
Path path = keyToPath(objects[i].getKey());
sb.append(path).append("\n");
INode m = retrieveINode(path);
sb.append("\t").append(m.getFileType()).append("\n");
if (m.getFileType() == FileType.DIRECTORY) {
continue;
}
for (int j = 0; j < m.getBlocks().length; j++) {
sb.append("\t").append(m.getBlocks()[j]).append("\n");
}
}
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
System.out.println(sb);
} }

四、有图有真相

 五、结论

  Hadoop对S3文件系统的支持通过S3文件系统接口,实现的对S3文件系统的整合。有感兴趣的可以自行参照源码。

Hadoop文件系统支持释疑之S3的更多相关文章

  1. hadoop学习笔记:hadoop文件系统浅析

    1.什么是分布式文件系统? 管理网络中跨多台计算机存储的文件系统称为分布式文件系统. 2.为什么需要分布式文件系统了? 原因很简单,当数据集的大小超过一台独立物理计算机的存储能力时候,就有必要对它进行 ...

  2. hadoop文件系统浅析

    1.什么是分布式文件系统? 管理网络中跨多台计算机存储的文件系统称为分布式文件系统. 2.为什么需要分布式文件系统了? 原因很简单,当数据集的大小超过一台独立物理计算机的存储能力时候,就有必要对它进行 ...

  3. hadoop2.5.2学习及实践笔记(六)—— Hadoop文件系统及其java接口

    文件系统概述 org.apache.hadoop.fs.FileSystem是hadoop的抽象文件系统,为不同的数据访问提供了统一的接口,并提供了大量具体文件系统的实现,满足hadoop上各种数据访 ...

  4. [转帖]hadoop学习笔记:hadoop文件系统浅析

    hadoop学习笔记:hadoop文件系统浅析 https://www.cnblogs.com/sharpxiajun/archive/2013/06/15/3137765.html 1.什么是分布式 ...

  5. hadoop文件系统与I/O流

    本文地址:http://www.cnblogs.com/archimedes/p/hadoop-filesystem-io.html,转载请注明源地址. hadoop借鉴了Linux虚拟文件系统的概念 ...

  6. hadoop文件系统FileSystem详解 转自http://hi.baidu.com/270460591/item/0efacd8accb7a1d7ef083d05

    Hadoop文件系统 基本的文件系统命令操作, 通过hadoop fs -help可以获取所有的命令的详细帮助文件. Java抽象类org.apache.hadoop.fs.FileSystem定义了 ...

  7. 云计算分布式大数据Hadoop实战高手之路第八讲Hadoop图文训练课程:Hadoop文件系统的操作实战

    本讲通过实验的方式讲解Hadoop文件系统的操作. “云计算分布式大数据Hadoop实战高手之路”之完整发布目录 云计算分布式大数据实战技术Hadoop交流群:312494188,每天都会在群中发布云 ...

  8. Java API实现Hadoop文件系统增删改查

    Java API实现Hadoop文件系统增删改查 Hadoop文件系统可以通过shell命令hadoop fs -xx进行操作,同时也提供了Java编程接口 maven配置 <project x ...

  9. Hadoop学习笔记(3) Hadoop文件系统二

    1 查询文件系统 (1) 文件元数据:FileStatus,该类封装了文件系统中文件和目录的元数据,包括文件长度.块大小.备份.修改时间.所有者以及版权信息.FileSystem的getFileSta ...

随机推荐

  1. zabbix的搭建与入门

    一,Zabbix架构 zabbix 是一个基于 WEB 界面的提供分布式系统监视以及网络监视功能的企业级的开源解决方案.zabbix 能监视各种网络参数,保证服务器系统的安全运营:并提供灵活的通知机制 ...

  2. Python Django 前后端数据交互 之 后端向前端发送数据

    Django 前后台的数据传递 严正声明:作者:psklf出处: http://www.cnblogs.com/psklf/archive/2016/05/30/5542612.html欢迎转载,但未 ...

  3. 浏览器是怎样工作的:渲染引擎,HTML解析

    渲染引擎 渲染引擎的职责是……渲染,也就是把请求的内容显示到浏览器屏幕上. 默认情况下渲染引擎可以显示HTML,XML文档以及图片. 通过插件(浏览器扩展)它可以显示其它类型文档.比如使用PDF vi ...

  4. 对Repository模式误用的反思和纠正

    一直以来想自己做一套开发框架,在其基础上进行快速开发,自从接触微软的MVC框架和Entityframework以来,阅读了大量园子里的相关的技术文章,也进行了不少摸索和尝试,中间经历了多次大刀阔斧的重 ...

  5. L224 词汇题

    Elaborate 精心的 preparations were being made for the Prime Minister’s official visit to the four forei ...

  6. ztree树形菜单的增加删除修改和换图标

    首先需要注意一点,如果有研究过树形菜单,就会发现实现删除和修改功能特别简单,但是增加却有一点复杂.造成这个现象是ztree树形菜单的历史遗留问题.大概是之前的版本没有增加这个功能,后来的版本加上了这个 ...

  7. flash存储原理

    norflash 带有 SRAM接口,有足够的地址引脚来寻址,可以很容易地存取其内容每一字节:nandflash器件使用复杂的IO口串行的存取数据,读写操作采用512字节的块(也就是读/写某个字节,必 ...

  8. JMeter传递JSON数据

    步骤: 1.添加线程组.HTTP请求默认值.察看结果树等参考<JMeter实现bugfree登录接口测试>.这里不再赘述. 2.添加HTTP请求 在Body Data中写上输入的参数.参数 ...

  9. OK335xS U-boot 编译问题&无Linux shell 问题

    /************************************************************************** * OK335xS U-boot 编译问题&am ...

  10. PR5

    修改字幕的两种方式