Hadoop文件系统支持释疑之S3
一、引言
Hadoop版本提供了对多种文件系统的支持,但是这些文件系统是以何种方式实现的,其实现原理是什么以前并没有深究过。今天正好有人咨询我这个问题:Hadoop对S3的支持原理是什么?特此总结一下。Hadoop支持的文件系统包括:
文件系统 URI前缀 hadoop的具体实现类
Local file fs.LocalFileSystem
HDFS hdfs hdfs.DistributedFileSystem
HFTP hftp hdfs.HftpFileSystem
HSFTP hsftp hdfs.HsftpFileSystem
HAR har fs.HarFileSystem
KFS kfs fs.kfs.KosmosFileSystem
FTP ftp fs.ftp.FTPFileSystem
S3 (native) s3n fs.s3native.NativeS3FileSystem
S3 (blockbased) s3 fs.s3.S3FileSystem
二、争议观点
1.Hadoop对S3文件系统的支持是通过自己实现S3文件系统来做的吗?
2.Hadoop对S3文件系统的支持是通过S3文件系统接口,实现的对S3文件系统的整合?
三、源码解析
package org.apache.hadoop.fs.s3; import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3.INode.FileType;
import org.jets3t.service.S3Service;
import org.jets3t.service.S3ServiceException;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.model.S3Bucket;
import org.jets3t.service.model.S3Object;
import org.jets3t.service.security.AWSCredentials; class Jets3tFileSystemStore implements FileSystemStore { private static final String FILE_SYSTEM_NAME = "fs";
private static final String FILE_SYSTEM_VALUE = "Hadoop"; private static final String FILE_SYSTEM_TYPE_NAME = "fs-type";
private static final String FILE_SYSTEM_TYPE_VALUE = "block"; private static final String FILE_SYSTEM_VERSION_NAME = "fs-version";
private static final String FILE_SYSTEM_VERSION_VALUE = "1"; private static final Map<String, String> METADATA =
new HashMap<String, String>(); static {
METADATA.put(FILE_SYSTEM_NAME, FILE_SYSTEM_VALUE);
METADATA.put(FILE_SYSTEM_TYPE_NAME, FILE_SYSTEM_TYPE_VALUE);
METADATA.put(FILE_SYSTEM_VERSION_NAME, FILE_SYSTEM_VERSION_VALUE);
} private static final String PATH_DELIMITER = Path.SEPARATOR;
private static final String BLOCK_PREFIX = "block_"; private Configuration conf; private S3Service s3Service; private S3Bucket bucket; private int bufferSize; public void initialize(URI uri, Configuration conf) throws IOException { this.conf = conf; S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(uri, conf);
try {
AWSCredentials awsCredentials =
new AWSCredentials(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey());
this.s3Service = new RestS3Service(awsCredentials);
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
bucket = new S3Bucket(uri.getHost()); this.bufferSize = conf.getInt("io.file.buffer.size", 4096);
} public String getVersion() throws IOException {
return FILE_SYSTEM_VERSION_VALUE;
} private void delete(String key) throws IOException {
try {
s3Service.deleteObject(bucket, key);
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} public void deleteINode(Path path) throws IOException {
delete(pathToKey(path));
} public void deleteBlock(Block block) throws IOException {
delete(blockToKey(block));
} public boolean inodeExists(Path path) throws IOException {
InputStream in = get(pathToKey(path), true);
if (in == null) {
return false;
}
in.close();
return true;
} public boolean blockExists(long blockId) throws IOException {
InputStream in = get(blockToKey(blockId), false);
if (in == null) {
return false;
}
in.close();
return true;
} private InputStream get(String key, boolean checkMetadata)
throws IOException { try {
S3Object object = s3Service.getObject(bucket, key);
if (checkMetadata) {
checkMetadata(object);
}
return object.getDataInputStream();
} catch (S3ServiceException e) {
if ("NoSuchKey".equals(e.getS3ErrorCode())) {
return null;
}
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} private InputStream get(String key, long byteRangeStart) throws IOException {
try {
S3Object object = s3Service.getObject(bucket, key, null, null, null,
null, byteRangeStart, null);
return object.getDataInputStream();
} catch (S3ServiceException e) {
if ("NoSuchKey".equals(e.getS3ErrorCode())) {
return null;
}
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} private void checkMetadata(S3Object object) throws S3FileSystemException,
S3ServiceException { String name = (String) object.getMetadata(FILE_SYSTEM_NAME);
if (!FILE_SYSTEM_VALUE.equals(name)) {
throw new S3FileSystemException("Not a Hadoop S3 file.");
}
String type = (String) object.getMetadata(FILE_SYSTEM_TYPE_NAME);
if (!FILE_SYSTEM_TYPE_VALUE.equals(type)) {
throw new S3FileSystemException("Not a block file.");
}
String dataVersion = (String) object.getMetadata(FILE_SYSTEM_VERSION_NAME);
if (!FILE_SYSTEM_VERSION_VALUE.equals(dataVersion)) {
throw new VersionMismatchException(FILE_SYSTEM_VERSION_VALUE,
dataVersion);
}
} public INode retrieveINode(Path path) throws IOException {
return INode.deserialize(get(pathToKey(path), true));
} public File retrieveBlock(Block block, long byteRangeStart)
throws IOException {
File fileBlock = null;
InputStream in = null;
OutputStream out = null;
try {
fileBlock = newBackupFile();
in = get(blockToKey(block), byteRangeStart);
out = new BufferedOutputStream(new FileOutputStream(fileBlock));
byte[] buf = new byte[bufferSize];
int numRead;
while ((numRead = in.read(buf)) >= 0) {
out.write(buf, 0, numRead);
}
return fileBlock;
} catch (IOException e) {
// close output stream to file then delete file
closeQuietly(out);
out = null; // to prevent a second close
if (fileBlock != null) {
fileBlock.delete();
}
throw e;
} finally {
closeQuietly(out);
closeQuietly(in);
}
} private File newBackupFile() throws IOException {
File dir = new File(conf.get("fs.s3.buffer.dir"));
if (!dir.exists() && !dir.mkdirs()) {
throw new IOException("Cannot create S3 buffer directory: " + dir);
}
File result = File.createTempFile("input-", ".tmp", dir);
result.deleteOnExit();
return result;
} public Set<Path> listSubPaths(Path path) throws IOException {
try {
String prefix = pathToKey(path);
if (!prefix.endsWith(PATH_DELIMITER)) {
prefix += PATH_DELIMITER;
}
S3Object[] objects = s3Service.listObjects(bucket, prefix, PATH_DELIMITER);
Set<Path> prefixes = new TreeSet<Path>();
for (int i = 0; i < objects.length; i++) {
prefixes.add(keyToPath(objects[i].getKey()));
}
prefixes.remove(path);
return prefixes;
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} public Set<Path> listDeepSubPaths(Path path) throws IOException {
try {
String prefix = pathToKey(path);
if (!prefix.endsWith(PATH_DELIMITER)) {
prefix += PATH_DELIMITER;
}
S3Object[] objects = s3Service.listObjects(bucket, prefix, null);
Set<Path> prefixes = new TreeSet<Path>();
for (int i = 0; i < objects.length; i++) {
prefixes.add(keyToPath(objects[i].getKey()));
}
prefixes.remove(path);
return prefixes;
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} private void put(String key, InputStream in, long length, boolean storeMetadata)
throws IOException { try {
S3Object object = new S3Object(key);
object.setDataInputStream(in);
object.setContentType("binary/octet-stream");
object.setContentLength(length);
if (storeMetadata) {
object.addAllMetadata(METADATA);
}
s3Service.putObject(bucket, object);
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} public void storeINode(Path path, INode inode) throws IOException {
put(pathToKey(path), inode.serialize(), inode.getSerializedLength(), true);
} public void storeBlock(Block block, File file) throws IOException {
BufferedInputStream in = null;
try {
in = new BufferedInputStream(new FileInputStream(file));
put(blockToKey(block), in, block.getLength(), false);
} finally {
closeQuietly(in);
}
} private void closeQuietly(Closeable closeable) {
if (closeable != null) {
try {
closeable.close();
} catch (IOException e) {
// ignore
}
}
} private String pathToKey(Path path) {
if (!path.isAbsolute()) {
throw new IllegalArgumentException("Path must be absolute: " + path);
}
return path.toUri().getPath();
} private Path keyToPath(String key) {
return new Path(key);
} private String blockToKey(long blockId) {
return BLOCK_PREFIX + blockId;
} private String blockToKey(Block block) {
return blockToKey(block.getId());
} public void purge() throws IOException {
try {
S3Object[] objects = s3Service.listObjects(bucket);
for (int i = 0; i < objects.length; i++) {
s3Service.deleteObject(bucket, objects[i].getKey());
}
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
} public void dump() throws IOException {
StringBuilder sb = new StringBuilder("S3 Filesystem, ");
sb.append(bucket.getName()).append("\n");
try {
S3Object[] objects = s3Service.listObjects(bucket, PATH_DELIMITER, null);
for (int i = 0; i < objects.length; i++) {
Path path = keyToPath(objects[i].getKey());
sb.append(path).append("\n");
INode m = retrieveINode(path);
sb.append("\t").append(m.getFileType()).append("\n");
if (m.getFileType() == FileType.DIRECTORY) {
continue;
}
for (int j = 0; j < m.getBlocks().length; j++) {
sb.append("\t").append(m.getBlocks()[j]).append("\n");
}
}
} catch (S3ServiceException e) {
if (e.getCause() instanceof IOException) {
throw (IOException) e.getCause();
}
throw new S3Exception(e);
}
System.out.println(sb);
} }
四、有图有真相

五、结论
Hadoop对S3文件系统的支持通过S3文件系统接口,实现的对S3文件系统的整合。有感兴趣的可以自行参照源码。
Hadoop文件系统支持释疑之S3的更多相关文章
- hadoop学习笔记:hadoop文件系统浅析
1.什么是分布式文件系统? 管理网络中跨多台计算机存储的文件系统称为分布式文件系统. 2.为什么需要分布式文件系统了? 原因很简单,当数据集的大小超过一台独立物理计算机的存储能力时候,就有必要对它进行 ...
- hadoop文件系统浅析
1.什么是分布式文件系统? 管理网络中跨多台计算机存储的文件系统称为分布式文件系统. 2.为什么需要分布式文件系统了? 原因很简单,当数据集的大小超过一台独立物理计算机的存储能力时候,就有必要对它进行 ...
- hadoop2.5.2学习及实践笔记(六)—— Hadoop文件系统及其java接口
文件系统概述 org.apache.hadoop.fs.FileSystem是hadoop的抽象文件系统,为不同的数据访问提供了统一的接口,并提供了大量具体文件系统的实现,满足hadoop上各种数据访 ...
- [转帖]hadoop学习笔记:hadoop文件系统浅析
hadoop学习笔记:hadoop文件系统浅析 https://www.cnblogs.com/sharpxiajun/archive/2013/06/15/3137765.html 1.什么是分布式 ...
- hadoop文件系统与I/O流
本文地址:http://www.cnblogs.com/archimedes/p/hadoop-filesystem-io.html,转载请注明源地址. hadoop借鉴了Linux虚拟文件系统的概念 ...
- hadoop文件系统FileSystem详解 转自http://hi.baidu.com/270460591/item/0efacd8accb7a1d7ef083d05
Hadoop文件系统 基本的文件系统命令操作, 通过hadoop fs -help可以获取所有的命令的详细帮助文件. Java抽象类org.apache.hadoop.fs.FileSystem定义了 ...
- 云计算分布式大数据Hadoop实战高手之路第八讲Hadoop图文训练课程:Hadoop文件系统的操作实战
本讲通过实验的方式讲解Hadoop文件系统的操作. “云计算分布式大数据Hadoop实战高手之路”之完整发布目录 云计算分布式大数据实战技术Hadoop交流群:312494188,每天都会在群中发布云 ...
- Java API实现Hadoop文件系统增删改查
Java API实现Hadoop文件系统增删改查 Hadoop文件系统可以通过shell命令hadoop fs -xx进行操作,同时也提供了Java编程接口 maven配置 <project x ...
- Hadoop学习笔记(3) Hadoop文件系统二
1 查询文件系统 (1) 文件元数据:FileStatus,该类封装了文件系统中文件和目录的元数据,包括文件长度.块大小.备份.修改时间.所有者以及版权信息.FileSystem的getFileSta ...
随机推荐
- 微信H5支付 C#
首先奉上 万能的 官方文档 应用场景(废话) H5支付是指商户在微信客户端外的移动端网页展示商品或服务,用户在前述页面确认使用微信支付时,商户发起本服务呼起微信客户端进行支付. ...
- main方法原来只要放在public static类中就能跑,涨知识了
接口中可以装在嵌套类对象. public interface ClassInterface { void howdy(); class Test implements ClassInterface { ...
- Java学习图谱
学习图谱 一:常见模式与工具 二:分布式架构 三:微服务架构 四:底层知识 五:性能优化 六:工程化与工具
- 攻防:文件上传漏洞的攻击与防御,转自H3C
WebShell就是以asp.php.jsp或者cgi等网页文件形式存在的一种命令执行环境,也可以将其称做为一种网页后门.黑客在入侵了一个网站后,通常会将这些asp或php后门文件与网站服务器WEB目 ...
- ThinkPad 复刻计划 ThinkPad Time Machine
在快节奏的高科技市场中,针对性的进化 ThinkPad 的设计几乎是闻所未闻的.在汽车行业,保时捷无疑干的不错,但我不认为有任何其他的电脑公司可以顶住压力,坚持自己的初心这么久.没有任何一个竞争对手可 ...
- iOS-UIImage变为NSData并进行压缩
<iOS>UIImage变为NSData并进行压缩 //sdk中提供了方法可以直接调用 UIImage *img = [UIImage imageNamed:@"some.p ...
- 用tornado实现一个简单的websocket样例
想用SPRING MVC,NODE.JS EXPRESS,TORNADO实现同一个功能,开阔一下视野. 先来TORNADO的吧.. 客户端代码都差不多,主要是服务端代码. TORNADO的说法: ht ...
- Python 字典的遍历
dic = {"赵四":"刘晓光", "刘能":"王晓利", "王木生":"范伟" ...
- [LeetCode&Python] Problem 806. Number of Lines To Write String
We are to write the letters of a given string S, from left to right into lines. Each line has maximu ...
- Word2007:如何在竖版(纵向)页面中间插入横版(横向)页面
通常情况下,我们在word排版过程中使用一种页面版式(横版/竖版)即可.但在某些特殊情况下,我们可能会需要在竖版页面中间插入一页或多页横版页面,抑或在横版页面中间插入竖版页面.那么,如何针对这 ...