HDFS Java API usage examples
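HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster consists mainly of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. The HDFS architecture diagram describes the basic interactions between the NameNode, the DataNodes and clients: clients contact the NameNode for file metadata or file modifications, and perform the actual file I/O directly with the DataNodes.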
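To make this split visible from the client side, the sketch below asks for a file's block locations: the metadata comes from the NameNode, and the listed hosts are the DataNodes a reader would fetch the data from. It is a minimal sketch; the class name BlockLocationsDemo is hypothetical, and it assumes fs.defaultFS points at the cluster and that the path given on the command line exists.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: prints which hosts store each block of a file.
class BlockLocationsDemo {
    static BlockLocation[] describe(FileSystem fs, Path file) throws Exception {
        // File metadata (length, replication, block size) is served by the NameNode.
        FileStatus status = fs.getFileStatus(file);
        // Block-to-host mapping: the DataNodes a reader would contact for the data.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
        }
        return blocks;
    }

    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        describe(fs, new Path(args[0]));
    }
}
```

On a single-node setup like the one below, every block reports the same host; on a larger cluster each block lists the DataNodes holding its replicas.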
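Hadoop supports interacting with HDFS directly through shell commands, and also supports operating on HDFS through the Java API, for example creating, deleting, uploading, downloading and renaming files.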
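As a rough guide, several of these operations map onto single FileSystem methods. The helper below is a hypothetical sketch (class and method names are illustrative) covering rename, download and delete; it works against whichever FileSystem it is given, so it can be tried on the local file system as well as on HDFS.

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper grouping FileSystem calls for some of the operations named above.
class HdfsFileOps {
    // Rename (or move) a file or directory; returns false if the rename did not happen.
    static boolean rename(FileSystem fs, String from, String to) throws Exception {
        return fs.rename(new Path(from), new Path(to));
    }

    // Download: copy a file from the FileSystem to the local disk.
    static void download(FileSystem fs, String remote, String local) throws Exception {
        fs.copyToLocalFile(new Path(remote), new Path(local));
    }

    // Delete a path; recursive must be true to delete a non-empty directory.
    static boolean delete(FileSystem fs, String path, boolean recursive) throws Exception {
        return fs.delete(new Path(path), recursive);
    }
}
```

For example, HdfsFileOps.download(fs, "/user/hadoop/put-wc.input", "/tmp/put-wc.input") would copy the file uploaded later in this article back to the local disk (the local target path is only an example).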
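File operations in HDFS mainly involve the following classes:
Configuration: provides access to configuration parameters
FileSystem: the file system object (an abstract class; the concrete implementation, such as HDFS or the local file system, is chosen by the configuration)
Path: names a file or directory in a FileSystem. Path strings use the slash as the directory separator; a path string that begins with a slash is absolute
FSDataInputStream and FSDataOutputStream: the input and output streams for HDFS, returned by FileSystem.open() and FileSystem.create() respectively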
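A minimal sketch of how the four classes fit together, assuming Java 7 or later for try-with-resources (with Maven this may mean setting maven.compiler.source and maven.compiler.target): Configuration selects the file system, FileSystem.get returns it, Path names the file, and create/open return the FSDataOutputStream and FSDataInputStream. The class name HdfsClassesDemo is hypothetical; the FileSystem is deliberately not closed because FileSystem.get hands out a cached, shared instance.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical demo: Configuration -> FileSystem -> Path -> FSDataOutputStream / FSDataInputStream.
class HdfsClassesDemo {
    // Writes text to the given file, then reads back and returns its first line.
    static String writeThenRead(Configuration conf, String file, String text) throws Exception {
        FileSystem fs = FileSystem.get(conf);   // implementation depends on fs.defaultFS
        Path path = new Path(file);             // a leading "/" makes the path absolute
        try (FSDataOutputStream out = fs.create(path, true)) {   // true = overwrite
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }
}
```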
The steps for operating on HDFS through the Java API are as follows:
1. Project structure

2. pom.xml configuration
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.zjl</groupId>
  <artifactId>myhadoop</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>myhadoop</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.5.0</hadoop.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
    </dependency>
  </dependencies>
</project>
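The pom pins hadoop.version to 2.5.0, while the cluster used in the following steps is hadoop-2.6.5; keeping the two in line is usually the simplest choice. To see which client version Maven actually put on the classpath, a small check using Hadoop's VersionInfo class can help (the class name ClientVersionCheck is hypothetical):

```java
import org.apache.hadoop.util.VersionInfo;

// Hypothetical check: which hadoop-client version ended up on the classpath.
class ClientVersionCheck {
    static String clientVersion() {
        // Version string baked into the hadoop-common jar, e.g. the value of ${hadoop.version}.
        return VersionInfo.getVersion();
    }

    public static void main(String[] args) {
        System.out.println("hadoop-client version: " + clientVersion());
    }
}
```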
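3. Copy the HDFS-related configuration files (core-site.xml, hdfs-site.xml, log4j.properties) from the Hadoop installation directory into the project's src/main/resources directory (Maven's default resource directory, whose contents end up on the classpath, where new Configuration() looks for them)
[hadoop@hadoop01 ~]$ cd /opt/modules/hadoop-2.6.5/etc/hadoop/
[hadoop@hadoop01 hadoop]$ cp core-site.xml hdfs-site.xml log4j.properties /opt/tools/workspace/myhadoop/src/main/resources/
[hadoop@hadoop01 hadoop]$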
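If copying the files is inconvenient, Configuration can also load them from explicit local paths. The sketch below assumes the client runs on a machine where the Hadoop configuration directory exists (for example /opt/modules/hadoop-2.6.5/etc/hadoop); the class name ExplicitConfig is hypothetical. Resources added later override earlier ones (unless a property is marked final), so these files take precedence over the built-in defaults.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Hypothetical alternative to copying the XML files into the project:
// load them directly from a configuration directory on the local disk.
class ExplicitConfig {
    static Configuration load(String confDir) {
        Configuration conf = new Configuration();   // still loads core-default.xml first
        conf.addResource(new Path(confDir, "core-site.xml"));
        conf.addResource(new Path(confDir, "hdfs-site.xml"));
        return conf;
    }
}
```

Usage: Configuration conf = ExplicitConfig.load("/opt/modules/hadoop-2.6.5/etc/hadoop");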
(1) core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- If this is not set, the default (file:///) makes clients read data from the local file system -->
    <value>hdfs://hadoop01.zjl.com:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Base directory that many other Hadoop paths are derived from. If hdfs-site.xml does not set the NameNode and DataNode storage directories, they are placed under this path by default -->
    <value>/opt/modules/hadoop-2.6.5/data/tmp</value>
  </property>
</configuration>
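fs.defaultFS decides which FileSystem implementation FileSystem.get(conf) returns. The same choice can be made in code, either by overriding the property or by passing the file system URI explicitly. A minimal sketch with hypothetical helper names; the commented address reuses the NameNode URI from core-site.xml above.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Hypothetical helpers showing two ways to choose the target file system in code.
class DefaultFsDemo {
    // Option 1: override fs.defaultFS, then let FileSystem.get(conf) pick it up.
    static FileSystem viaDefaultFs(String fsUri) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", fsUri);   // e.g. "hdfs://hadoop01.zjl.com:9000"
        return FileSystem.get(conf);
    }

    // Option 2: pass the URI explicitly; its scheme decides the implementation.
    static FileSystem viaUri(String fsUri) throws Exception {
        return FileSystem.get(URI.create(fsUri), new Configuration());
    }
}
```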
(2) hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <!-- default value 3 -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
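Replication can also be chosen per file from the client. Because FileSystem.get returns cached instances, changing dfs.replication on a fresh Configuration after the FileSystem already exists may have no effect, so this hypothetical sketch passes the factor to create() explicitly (try-with-resources needs Java 7 or later). On the local file system the reported replication is always 1.

```java
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helpers for per-file replication.
class ReplicationDemo {
    // Creates a file with an explicit replication factor instead of relying on
    // dfs.replication, then returns the replication reported by the file system.
    static short createWithReplication(FileSystem fs, String file, short replication) throws Exception {
        Path path = new Path(file);
        try (FSDataOutputStream out = fs.create(path, true, 4096, replication,
                fs.getDefaultBlockSize(path))) {
            out.writeBytes("replicated\n");
        }
        return fs.getFileStatus(path).getReplication();
    }

    // Changes the replication of an existing file; on HDFS the extra or surplus
    // block copies are then handled by the cluster in the background.
    static boolean changeReplication(FileSystem fs, String file, short replication) throws Exception {
        return fs.setReplication(new Path(file), replication);
    }
}
```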
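(3) log4j.properties: the default file can be used as-is. It is the configuration Hadoop needs to print its log messages; without it, the Eclipse console shows a warning when the program runs.
4. Start the HDFS daemons and create a file in HDFS (the file is read by the Java program in step 5)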
[hadoop@hadoop01 hadoop]$ cd /opt/modules/hadoop-2.6.5/
[hadoop@hadoop01 hadoop-2.6.5]$ sbin/start-dfs.sh
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01.zjl.com]
hadoop01.zjl.com: starting namenode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-namenode-hadoop01.zjl.com.out
hadoop01.zjl.com: starting datanode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-datanode-hadoop01.zjl.com.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/modules/hadoop-2.6.5/logs/hadoop-hadoop-secondarynamenode-hadoop01.zjl.com.out
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$ jps
NameNode
Jps
SecondaryNameNode
DataNode
org.eclipse.equinox.launcher_1.3.201.v20161025-.jar
[hadoop@hadoop01 hadoop-2.6.5]$ bin/hdfs dfs -mkdir -p /user/hadoop/mapreduce/wordcount/input
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$ cat wcinput/wc.input
hadoop yarn
hadoop mapreduce
hadoop hdfs
yarn nodemanager
hadoop resourcemanager
[hadoop@hadoop01 hadoop-2.6.5]$ bin/hdfs dfs -put wcinput/wc.input /user/hadoop/mapreduce/wordcount/input/
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop01 hadoop-2.6.5]$
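The two shell steps above have direct FileSystem counterparts: mkdirs() creates missing parent directories like -mkdir -p, and copyFromLocalFile() performs the upload. One difference: the two-argument copyFromLocalFile overwrites an existing destination, whereas hdfs dfs -put refuses unless -f is given. The class name ShellEquivalents is hypothetical.

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical Java counterparts of "hdfs dfs -mkdir -p" and "hdfs dfs -put".
class ShellEquivalents {
    // Like "hdfs dfs -mkdir -p <dir>": also creates missing parent directories.
    static boolean mkdirP(FileSystem fs, String dir) throws Exception {
        return fs.mkdirs(new Path(dir));
    }

    // Like "hdfs dfs -put <localFile> <dir>/": copies a local file into <dir>,
    // keeping its file name. Overwrites an existing target.
    static Path put(FileSystem fs, String localFile, String dir) throws Exception {
        Path src = new Path(localFile);
        Path dst = new Path(dir, src.getName());
        fs.copyFromLocalFile(src, dst);
        return dst;
    }
}
```

With the paths from the session above: ShellEquivalents.mkdirP(fs, "/user/hadoop/mapreduce/wordcount/input"), then ShellEquivalents.put(fs, "/opt/modules/hadoop-2.6.5/wcinput/wc.input", "/user/hadoop/mapreduce/wordcount/input").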
5. Java code
package com.zjl.myhadoop;
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
/**
 *
 * @author hadoop
 *
 */
public class HdfsApp {
    /**
     * get file system
     * @return the FileSystem selected by fs.defaultFS
     * @throws Exception
     */
    public static FileSystem getFileSystem() throws Exception {
        // read configuration: core-default.xml and core-site.xml from the classpath;
        // hdfs-default.xml and hdfs-site.xml are added once the HDFS client classes are loaded
        Configuration conf = new Configuration();
        // create file system (FileSystem.get returns a cached instance shared within the JVM)
        FileSystem fileSystem = FileSystem.get(conf);
        return fileSystem;
    }
    /**
     * read file from hdfs file system, output to the console
     * @param fileName
     * @throws Exception
     */
    public static void read(String fileName) throws Exception {
        // read path
        Path readPath = new Path(fileName);
        // get file system
        FileSystem fileSystem = getFileSystem();
        // open file
        FSDataInputStream inStream = fileSystem.open(readPath);
        try {
            // read file; close=false so that System.out is not closed
            IOUtils.copyBytes(inStream, System.out, 4096, false);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // io close
            IOUtils.closeStream(inStream);
        }
    }
    /**
     * upload a local file to the hdfs file system
     * @param inFileName local source file
     * @param outFileName target path in hdfs
     * @throws Exception
     */
    public static void upload(String inFileName, String outFileName) throws Exception {
        // file input stream, local file
        FileInputStream inStream = new FileInputStream(new File(inFileName));
        // get file system
        FileSystem fileSystem = getFileSystem();
        // write path, hdfs file system
        Path writePath = new Path(outFileName);
        // output stream (create(Path) overwrites an existing file)
        FSDataOutputStream outStream = fileSystem.create(writePath);
        try {
            // write file
            IOUtils.copyBytes(inStream, outStream, 4096, false);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // io close
            IOUtils.closeStream(inStream);
            IOUtils.closeStream(outStream);
        }
    }
    public static void main(String[] args) throws Exception {
        // 1. read file from hdfs to console
        // String fileName = "/user/hadoop/mapreduce/wordcount/input/wc.input";
        // read(fileName);
        // 2. upload file from local file system to hdfs file system
        // file input stream, local file
        String inFileName = "/opt/modules/hadoop-2.6.5/wcinput/wc.input";
        String outFileName = "/user/hadoop/put-wc.input";
        upload(inFileName, outFileName);
    }
}
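Both methods hand the actual transfer to IOUtils.copyBytes(in, out, 4096, false): it copies through a 4096-byte buffer and, because the last argument is false, leaves both streams open, which is why the finally blocks close them explicitly (and why System.out stays usable after read). The JDK-only sketch below mirrors that behaviour for illustration; it is not Hadoop's source code, the class name CopySketch is hypothetical, and unlike copyBytes it returns the number of bytes copied.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical, JDK-only illustration of a buffered copy that leaves both streams open.
class CopySketch {
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; each chunk is written as soon as it is read.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        // Neither stream is closed here: the caller owns them (compare close=false).
        return total;
    }
}
```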
6. Call read(fileName) (uncomment the two lines under "1." in main); the contents of wc.input are printed to the console

7. Open the HDFS file browser page and look at the /user/hadoop directory
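Besides the web page, the directory can also be listed from Java. A minimal sketch (the class name ListDemo is hypothetical) that prints roughly what hdfs dfs -ls shows, assuming the classpath configuration points at the cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: prints a short listing of a directory, one entry per line.
class ListDemo {
    static FileStatus[] list(FileSystem fs, String dir) throws Exception {
        FileStatus[] entries = fs.listStatus(new Path(dir));
        for (FileStatus st : entries) {
            // "d" for directories, "-" for files, then size in bytes and name.
            System.out.println((st.isDirectory() ? "d " : "- ")
                    + st.getLen() + "\t" + st.getPath().getName());
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        list(FileSystem.get(new Configuration()), "/user/hadoop");
    }
}
```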

8. Call upload(inFileName, outFileName), then refresh the page from step 7: the file has been uploaded successfully
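The upload can also be checked from code instead of refreshing the page. This hypothetical helper (class name UploadCheck) only compares existence and length, which catches missing or truncated uploads but does not compare contents.

```java
import java.io.File;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical check that an upload arrived complete: the target exists and
// has the same length as the local source file.
class UploadCheck {
    static boolean uploaded(FileSystem fs, String localFile, String remoteFile) throws Exception {
        Path remote = new Path(remoteFile);
        return fs.exists(remote)
                && fs.getFileStatus(remote).getLen() == new File(localFile).length();
    }
}
```

With the paths from main: UploadCheck.uploaded(fs, "/opt/modules/hadoop-2.6.5/wcinput/wc.input", "/user/hadoop/put-wc.input").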
