Install the hdfs package

  pip install hdfs

List the HDFS directory contents

[root@hadoop hadoop]# hdfs dfs -ls -R /
drwxr-xr-x - root supergroup 0 2017-05-18 23:57 /Demo
-rw-r--r-- 1 root supergroup 3494 2017-05-18 23:57 /Demo/hadoop-env.sh
drwxr-xr-x - root supergroup 0 2017-05-18 19:01 /logs
-rw-r--r-- 1 root supergroup 2223 2017-05-18 19:01 /logs/anaconda-ks.cfg
-rw-r--r-- 1 root supergroup 57162 2017-05-18 18:32 /logs/install.log

  

Create an HDFS client instance

#!/usr/bin/env python
# -*- coding:utf-8 -*-
__author__ = 'kongZhaGen'

import hdfs

client = hdfs.Client("http://172.10.236.21:50070")

  

list: return the names of the files and directories contained in a remote folder; raises an error if the path does not exist.

  hdfs_path: path of the remote folder

  status: also return each file's status information

def list(self, hdfs_path, status=False):
    """Return names of files contained in a remote folder.

    :param hdfs_path: Remote path to a directory. If `hdfs_path` doesn't exist
      or points to a normal file, an :class:`HdfsError` will be raised.
    :param status: Also return each file's corresponding FileStatus_.
    """

  Example:

print client.list("/", status=False)
Result:
[u'Demo', u'logs']

  

status: get status information for a file or folder on HDFS

  hdfs_path: remote path

  strict:

    False: return None if the remote path does not exist

    True: raise an exception if the remote path does not exist

def status(self, hdfs_path, strict=True):
    """Get FileStatus_ for a file or folder on HDFS.

    :param hdfs_path: Remote path.
    :param strict: If `False`, return `None` rather than raise an exception if
      the path doesn't exist.

    .. _FileStatus: FS_
    .. _FS: http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#FileStatus
    """

  Example:

print client.status(hdfs_path="/Demoo", strict=False)
Result:
None
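Because `strict=False` turns a missing path into `None`, a simple existence check can be built on top of `status`. A minimal sketch; `FakeClient` below is a hypothetical stand-in used only so the snippet runs without a live cluster:

```python
def path_exists(client, hdfs_path):
    # status(strict=False) returns None for a missing path instead of raising.
    return client.status(hdfs_path, strict=False) is not None

# Stand-in client for illustration only; a real hdfs.Client needs a NameNode.
class FakeClient(object):
    def status(self, hdfs_path, strict=True):
        return {"type": "DIRECTORY"} if hdfs_path == "/Demo" else None

print(path_exists(FakeClient(), "/Demo"))   # True
print(path_exists(FakeClient(), "/Demoo"))  # False
```

With a real client, `path_exists(client, "/Demo")` would issue the same `status` call against the cluster.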

  

makedirs: create a directory on HDFS, recursively if necessary

  hdfs_path: remote directory path

  permission: permission to set on the newly created directory

def makedirs(self, hdfs_path, permission=None):
    """Create a remote directory, recursively if necessary.

    :param hdfs_path: Remote path. Intermediate directories will be created
      appropriately.
    :param permission: Octal permission to set on the newly created directory.
      These permissions will only be set on directories that do not already
      exist.

    This function currently has no return value as WebHDFS doesn't return a
    meaningful flag.
    """

  Example:

  To create HDFS directories from a remote client via a script, you first need to modify hdfs-site.xml:

  <property>
  <name>dfs.permissions</name>
  <value>false</value>
  </property>

  Then restart HDFS:

stop-dfs.sh
start-dfs.sh

  Create directories recursively:

client.makedirs("/data/rar/tmp",permission=755)

  

rename: move a file or folder

  hdfs_src_path: source path

  hdfs_dst_path: destination path. If the path exists and is a directory, the source is moved into it; if the path exists and is a file, an exception is raised.

def rename(self, hdfs_src_path, hdfs_dst_path):
    """Move a file or folder.

    :param hdfs_src_path: Source path.
    :param hdfs_dst_path: Destination path. If the path already exists and is
      a directory, the source will be moved into it. If the path exists and is
      a file, or if a parent destination directory is missing, this method will
      raise an :class:`HdfsError`.
    """

  Example:

client.rename("/SRC_DATA", "/dest_data")

  

delete: remove a file or directory from HDFS

  hdfs_path: path on HDFS

  recursive: for a non-empty directory, True deletes it recursively, False raises an exception

def delete(self, hdfs_path, recursive=False):
    """Remove a file or directory from HDFS.

    :param hdfs_path: HDFS path.
    :param recursive: Recursively delete files and directories. By default,
      this method will raise an :class:`HdfsError` if trying to delete a
      non-empty directory.

    This function returns `True` if the deletion was successful and `False` if
    no file or directory previously existed at `hdfs_path`.
    """

  Example:

client.delete("/dest_data", recursive=True)

  

upload: upload a file or directory to HDFS; if the target directory already exists, the file or directory is uploaded into it, otherwise the target is created.

def upload(self, hdfs_path, local_path, overwrite=False, n_threads=1,
           temp_dir=None, chunk_size=2 ** 16, progress=None, cleanup=True,
           **kwargs):
    """Upload a file or directory to HDFS.

    :param hdfs_path: Target HDFS path. If it already exists and is a
      directory, files will be uploaded inside.
    :param local_path: Local path to file or folder. If a folder, all the files
      inside of it will be uploaded (note that this implies that folders empty
      of files will not be created remotely).
    :param overwrite: Overwrite any existing file or directory.
    :param n_threads: Number of threads to use for parallelization. A value of
      `0` (or negative) uses as many threads as there are files.
    :param temp_dir: Directory under which the files will first be uploaded
      when `overwrite=True` and the final remote path already exists. Once the
      upload successfully completes, it will be swapped in.
    :param chunk_size: Interval in bytes by which the files will be uploaded.
    :param progress: Callback function to track progress, called every
      `chunk_size` bytes. It will be passed two arguments, the path to the
      file being uploaded and the number of bytes transferred so far. On
      completion, it will be called once with `-1` as second argument.
    :param cleanup: Delete any uploaded files if an error occurs during the
      upload.
    :param \*\*kwargs: Keyword arguments forwarded to :meth:`write`.

    On success, this method returns the remote upload path.
    """

  Example:

>>> import hdfs
>>> client=hdfs.Client("http://172.10.236.21:50070")
>>> client.upload("/logs","/root/training/jdk-7u75-linux-i586.tar.gz")
'/logs/jdk-7u75-linux-i586.tar.gz'
>>> client.list("/logs")
[u'anaconda-ks.cfg', u'install.log', u'jdk-7u75-linux-i586.tar.gz']
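The `progress` argument above accepts any callable of two arguments. A small sketch of a callback that tracks bytes transferred per file; the calls at the bottom merely simulate, with an illustrative path, the sequence the client would make during an upload:

```python
def make_progress():
    # Returns (callback, state): upload() calls callback(path, nbytes) every
    # chunk_size bytes, and once with nbytes == -1 when a file completes.
    state = {}

    def progress(path, nbytes):
        if nbytes == -1:
            state[path] = "done"
        else:
            state[path] = nbytes

    return progress, state

progress, state = make_progress()
# Simulated sequence of calls for one file:
progress("/logs/app.log", 65536)
progress("/logs/app.log", 131072)
progress("/logs/app.log", -1)
print(state)  # {'/logs/app.log': 'done'}
```

A real call would pass the callback through, e.g. `client.upload("/logs", local_path, progress=progress)`.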

  

content: get summary information (space consumed, quotas, length, file and directory counts) for a file or directory on HDFS

print client.content("/logs/install.log")
Result:
{u'spaceConsumed': 57162, u'quota': -1, u'spaceQuota': -1, u'length': 57162, u'directoryCount': 0, u'fileCount': 1}

  

write: create a file on HDFS; the data can be a string, a generator, or a file object.

def write(self, hdfs_path, data=None, overwrite=False, permission=None,
          blocksize=None, replication=None, buffersize=None, append=False,
          encoding=None):
    """Create a file on HDFS.

    :param hdfs_path: Path where to create file. The necessary directories will
      be created appropriately.
    :param data: Contents of file to write. Can be a string, a generator or a
      file object. The last two options will allow streaming upload (i.e.
      without having to load the entire contents into memory). If `None`, this
      method will return a file-like object and should be called using a `with`
      block (see below for examples).
    :param overwrite: Overwrite any existing file or directory.
    :param permission: Octal permission to set on the newly created file.
      Leading zeros may be omitted.
    :param blocksize: Block size of the file.
    :param replication: Number of replications of the file.
    :param buffersize: Size of upload buffer.
    :param append: Append to a file rather than create a new one.
    :param encoding: Encoding used to serialize data written.
    """

  
