HDFS Operations Manual

The hdfscli command line

# hdfscli --help

HdfsCLI: a command line interface for HDFS.

Usage:

  hdfscli [interactive] [-a ALIAS] [-v...]

  hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH

  hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH

  hdfscli -L | -V | -h

Commands:

  download                      Download a file or folder from HDFS. If a

                                single file is downloaded, - can be

                                specified as LOCAL_PATH to stream it to

                                standard out.

  interactive                   Start the client and expose it via the python

                                interpreter (using iPython if available).

  upload                        Upload a file or folder to HDFS. - can be

                                specified as LOCAL_PATH to read from standard

                                in.

Arguments:

  HDFS_PATH                     Remote HDFS path.

  LOCAL_PATH                    Path to local file or directory.

Options:

  -A --append                   Append data to an existing file. Only supported

                                if uploading a single file or from standard in.

  -L --log                      Show path to current log file and exit.

  -V --version                  Show version and exit.

  -a ALIAS --alias=ALIAS        Alias of namenode to connect to.

  -f --force                    Allow overwriting any existing files.

  -s --silent                   Don't display progress status.

  -t THREADS --threads=THREADS  Number of threads to use for parallelization.

                                0 allocates a thread per file. [default: 0]

  -v --verbose                  Enable log output. Can be specified up to three

                                times (increasing verbosity each time).

Examples:

  hdfscli -a prod /user/foo

  hdfscli download features.avro dat/

  hdfscli download logs/1987-03-23 - >>logs

  hdfscli upload -f - data/weights.tsv <weights.tsv

HdfsCLI exits with return status 1 if an error occurred and 0 otherwise.
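The usage lines and the exit-status rule above can be driven from a script. The following is a minimal sketch, assuming `hdfscli` is installed and on the PATH; the helper names `build_upload_command` and `run_cli` are made up for illustration and are not part of HdfsCLI. It only uses flags that appear in the help text.

```python
import subprocess


def build_upload_command(local_path, hdfs_path, alias=None, force=False, threads=None):
    """Assemble an `hdfscli upload` argument list from the documented options."""
    cmd = ["hdfscli", "upload"]
    if alias is not None:
        cmd.append("--alias=" + alias)          # -a ALIAS --alias=ALIAS
    if force:
        cmd.append("-f")                        # -f --force
    if threads is not None:
        cmd.append("--threads=" + str(threads)) # -t THREADS --threads=THREADS
    cmd.extend([local_path, hdfs_path])         # LOCAL_PATH HDFS_PATH
    return cmd


def run_cli(cmd):
    """Run a command and return True when its exit status is 0 (no error)."""
    return subprocess.call(cmd) == 0


# Example (hypothetical paths, not executed here):
# ok = run_cli(build_upload_command("weights.tsv", "data/weights.tsv", force=True))
```

Keeping the argument list as a Python list, rather than a shell string, avoids quoting problems when paths contain spaces.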

To use hdfscli, first create its default configuration file:

# cat ~/.hdfscli.cfg

[global]

default.alias = dev

[dev.alias]

url = http://hadoop:50070

user = root
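The file is plain INI: a `[global]` section names the default alias, and each `[NAME.alias]` section describes one namenode. As a rough illustration of that layout (not how hdfscli itself loads the file), the sketch below parses the same content with Python 3's standard `configparser`. The function name `load_aliases` is hypothetical.

```python
import configparser

# Same content as the ~/.hdfscli.cfg shown above.
SAMPLE = """\
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root
"""


def load_aliases(text):
    """Return (default alias name, {alias name: options}) from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    default = parser.get("global", "default.alias", fallback=None)
    aliases = {}
    for section in parser.sections():
        if section.endswith(".alias"):
            name = section[: -len(".alias")]
            aliases[name] = dict(parser.items(section))
    return default, aliases


default_alias, aliases = load_aliases(SAMPLE)
print(default_alias, aliases[default_alias]["url"])  # dev http://hadoop:50070
```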

Client classes available from Python:

    InsecureClient (the default)

    TokenClient

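Uploading and downloading files

Upload a file or folder with hdfscli (here the local hadoop configuration folder is uploaded to /hdfs):

    # hdfscli upload --alias=dev -f /hadoop-2.4.1/etc/hadoop/ /hdfs

Download the /logs directory from HDFS into the local directory /root/test:

    # hdfscli download /logs /root/test/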

hdfscli interactive mode

[root@hadoop ~]# hdfscli --alias=dev

Welcome to the interactive HDFS python shell.

The HDFS client is available as `CLIENT`.

>>> CLIENT.list("/")

[u'Demo', u'hdfs', u'logs', u'logss']

>>> CLIENT.status("/Demo")

{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0,

 u'accessTime': 0, u'pathSuffix': u'', u'modificationTime': 1495123035501L,

 u'replication': 0, u'length': 0, u'childrenNum': 1, u'owner': u'root',

 u'type': u'DIRECTORY', u'fileId': 16389}

>>> CLIENT.delete("logs/install.log")

False

>>> CLIENT.delete("/logs/install.log")

True
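The two delete calls differ only in the leading slash. A likely explanation is that the client resolves a relative path against the user's HDFS home directory, so `logs/install.log` points at a file that does not exist, while `/logs/install.log` does. The sketch below is a simplified illustration of that rule, not the library's own code; the function `resolve` and the home directory `/user/root` are assumptions based on the `user = root` setting and the usual HDFS layout.

```python
import posixpath


def resolve(hdfs_path, home="/user/root"):
    """Simplified rule: relative paths are joined onto the home directory."""
    if not posixpath.isabs(hdfs_path):
        hdfs_path = posixpath.join(home, hdfs_path)
    return posixpath.normpath(hdfs_path)


print(resolve("logs/install.log"))   # /user/root/logs/install.log
print(resolve("/logs/install.log"))  # /logs/install.log
```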

Python bindings

    Instantiating a client

    1. Import the client class and call its constructor directly:

>>> from hdfs import InsecureClient

>>> client = InsecureClient("http://172.10.236.21:50070",user='ann')

>>> client.list("/")

[u'Demo', u'hdfs', u'logs', u'logss']

    2. Import the Config class, load an existing configuration file, and create a client from an alias defined in it. By default the configuration is read from ~/.hdfscli.cfg:

>>> from hdfs import Config

>>> client=Config().get_client("dev")

>>> client.list("/")

[u'Demo', u'hdfs', u'logs', u'logss']

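    Reading files

    The read() method reads a file from HDFS. It must be used inside a with block so that the connection is always closed properly:

>>> with client.read("/logs/yarn-env.sh", encoding="utf-8") as reader:

...   features = reader.read()

...

>>> print(features)

    Passing chunk_size makes read() return a generator, so the file contents are streamed chunk by chunk: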

>>> with client.read("/logs/yarn-env.sh",chunk_size=1024) as reader:

...   for chunk in reader:

...      print(chunk)

...
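Streaming in chunks is useful when the file is too large to hold in memory. As a sketch of that pattern, the function below hashes an HDFS file without loading it whole. The name `hdfs_sha256` is made up for this example, `client` is assumed to be an object like the `InsecureClient` created earlier, and the digest is computed locally over the raw bytes (it is unrelated to any checksum HDFS keeps itself).

```python
import hashlib


def hdfs_sha256(client, hdfs_path, chunk_size=1024):
    """Compute the SHA-256 of an HDFS file by streaming it in chunks."""
    digest = hashlib.sha256()
    # No encoding is passed, so each chunk is expected to be raw bytes.
    with client.read(hdfs_path, chunk_size=chunk_size) as reader:
        for chunk in reader:
            digest.update(chunk)
    return digest.hexdigest()


# Example (requires a reachable namenode, not executed here):
# print(hdfs_sha256(client, "/logs/yarn-env.sh", chunk_size=64 * 1024))
```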

    Passing delimiter also returns a generator, which yields the file contents split on the given delimiter:

>>> import time

>>> with client.read("/logs/yarn-env.sh", encoding="utf-8", delimiter="\n") as reader:

...   for line in reader:

...     time.sleep(1)

...     print(line)

...
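Line-by-line iteration makes simple filtering easy. The helper below is a small sketch that collects the lines of an HDFS text file containing a given substring. The name `matching_lines` is hypothetical and `client` is assumed to behave like the client objects above. To my understanding, delimiter splitting needs an encoding to be set, which is why both are passed together.

```python
def matching_lines(client, hdfs_path, needle):
    """Return the lines of an HDFS text file that contain `needle`."""
    matches = []
    with client.read(hdfs_path, encoding="utf-8", delimiter="\n") as reader:
        for line in reader:
            if needle in line:
                matches.append(line)
    return matches


# Example (requires a reachable namenode, not executed here):
# for line in matching_lines(client, "/logs/yarn-env.sh", "export"):
#     print(line)
```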

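    Writing files

The write() method writes a file to HDFS. The example below reads the local file kong.txt and writes only the lines that start with "-" to /logs/kongtest.txt on HDFS: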

>>> with open("/root/test/kong.txt") as reader, client.write("/logs/kongtest.txt") as writer:

...   for line in reader:

...     if line.startswith("-"):

...       writer.write(line)
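The same filter-and-write pattern can be wrapped in a reusable function. This is a sketch under a few assumptions: the name `upload_matching_lines` is invented for illustration, the local file is UTF-8 text, and `overwrite=True` is added so that rerunning the function replaces the target file (the session above does not pass it).

```python
import io


def upload_matching_lines(client, local_path, hdfs_path, prefix="-"):
    """Copy lines starting with `prefix` from a local file to an HDFS file.

    Returns the number of lines written.
    """
    written = 0
    with io.open(local_path, encoding="utf-8") as reader:
        # encoding lets the writer accept text; overwrite=True is an assumption.
        with client.write(hdfs_path, encoding="utf-8", overwrite=True) as writer:
            for line in reader:
                if line.startswith(prefix):
                    writer.write(line)
                    written += 1
    return written


# Example (requires a reachable namenode, not executed here):
# upload_matching_lines(client, "/root/test/kong.txt", "/logs/kongtest.txt")
```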