A small Python program for splitting and merging large files

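1. Splitting a large file into smaller files

I. Read the large file in binary mode, split it into chunks, and store each chunk as a separate file in one directory.

II. Two modes of operation are provided: interactive and command-line.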

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import sys

megabytes = 1024 * 1000
chunksize = int(1.4 * megabytes)  # default size of each part file, in bytes

def clear_dir(target_dir):
    """
    Empty a directory.
    :param target_dir: the directory to empty
    :return: None
    """
    for fname in os.listdir(target_dir):
        path = os.path.join(target_dir, fname)
        if os.path.isfile(path):
            os.remove(path)
        else:
            # os.rmdir only removes empty directories
            os.rmdir(path)

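def split(fromfile, todir, chunksize=chunksize):
    """Split fromfile into numbered part files in todir and return the part count."""
    if not os.path.exists(todir):
        os.mkdir(todir)
    else:
        clear_dir(todir)  # remove parts left over from an earlier run
    partnum = 0
    # the file object is named infile so the built-in input() is not shadowed
    with open(fromfile, "rb") as infile:
        while True:
            tmpdata = infile.read(chunksize)
            if not tmpdata:
                break
            partnum += 1
            filename = os.path.join(todir, 'part{0:04d}'.format(partnum))
            with open(filename, 'wb') as fileobj:
                fileobj.write(tmpdata)
    # four-digit part names only sort correctly up to part9999
    assert partnum <= 9999
    return partnum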

def main():
    global chunksize
    if len(sys.argv) == 2 and sys.argv[1] == '-help':
        print('Use: split_file.py [file-to-split target-dir [chunksize]]')
    else:
        if len(sys.argv) < 3:
            interactive = True
            fromfile = input('enter the file to split: ')
            todir = input('enter the dir to hold the split info: ')
        else:
            interactive = False
            fromfile, todir = sys.argv[1:3]
            if len(sys.argv) == 4:
                chunksize = int(sys.argv[3])
        absfrom, absto = map(os.path.abspath, [fromfile, todir])
        print('splitting from {0} to {1} by {2}'.format(absfrom, absto, chunksize))
        # the split runs inside this branch, so -help never reaches it
        try:
            parts = split(absfrom, absto, chunksize)
        except Exception as exc:
            print('error during split:', exc)
        else:
            print('split finished: {0} parts are in {1}'.format(parts, absto))
        if interactive:
            input('press Enter to exit')

if __name__ == '__main__':
    # clear_dir("../testdir")
    # split("../testdir1/test.pdf", "../testdir")
    main()
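The chunked read loop can be tried on a small scale before pointing it at a real file. What follows is a minimal sketch, not the script itself: `split_file`, `expected_parts`, and `demo` are hypothetical names introduced here, and a 2500-byte temporary file with a 1000-byte chunk size stands in for a large file. It shows that the number of parts is the file size divided by the chunk size, rounded up, and that only the last part is shorter than the chunk size.

```python
import math
import os
import tempfile

# Same default as the script: int(1.4 * 1024 * 1000) bytes per part.
CHUNK_SIZE = int(1.4 * 1024 * 1000)


def expected_parts(file_size, chunk_size=CHUNK_SIZE):
    """Number of part files produced for a file of file_size bytes."""
    return math.ceil(file_size / chunk_size)


def split_file(src, dest_dir, chunk_size=CHUNK_SIZE):
    """Re-statement of the split loop (hypothetical helper); returns part names."""
    os.makedirs(dest_dir, exist_ok=True)
    names = []
    with open(src, "rb") as infile:
        while True:
            data = infile.read(chunk_size)
            if not data:  # an empty bytes object means end of file
                break
            name = "part{0:04d}".format(len(names) + 1)
            with open(os.path.join(dest_dir, name), "wb") as part:
                part.write(data)
            names.append(name)
    return names


def demo():
    """Split a 2500-byte temporary file into 1000-byte parts."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sample.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 2500)
        part_dir = os.path.join(tmp, "parts")
        names = split_file(src, part_dir, chunk_size=1000)
        sizes = [os.path.getsize(os.path.join(part_dir, n)) for n in names]
        return names, sizes


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` yields three parts of 1000, 1000, and 500 bytes, which matches `expected_parts(2500, 1000)`.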

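2. Merging the split files back together

I. Read each part file in binary mode and write its contents to the output file, also in binary mode.

II. Two modes of operation are provided: interactive and command-line.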
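One detail the merge step depends on is the order of the parts: `os.listdir()` returns names in arbitrary order, so the parts have to be sorted before they are concatenated. The short sketch below (with a hypothetical helper `ordered_parts` and made-up listings) illustrates why the zero-padded `part0001` style names matter: plain string sorting then agrees with numeric order.

```python
def ordered_parts(names):
    """Return part names in the order they should be concatenated."""
    return sorted(names)


# A directory listing can come back in any order, for example:
listing = ["part0010", "part0002", "part0001"]
print(ordered_parts(listing))  # ['part0001', 'part0002', 'part0010']

# Without zero padding, string sorting puts "part10" before "part2".
print(sorted(["part10", "part2", "part1"]))  # ['part1', 'part10', 'part2']
```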

import sys
import os

readsize = 1024  # bytes read from a part file per call

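def join(fromdir, tofile):
    """
    Merge the files produced by split_file back into the original file.
    :param fromdir: directory holding the part files
    :param tofile: path of the file to rebuild
    :return: None
    """
    # os.listdir returns names in arbitrary order, so sort to join parts in sequence
    partfiles = sorted(os.listdir(fromdir))
    with open(tofile, 'wb') as output:
        for eachpart in partfiles:
            filepath = os.path.join(fromdir, eachpart)
            with open(filepath, 'rb') as fileobj:
                while True:
                    # named data so the built-in bytes type is not shadowed
                    data = fileobj.read(readsize)
                    if not data:
                        break
                    output.write(data)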

if __name__ == '__main__':
    if len(sys.argv) == 2 and sys.argv[1] == '-help':
        print('using join [from dir name] [to file name]')
    else:
        if len(sys.argv) != 3:
            fromdir = input('Enter the from dir: ')
            tofile = input('Enter the to file: ')
        else:
            fromdir = sys.argv[1]
            tofile = sys.argv[2]
        # the join runs inside this branch, so -help never reaches it
        fromdir, tofile = map(os.path.abspath, [fromdir, tofile])
        print('joining')
        try:
            join(fromdir, tofile)
        except Exception as exc:
            print("Error during joining file:", exc)
        else:
            print("joining completed")
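A simple way to gain confidence in the pair of scripts is a round trip: split a file, join the parts, and compare checksums of the original and the rebuilt file. The sketch below is self-contained and only illustrative; `sha256_of`, `split_file`, `join_files`, and `round_trip_ok` are hypothetical names, and SHA-256 is one reasonable choice of checksum rather than something the original scripts use.

```python
import hashlib
import os
import tempfile


def sha256_of(path, block_size=64 * 1024):
    """Hex SHA-256 digest of a file, read block by block."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def split_file(src, dest_dir, chunk_size):
    """Write src into dest_dir as part0001, part0002, ... (hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    count = 0
    with open(src, "rb") as infile:
        while True:
            data = infile.read(chunk_size)
            if not data:
                break
            count += 1
            name = "part{0:04d}".format(count)
            with open(os.path.join(dest_dir, name), "wb") as part:
                part.write(data)
    return count


def join_files(part_dir, dest, read_size=1024):
    """Concatenate the parts in sorted name order into dest."""
    with open(dest, "wb") as output:
        for name in sorted(os.listdir(part_dir)):
            with open(os.path.join(part_dir, name), "rb") as part:
                while True:
                    data = part.read(read_size)
                    if not data:
                        break
                    output.write(data)


def round_trip_ok(payload, chunk_size):
    """Split and rejoin payload in a temp directory; True if checksums match."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.bin")
        rebuilt = os.path.join(tmp, "rebuilt.bin")
        part_dir = os.path.join(tmp, "parts")
        with open(original, "wb") as f:
            f.write(payload)
        split_file(original, part_dir, chunk_size)
        join_files(part_dir, rebuilt)
        return sha256_of(original) == sha256_of(rebuilt)


if __name__ == "__main__":
    print(round_trip_ok(os.urandom(10_000), chunk_size=1024))
```

The part directory here contains nothing but part files. If the directory passed to the join step holds other files as well, they would be concatenated too, so a dedicated directory is the safer choice.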
