I have a large file (~4 GB) to process in Python, and I wanted to know whether it is safe to "read" a file that large, so I tried the following approaches:

The file I actually need to process is not "./CentOS-6.5-i386.iso"; I only use that file as an example here.

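1: Normal Method (try/except/finally omitted)

```python
import sys

def main():
    f = open(r"./CentOS-6.5-i386.iso", "rb")
    for line in f:  # binary mode yields bytes objects split at b"\n"
        sys.stdout.buffer.write(line)  # print(line) would show the b'...' repr
    f.close()

if __name__ == "__main__":
    main()
```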
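The normal method above leaves out try/except/finally. For comparison, here is a minimal sketch with the try/finally put back, so the file is closed even if processing a line raises. The function name `count_lines` and the choice of "processing" (just counting lines) are hypothetical, picked only so the sketch returns something checkable.

```python
def count_lines(path):
    """Normal method with the omitted try/finally restored (hypothetical helper)."""
    f = open(path, "rb")  # opened outside try: if open() fails there is nothing to close
    try:
        count = 0
        for _line in f:  # the file object yields one b"\n"-terminated line at a time
            count += 1
        return count
    finally:
        f.close()  # runs whether the loop finishes normally or raises
```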

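2: "With" Method

```python
import sys

def main():
    # the with statement closes the file automatically when the block exits
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            sys.stdout.buffer.write(line)

if __name__ == "__main__":
    main()
```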
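The with method gets the same guarantee as try/finally without writing it out: leaving the with block, normally or through an exception, closes the file. A small sketch that checks this; `demo_with_closes_file` is a hypothetical helper and the RuntimeError is a simulated processing failure.

```python
def demo_with_closes_file(path):
    """Return True if the file is closed after an exception escapes the with block."""
    handle = None
    try:
        with open(path, "rb") as f:
            handle = f
            for _line in f:
                # Simulated failure while processing the first line.
                raise RuntimeError("processing failed")
    except RuntimeError:
        pass
    return handle.closed  # the with block closed the file on the way out
```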

3: "readlines" Method [Bad Idea]

```python
import sys

# NO. readlines() is really bad for large files:
# it builds a list of every line in memory before the loop starts -> MemoryError.
def main():
    for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
        sys.stdout.buffer.write(line)

if __name__ == "__main__":
    main()
```

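4: "fileinput" Method

```python
import fileinput
import sys

def main():
    # fileinput accepts a single filename string or a list of filenames
    for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
        sys.stdout.buffer.write(line)

if __name__ == "__main__":
    main()
```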
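fileinput is mainly useful when one loop should stream lines from several files in turn. A sketch, assuming you want a per-file line count; `lines_per_file` is a hypothetical name, while `fileinput.input(files=..., mode="rb")` and `FileInput.filename()` are standard-library calls. As in the single-file example, only the current line is held in memory.

```python
import fileinput
from collections import Counter


def lines_per_file(paths):
    """Count lines in each file while streaming them all through one loop."""
    counts = Counter()
    with fileinput.input(files=paths, mode="rb") as stream:
        for _line in stream:
            # filename() names the file the current line came from
            counts[stream.filename()] += 1
    return counts  # note: empty files get no entry
```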

5: "Generator" Method

```python
import sys

def readFile():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            yield line  # hand back one line at a time

def main():
    for line in readFile():
        sys.stdout.buffer.write(line)

if __name__ == "__main__":
    main()
```
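Because readFile() only yields lines, the caller decides what to do with them, and the whole pipeline stays lazy. A sketch under two assumptions: the path becomes a parameter (the hypothetical `read_file(path)`), and the "processing" is simply totalling byte counts in a hypothetical `total_bytes`.

```python
def read_file(path):
    """Generator method with the path as a parameter: yields one line at a time."""
    with open(path, "rb") as f:
        for line in f:
            yield line


def total_bytes(path):
    """Consume the generator lazily; only the current line is held in memory."""
    return sum(len(line) for line in read_file(path))
```

The with block inside the generator closes the file once the generator is exhausted, or when it is closed or garbage-collected early.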

All of the methods above work well for small files, but not all of them work for large files: the readlines method fails, because readlines() reads the entire file into memory, as a list of lines, before the loop even starts.

When I ran the readlines method, CPU and memory usage rose rapidly (as shown in the following figure), and once memory usage passed 50%, Python raised a "MemoryError".

The other methods (Normal, With, fileinput, and Generator) work well for large files. They iterate over the file object, which reads one line at a time through a small buffer, so while they run, CPU and memory usage (shown in the following figure) do not rise noticeably.
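To reproduce the difference without a 4 GB file, a rough sketch can compare peak memory with the standard `tracemalloc` module on a ~10 MB temporary file. The file size, line length, and helper names (`peak_bytes`, `with_readlines`, `with_iteration`) are assumptions for illustration; exact numbers depend on the Python version and platform, so no output is shown.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, path):
    """Return the peak memory (bytes) traced by tracemalloc while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def with_readlines(path):
    with open(path, "rb") as f:
        for _line in f.readlines():  # the full list of lines exists before the loop
            pass


def with_iteration(path):
    with open(path, "rb") as f:
        for _line in f:  # one buffered line at a time
            pass


if __name__ == "__main__":
    # Hypothetical test file: 200,000 lines of 50 bytes each (~10 MB).
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write((b"x" * 49 + b"\n") * 200_000)
        path = tmp.name
    try:
        print("readlines peak:", peak_bytes(with_readlines, path))
        print("iteration peak:", peak_bytes(with_iteration, path))
    finally:
        os.remove(path)
```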

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.
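One caveat for binary files such as the ISO: "lines" are just runs of bytes between arbitrary b"\n" bytes, so a single line can be very long. A common alternative, swapped in here rather than taken from the methods above, is a generator that yields fixed-size chunks. A sketch, assuming a 1 MiB chunk size and the hypothetical names `read_in_chunks` and `sha256_of` (checksumming the ISO is just an example use).

```python
import hashlib
from functools import partial


def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield successive chunk_size-byte pieces of the file; the last may be shorter."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(chunk_size)
        # until it returns b"", which signals end of file.
        for chunk in iter(partial(f.read, chunk_size), b""):
            yield chunk


def sha256_of(path):
    """Hash a file incrementally, so memory use stays at about one chunk."""
    digest = hashlib.sha256()
    for chunk in read_in_chunks(path):
        digest.update(chunk)
    return digest.hexdigest()
```

For example, `print(sha256_of(r"./CentOS-6.5-i386.iso"))` never holds more than one chunk of the ISO in memory at a time.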

