I have a large file (~4 GB) to process in Python, and I wondered whether it is OK to read such a large file. So I tried the following approaches:

The file I actually need to process is not "./CentOS-6.5-i386.iso"; I only use that file as an example here.

1:  Normal Method. (ignore try/except/finally)

def main():
    f = open(r"./CentOS-6.5-i386.iso", "rb")
    for line in f:
        print(line, end="")
    f.close()

if __name__ == "__main__":
    main()

2: "With" Method.

def main():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            print(line, end="")

if __name__ == "__main__":
    main()

3:  "readlines" Method. [Bad Idea]

# NO. readlines() is really bad for large files.
# It raises MemoryError.
def main():
    for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
        print(line, end="")

if __name__ == "__main__":
    main()

4: "fileinput" Method.

import fileinput

def main():
    for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
        print(line, end="")

if __name__ == "__main__":
    main()

5: "Generator" Method.

def readFile():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            yield line

def main():
    for line in readFile():
        print(line, end="")

if __name__ == "__main__":
    main()

All of the methods above work well for small files, but not all of them work for large files. The readlines method fails, because readlines() loads the entire file into memory at once and returns it as a list of lines.
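One caveat for binary files such as an ISO image: a "line" is simply whatever lies between two newline bytes, so a single line can be very long. A minimal sketch of an alternative that reads fixed-size chunks, so memory use stays bounded regardless of the content. The names `read_in_chunks` and `file_size_by_chunks` and the 1 MiB default are assumptions for illustration and were not part of the tests above:

```python
def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield successive chunks of at most chunk_size bytes from a binary file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            yield chunk


def file_size_by_chunks(path):
    """Example consumer: total the bytes without holding the whole file."""
    return sum(len(chunk) for chunk in read_in_chunks(path))
```

Calling `file_size_by_chunks("./CentOS-6.5-i386.iso")` would walk the whole file while keeping only one chunk in memory at a time.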

When I ran the readlines method, I got the following error message:

With the readlines method, CPU and memory usage rise rapidly (see the following figure). Once memory usage exceeded 50%, Python raised a MemoryError.

The other methods (Normal Method, With Method, fileinput Method, Generator Method) work well for large files. With these methods, the CPU and memory load shown in the following figure does not rise noticeably.
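The difference can also be checked without a 4 GB file. The following is a rough sketch that uses the standard `tracemalloc` module to compare the peak memory allocated by line iteration and by readlines() on a small temporary file. The helper names (`peak_memory`, `count_by_iteration`, `count_by_readlines`, `make_sample_file`) and the sample file contents are assumptions made for this illustration:

```python
import os
import tempfile
import tracemalloc


def peak_memory(func, path):
    """Return the peak number of bytes allocated by Python while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_by_iteration(path):
    """Count lines lazily; only one line is held at a time."""
    count = 0
    with open(path, "rb") as f:
        for _ in f:
            count += 1
    return count


def count_by_readlines(path):
    """Count lines eagerly; every line is held in one list."""
    with open(path, "rb") as f:
        return len(f.readlines())


def make_sample_file(num_lines=20000):
    """Write a small temporary file of short lines and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_lines):
            f.write(b"line %d of sample data\n" % i)
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        print("iteration peak bytes:", peak_memory(count_by_iteration, sample))
        print("readlines peak bytes:", peak_memory(count_by_readlines, sample))
    finally:
        os.remove(sample)
```

The exact numbers depend on the Python version and platform, but the readlines() peak should grow with the file size while the iteration peak stays roughly constant.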

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.
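Another point in favour of generators is that they compose: each stage pulls one line at a time from the previous one, so the whole pipeline stays lazy. A small sketch under that assumption; `read_lines`, `lines_containing`, and `count_matches` are hypothetical helper names chosen for this example:

```python
def read_lines(path):
    """Yield lines lazily from a file opened in binary mode."""
    with open(path, "rb") as f:
        for line in f:
            yield line


def lines_containing(lines, needle):
    """Yield only the lines that contain the byte string needle."""
    for line in lines:
        if needle in line:
            yield line


def count_matches(path, needle):
    """Count matching lines without building a list of them."""
    return sum(1 for _ in lines_containing(read_lines(path), needle))
```

For example, `count_matches("./CentOS-6.5-i386.iso", b"Linux")` would scan the file once and never hold more than one line in memory.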

Reference:

How to read large file, line by line in python
