I have a large file ( ~4G) to process in Python. I wonder whether it is OK to "read" such a large file. So I tried in the following several ways:

The original large file to deal with is not "./CentOS-6.5-i386.iso", I just take this file as an example here.

1:  Normal Method. (ignore try/except/finally)

def main():
f = open(r"./CentOS-6.5-i386.iso", "rb")
for line in f:
print(line, end="")
f.close() if __name__ == "__main__":
main()

2: "With" Method.

def main():
with open(r"./CentOS-6.5-i386.iso", "rb") as f:
for line in f:
print(line, end="") if __name__ == "__main__":
main()

3:  "readlines" Method. [Bad Idea]

#NO. readlines() is really bad for large files.
#Memory Error.
def main():
for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
print(line, end="") if __name__ == "__main__":
main()

4: "fileinput" Method.

import fileinput

def main():
for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
print(line, end="") if __name__ == "__main__":
main()

5: "Generator" Method.

def readFile():
with open(r"./CentOS-6.5-i386.iso", "rb") as f:
for line in f:
yield line def main():
for line in readFile():
print(line, end="") if __name__ == "__main__":
main()

The methods above, all work well for small files, but not always for large files(readlines Method). The readlines() function loads the entire file into memory as it runs.

When I run the readlines Method, I got the following error message:

When using the readlines Method, the Percentage of Used CPU and Used Memory rises rapidly(in the following figure). And when the percentage of Used Memory reaches over 50%, I got the "MemoryError" in Python.

The other methods (Normal Method, With Method, fileinput Method, Generator Method) works well for large files. And when using these methods, the workload for CPU and memory which is shown in the following figure does not get a distinct rise.

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.

Reference:

How to read large file, line by line in python

Read Large Files in Python的更多相关文章

  1. Huge CSV and XML Files in Python, Error: field larger than field limit (131072)

    Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...

  2. Working with Excel Files in Python

    Working with Excel Files in Python from: http://www.python-excel.org/ This site contains pointers to ...

  3. GitHub 上传文件过大报错:remote: error: GH001: Large files detected.

    1.查看哪个文件过大了 报错信息: remote: Resolving deltas: 100% (24/24), completed with 3 local objects. remote: wa ...

  4. Creating Excel files with Python and XlsxWriter(通过 Python和XlsxWriter来创建Excel文件(xlsx格式))

    以下所有内容翻译至: https://xlsxwriter.readthedocs.io/ #----------------------------------------------------- ...

  5. 【Selenium】【BugList4】执行pip报错:Fatal error in launcher: Unable to create process using '""D:\Program Files\Python36\python.exe"" "D:\Program Files\Python36\Scripts\pip.exe" '

    环境信息: python版本:V3.6.4 安装路径:D:\Program Files\python36 环境变量PATH:D:\Program Files\Python36;D:\Program F ...

  6. Read a large file with python

    python读取大文件 较pythonic的方法,使用with结构 文件可以自动关闭 异常可以在with块内处理 with open(filename, 'rb') as f: for line in ...

  7. reading/writing files in Python

    file types: plaintext files, such as .txt .py Binary files, such as .docx, .pdf, iamges, spreadsheet ...

  8. How to read and write multiple files in Python?

    Goal: I want to write a program for this: In a folder I have =n= number of files; first read one fil ...

  9. Creating Excel files with Python and XlsxWriter——Introduction

    XlsxWriter 是用来写Excel2007版本以上的xlsx文件的Python模块. XlsxWriter 在供选择的可以写Excel的Python模块中有自己的优缺点. #---------- ...

随机推荐

  1. [Python]计算闰年时候出现的and和or优先级的问题以及短路逻辑

    好吧题目非常easy.可是有些细节还是挺有意思的. 题目是:计算今年是否是闰年,推断闰年条件,满足年份模400为0,或者模4为0可是模100不为0 答案是这种: import time #计算今年是否 ...

  2. MapReduce实战(七)GroupingComparator

    需求: Order_0000001,Pdt_01,222.8Order_0000001,Pdt_05,25.8Order_0000002,Pdt_05,325.8Order_0000002,Pdt_0 ...

  3. php判断今日是本月的第几个星期几

    php判断今日是本月的第几个星期几 php中有一个非常强悍的系统函数date()函数.巧妙的利用他可以实现显示任意我们需要的时间.比如今天遇到个需要是要判断今天是本月的第几个星期几,这里就不讨论这种说 ...

  4. Swift-9-类和结构体

    // Playground - noun: a place where people can play import UIKit // 几个重要的概念Properties/Methods/Subscr ...

  5. php form 图片上传至服务器上

    本文章也是写给自己看的,因为写的很简洁,连判断都没有,只是直接实现了能上传的功能. 前台: <form action="upload.php" method="PO ...

  6. 【Raspberry Pi】webpy+mysql+GPIO 实现手机控制

    1.mysql http://dev.mysql.com/doc/refman/5.5/en/index.html 安装 sudo apt-get install update sudo apt-ge ...

  7. gcc/g++ 实战之编译的四个过程

    gcc和g++分别是GNU(一个开源组织)的c&c++编译器   对于.c后缀的文件,gcc把它当做是C程序,g++当做是C++程序:对于.cpp后缀的文件,gcc和g++都会当做c++程序. ...

  8. VC++ 在Watch窗口显示GetLastError值以及详细信息

    You can display the value GetLastError() will return by putting "@err" in your watch windo ...

  9. 存储过程根据ouID获取IntlPerson数据表

    /****************************************************************************** ** Name: usp_base_Ge ...

  10. Java Tomcat7性能监控与优化详解

    1.   目的 通过优化tomcat提高网站的并发能力. 2.   服务器资源 服务器所能提供CPU.内存.硬盘的性能对处理能力有决定性影响. 3.   优化配置 3.1. 配置tomcat管理员账户 ...