Read Large Files in Python
I have a large file (~4 GB) to process in Python, and I wondered whether it is OK to "read" a file that large. So I tried the following approaches:
The file I actually need to process is not "./CentOS-6.5-i386.iso"; I just use that file as an example here.
1: Normal Method. (ignore try/except/finally)
def main():
    f = open(r"./CentOS-6.5-i386.iso", "rb")
    for line in f:
        print(line, end="")
    f.close()

if __name__ == "__main__":
    main()
2: "With" Method.
def main():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            print(line, end="")

if __name__ == "__main__":
    main()
3: "readlines" Method. [Bad Idea]
# NO. readlines() is really bad for large files.
# Memory Error.
def main():
    for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
        print(line, end="")

if __name__ == "__main__":
    main()
4: "fileinput" Method.
import fileinput

def main():
    for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
        print(line, end="")

if __name__ == "__main__":
    main()
5: "Generator" Method.
def readFile():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            yield line

def main():
    for line in readFile():
        print(line, end="")

if __name__ == "__main__":
    main()
All of the methods above work well for small files, but not all of them work for large files: the readlines method fails. The readlines() function loads the entire file into memory, as a list of lines, when it runs.
When I ran the readlines method, I got the following error message:

When using the readlines method, the percentage of used CPU and used memory rises rapidly (see the following figure). Once used memory went over 50%, I got a "MemoryError" in Python.

The other methods (normal, with, fileinput, and generator) work well for large files. With these methods, the CPU and memory load shown in the following figure does not rise noticeably.
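The difference can also be seen without a 4 GB file. The following is only a minimal sketch, not the measurement I used above: it writes a small throwaway file and compares the peak memory of readlines() against plain iteration using the standard tracemalloc module. The helper names (peak_memory, count_with_readlines, count_with_iteration, make_sample_file) and the sample size are made up for illustration, and tracemalloc only tracks allocations made through Python's allocator, so the numbers are indicative rather than exact.

```python
import os
import tempfile
import tracemalloc


def peak_memory(func, path):
    """Return the peak number of traced bytes allocated while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_with_readlines(path):
    # readlines() builds a list that holds every line of the file at once.
    with open(path, "rb") as f:
        return len(f.readlines())


def count_with_iteration(path):
    # Iterating the file object only keeps the current line (plus a read buffer).
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def make_sample_file(num_lines=200_000):
    """Write a throwaway file of num_lines short lines and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_lines):
            f.write(b"line %d of the sample file\n" % i)
    return path


if __name__ == "__main__":
    path = make_sample_file()
    try:
        print("readlines peak bytes:", peak_memory(count_with_readlines, path))
        print("iteration peak bytes:", peak_memory(count_with_iteration, path))
    finally:
        os.remove(path)
```

With readlines(), the peak should grow roughly in proportion to the file size, while the iteration peak should stay small however many lines the file has.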

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.
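One caveat for binary files such as an .iso: iterating "line by line" splits on newline bytes, and a binary file may contain very long stretches without one, so a single "line" can itself be large. A variation of the generator method that reads fixed-size blocks avoids this. This is a sketch under my own assumptions: the names read_in_chunks and total_size and the 1 MiB default chunk size are chosen only for illustration, and an in-memory io.BytesIO stands in for the real file.

```python
import io


def read_in_chunks(file_obj, chunk_size=1024 * 1024):
    """Yield successive blocks of at most chunk_size bytes from file_obj."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:  # an empty bytes object means end of file
            break
        yield chunk


def total_size(file_obj, chunk_size=1024 * 1024):
    """Add up the chunk lengths; only one chunk is held in memory at a time."""
    return sum(len(chunk) for chunk in read_in_chunks(file_obj, chunk_size))


if __name__ == "__main__":
    # In-memory stand-in for a large binary file opened with open(path, "rb").
    sample = io.BytesIO(b"\x00" * 2_500_000)
    print(total_size(sample))  # prints 2500000
```

For a real file, the same generator can be used as `with open(path, "rb") as f: for chunk in read_in_chunks(f): ...`, and memory use is bounded by chunk_size instead of by the longest line.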
Reference:
How to read large file, line by line in python