Read Large Files in Python
I have a large file (~4 GB) to process in Python, and I wondered whether it is OK to "read" such a large file. So I tried the following methods:
The file I actually need to process is not "./CentOS-6.5-i386.iso"; I just use that file as an example here.
1: Normal Method. (try/except/finally omitted)
def main():
    f = open(r"./CentOS-6.5-i386.iso", "rb")
    for line in f:
        print(line, end="")
    f.close()

if __name__ == "__main__":
    main()
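Method 1 leaves out error handling. For reference, here is a minimal sketch of the same loop with the `try`/`finally` that was skipped, so the file is closed even if processing a line raises an exception. The function name `read_with_try_finally` and its `process` callback are hypothetical names added for illustration.

```python
def read_with_try_finally(path, process):
    """Iterate over a file line by line, closing it even on error.

    `read_with_try_finally` and `process` are hypothetical names.
    """
    f = open(path, "rb")
    try:
        for line in f:  # the file object yields one line at a time
            process(line)
    finally:
        f.close()  # runs whether or not process() raised
```

This is essentially what the `with` statement in Method 2 does automatically.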
2: "With" Method.
def main():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            print(line, end="")

if __name__ == "__main__":
    main()
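Because iterating over the file object is lazy, the `with` form also makes it cheap to look at only the start of a huge file: `itertools.islice` stops after the first n lines, and leaving the `with` block closes the file without iterating over the rest. A minimal sketch; the helper name `head` is hypothetical.

```python
from itertools import islice


def head(path, n):
    """Return the first n lines of a (possibly huge) file as bytes.

    `head` is a hypothetical helper; the rest of the file is never iterated.
    """
    with open(path, "rb") as f:
        # islice pulls lines from the file iterator one at a time
        # and stops after n of them.
        return list(islice(f, n))
```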
3: "readlines" Method. [Bad Idea]
# NO. readlines() is a bad idea for large files:
# it raised MemoryError here.
def main():
    for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
        print(line, end="")

if __name__ == "__main__":
    main()
4: "fileinput" Method.
import fileinput

def main():
    for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
        print(line, end="")

if __name__ == "__main__":
    main()
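`fileinput` is mostly useful when the same loop should run over several files in turn. A minimal sketch, assuming a list of paths; `count_lines` is a hypothetical helper. The object returned by `fileinput.input()` can be used as a context manager, which closes the current file when the `with` block exits, and its `filename()` method reports which file the current line came from.

```python
import fileinput


def count_lines(paths):
    """Count lines per file across several files, streaming line by line.

    `count_lines` is a hypothetical helper for illustration.
    Files with no lines do not appear in the result.
    """
    counts = {}
    with fileinput.input(files=paths, mode="rb") as lines:
        for _ in lines:
            name = lines.filename()  # file the current line came from
            counts[name] = counts.get(name, 0) + 1
    return counts
```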
5: "Generator" Method.
def readFile():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            yield line

def main():
    for line in readFile():
        print(line, end="")

if __name__ == "__main__":
    main()
All of the methods above work well for small files, but not all of them work for large files (the readlines method fails). readlines() reads the entire file into memory at once and returns a list of all its lines.
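The difference can be measured on a small scale with the standard `tracemalloc` module. A rough sketch, assuming a generated temporary file of 200,000 short lines (about 8 MB; the sizes are arbitrary): it compares the peak memory traced while iterating over the file object with the peak while calling `readlines()`, which builds a list holding every line at once. The helper names are hypothetical, and exact numbers vary by platform and Python version.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func):
    """Return the peak traced memory (in bytes) while running func()."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def iterate(path):
    with open(path, "rb") as f:
        for line in f:  # the loop keeps only the current line alive
            pass


def read_all(path):
    with open(path, "rb") as f:
        lines = f.readlines()  # list containing every line of the file
    return len(lines)


# Build a throwaway test file: 200,000 lines of 40 bytes each.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(200_000):
        tmp.write(b"%039d\n" % i)
path = tmp.name

peak_iter = peak_bytes(lambda: iterate(path))
peak_readlines = peak_bytes(lambda: read_all(path))
print(f"iterate: {peak_iter:,} bytes, readlines: {peak_readlines:,} bytes")
os.remove(path)
```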
When I ran the readlines method, I got the following error message:

With the readlines method, CPU and memory usage rose rapidly (see the following figure). When memory usage went over 50%, Python raised a MemoryError.

The other methods (Normal, With, fileinput, and Generator) work well for large files. With these methods, CPU and memory usage show no distinct rise, as the following figure shows.

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.
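One caveat for binary files such as an ISO image: iterating over a file opened in "rb" mode splits on b"\n" bytes, so a single "line" can be arbitrarily long if the data contains few newline bytes. A generator that yields fixed-size blocks keeps memory use bounded regardless of content. A minimal sketch; the name `read_in_chunks` and the 1 MiB default are assumptions, not from the original.

```python
def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield successive chunk_size-byte blocks of a file.

    `read_in_chunks` and the 1 MiB default are illustrative choices.
    The last block may be shorter than chunk_size.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # read() returns b"" at end of file
                break
            yield chunk
```

For example, each block can be passed to `hashlib.sha256().update()` to checksum the ISO while holding only one block in memory at a time.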
Reference:
How to read large file, line by line in python