Read Large Files in Python
I have a large file ( ~4G) to process in Python. I wonder whether it is OK to "read" such a large file. So I tried in the following several ways:
The original large file to deal with is not "./CentOS-6.5-i386.iso", I just take this file as an example here.
1: Normal Method. (ignore try/except/finally)
def main():
f = open(r"./CentOS-6.5-i386.iso", "rb")
for line in f:
print(line, end="")
f.close() if __name__ == "__main__":
main()
2: "With" Method.
def main():
with open(r"./CentOS-6.5-i386.iso", "rb") as f:
for line in f:
print(line, end="") if __name__ == "__main__":
main()
3: "readlines" Method. [Bad Idea]
#NO. readlines() is really bad for large files.
#Memory Error.
def main():
for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
print(line, end="") if __name__ == "__main__":
main()
4: "fileinput" Method.
import fileinput def main():
for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
print(line, end="") if __name__ == "__main__":
main()
5: "Generator" Method.
def readFile():
with open(r"./CentOS-6.5-i386.iso", "rb") as f:
for line in f:
yield line def main():
for line in readFile():
print(line, end="") if __name__ == "__main__":
main()
The methods above, all work well for small files, but not always for large files(readlines Method). The readlines() function loads the entire file into memory as it runs.
When I run the readlines Method, I got the following error message:

When using the readlines Method, the Percentage of Used CPU and Used Memory rises rapidly(in the following figure). And when the percentage of Used Memory reaches over 50%, I got the "MemoryError" in Python.

The other methods (Normal Method, With Method, fileinput Method, Generator Method) works well for large files. And when using these methods, the workload for CPU and memory which is shown in the following figure does not get a distinct rise.

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.
Reference:
How to read large file, line by line in python
Read Large Files in Python的更多相关文章
- Huge CSV and XML Files in Python, Error: field larger than field limit (131072)
Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...
- Working with Excel Files in Python
Working with Excel Files in Python from: http://www.python-excel.org/ This site contains pointers to ...
- GitHub 上传文件过大报错:remote: error: GH001: Large files detected.
1.查看哪个文件过大了 报错信息: remote: Resolving deltas: 100% (24/24), completed with 3 local objects. remote: wa ...
- Creating Excel files with Python and XlsxWriter(通过 Python和XlsxWriter来创建Excel文件(xlsx格式))
以下所有内容翻译至: https://xlsxwriter.readthedocs.io/ #----------------------------------------------------- ...
- 【Selenium】【BugList4】执行pip报错:Fatal error in launcher: Unable to create process using '""D:\Program Files\Python36\python.exe"" "D:\Program Files\Python36\Scripts\pip.exe" '
环境信息: python版本:V3.6.4 安装路径:D:\Program Files\python36 环境变量PATH:D:\Program Files\Python36;D:\Program F ...
- Read a large file with python
python读取大文件 较pythonic的方法,使用with结构 文件可以自动关闭 异常可以在with块内处理 with open(filename, 'rb') as f: for line in ...
- reading/writing files in Python
file types: plaintext files, such as .txt .py Binary files, such as .docx, .pdf, iamges, spreadsheet ...
- How to read and write multiple files in Python?
Goal: I want to write a program for this: In a folder I have =n= number of files; first read one fil ...
- Creating Excel files with Python and XlsxWriter——Introduction
XlsxWriter 是用来写Excel2007版本以上的xlsx文件的Python模块. XlsxWriter 在供选择的可以写Excel的Python模块中有自己的优缺点. #---------- ...
随机推荐
- nginx+tomcat多节点部署
在一台机器上想要将一个应用程序部署多个节点,可以通过nginx来实现. 1.将tomcat复制多份,修改tomcat配置文件conf/server.xml,将端口号设置成不一样的 2.将多个tomca ...
- spring boot +mybatis分页查询
这是spring boot集合mybatis的分页查询. pom依赖: <!-- 分页插件 --> <dependency> <groupId>com.github ...
- 使用Fiddler调试线上JS代码
在下面的命令框输入“select script”回车来筛选js请求 将HTTP请求重定向到本地的文件,进行web调试.这种调试方式不需要发布到线上再验证,避免了修改不成功.对用户造成影响的风险 左边一 ...
- XV Open Cup named after E.V. Pankratiev Stage 6, Grand Prix of Japan Problem J. Hyperrectangle
题目大意: 给出一个$d$维矩形,第i维的范围是$[0, l_i]$. 求满足$x_1 + x_2 + ...x_d \leq s$ 的点构成的单纯形体积. $d, l_i \leq 300$ 题解: ...
- Img src用base64数据
<img src='data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgICAgMCAgIDAwMDBAYEBAQEBAgG ...
- jQuery特效:图片的轮播
Flexslider图片轮播.文字图片相结合滑动切换效果 地址:http://www.helloweba.com/view-blog-265.html 示例:http://www.helloweba. ...
- Nucleus PLUS简单介绍
近些年来,随着嵌入式系统飞速的发展,嵌入式实时操作系统广泛地应用在制造工业.过程控制.通讯.仪器仪表.汽车.船舶.航空航天.军事.装备.消费类产 品等方面.今天嵌入式系统带来的工业年产值超过了1万亿美 ...
- oauth 2
OAuth2是基于HTTP的认证API,一般与OAuth2搭配的API也是基于HTTP的REST风格API(比如新浪微博和github),很多人一定想过是否可以直接从浏览器端调用REST API. 我 ...
- php 使用curl 进行简单模拟提交表单
//初始化curl $ch = curl_init(); $url = 'xxx'; $option = [ CURLOPT_URL => $url, CURLOPT_HEADER => ...
- 更轻更快的Vue.js 2.0与其他框架对比(转)
更轻更快的Vue.js 2.0 崭露头角的JavaScript框架Vue.js 2.0版本已经发布,在狂热的JavaScript世界里带来了让人耳目一新的变化. Vue创建者尤雨溪称,Vue 2.0 ...