Java – Reading a Large File Efficiently (repost)
Original article: http://www.baeldung.com/java-read-lines-large-file
1. Overview
This tutorial will show how to read all the lines from a large file in Java in an efficient manner.
This article is part of the “Java – Back to Basic” tutorial here on Baeldung.
2. Reading In Memory
The standard way of reading the lines of the file is in-memory – both Guava and Apache Commons IO provide a quick way to do just that:
    Files.readLines(new File(path), Charsets.UTF_8);

    FileUtils.readLines(new File(path));
The problem with this approach is that all the lines of the file are kept in memory, which will quickly lead to an OutOfMemoryError if the file is large enough.
For example, reading a ~1 Gb file:
    @Test
    public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
        String path = ...
        Files.readLines(new File(path), Charsets.UTF_8);
    }
This starts off with a small amount of memory being consumed: (~0 Mb consumed)
    [main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
    [main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb
However, after the full file has been processed, we have at the end: (~2 Gb consumed)
    [main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
    [main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb
This means that about 2.1 Gb of memory is consumed by the process. The reason is simple: all the lines of the file are now stored in memory.
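The Total Memory and Free Memory figures in these logs look like values reported by java.lang.Runtime. The article does not show the logging code, so the following is only a sketch under that assumption; the class and method names (MemoryReport, totalMb, freeMb, usedMb) are hypothetical.

```java
public class MemoryReport {

    private static final long MB = 1024L * 1024L;

    /** Heap currently reserved by the JVM, in megabytes (Runtime reports bytes). */
    public static long totalMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    /** Unused part of the reserved heap, in megabytes. */
    public static long freeMb() {
        return Runtime.getRuntime().freeMemory() / MB;
    }

    /** Memory in use is the reserved heap minus the free part of it. */
    public static long usedMb(long totalMb, long freeMb) {
        return totalMb - freeMb;
    }

    public static void main(String[] args) {
        long total = totalMb();
        long free = freeMb();
        // Assumed log format, modelled on the output shown in the article
        System.out.println("Total Memory: " + total + " Mb");
        System.out.println("Free Memory: " + free + " Mb");
        System.out.println("Used Memory: " + usedMb(total, free) + " Mb");
    }
}
```

With the figures logged after the full read, usedMb(2666, 490) gives 2176 Mb, which is the roughly 2.1 Gb quoted above.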
It should be obvious by this point that keeping the contents of the file in memory will quickly exhaust the available memory, regardless of how much that actually is.
What’s more, we usually don’t need all of the lines of the file in memory at once. Instead, we just need to be able to iterate through each one, do some processing and throw it away. So this is exactly what we’re going to do: iterate through the lines without holding them in memory.
3. Streaming Through the File
Let’s now look at a solution – we’re going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:
    FileInputStream inputStream = null;
    Scanner sc = null;
    try {
        inputStream = new FileInputStream(path);
        sc = new Scanner(inputStream, "UTF-8");
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            // System.out.println(line);
        }
        // note that Scanner suppresses exceptions
        if (sc.ioException() != null) {
            throw sc.ioException();
        }
    } finally {
        if (inputStream != null) {
            inputStream.close();
        }
        if (sc != null) {
            sc.close();
        }
    }
This solution will iterate through all the lines in the file, allowing each line to be processed without keeping a reference to it and therefore without keeping the lines in memory: (~150 Mb consumed)
    [main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
    [main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
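The same line-by-line idea can also be sketched with the standard library's BufferedReader and try-with-resources (Java 7+). This is an alternative to the Scanner version, not something the original article shows; the class name StreamingLineReader and the countLines helper are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class StreamingLineReader {

    /** Reads the file one line at a time and returns how many lines were seen. */
    public static long countLines(Path path) throws IOException {
        long count = 0;
        // try-with-resources closes the reader even if reading throws
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process the line here; only the current line is referenced,
                // so earlier lines can be garbage collected
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for the large input
        Path tmp = Files.createTempFile("lines", ".txt");
        try {
            Files.write(tmp, Arrays.asList("first", "second", "third"), StandardCharsets.UTF_8);
            System.out.println("Lines read: " + countLines(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Unlike Scanner, BufferedReader does not suppress I/O errors: an IOException propagates directly, so no separate ioException() check is needed.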
4. Streaming with Apache Commons IO
The same can be achieved with the Commons IO library, by using the custom LineIterator it provides:
    LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
    try {
        while (it.hasNext()) {
            String line = it.nextLine();
            // do something with line
        }
    } finally {
        LineIterator.closeQuietly(it);
    }
Since the file is never fully loaded into memory, this also results in fairly conservative memory consumption: (~150 Mb consumed)
    [main] INFO o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
    [main] INFO o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
5. Conclusion
This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with such large files.
The implementation of all these examples and code snippets can be found in my GitHub project. It is an Eclipse-based project, so it should be easy to import and run as it is.
控制器层 public ActionResult DemoArray() { Product[] array = { new Product {Name = "Kayak", Pr ...