Original article: http://www.baeldung.com/java-read-lines-large-file

1. Overview

This tutorial will show how to read all the lines from a large file in Java in an efficient manner.

This article is part of the “Java – Back to Basic” tutorial here on Baeldung.

2. Reading In Memory

The standard way of reading the lines of a file is to load them all into memory. Both Guava and Apache Commons IO provide a quick way to do just that:

Files.readLines(new File(path), Charsets.UTF_8);

FileUtils.readLines(new File(path));

The problem with this approach is that all the lines of the file are kept in memory, which will quickly lead to an OutOfMemoryError if the file is large enough.

For example – reading a ~1Gb file:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

This starts off with a small amount of memory being consumed: (~0 Mb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, after the full file has been processed, we have at the end: (~2 Gb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means that about 2.1 Gb of memory is consumed by the process (2666 Mb total minus 490 Mb free). The reason is simple: all the lines of the file are now being stored in memory.
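The original does not show how the "Total Memory" and "Free Memory" figures were logged. A minimal sketch, assuming they come from java.lang.Runtime, could look like the following; the class name MemoryReport and its methods are hypothetical and are not taken from the article's project:

```java
// Hypothetical helper; the article's actual logging code is not shown.
class MemoryReport {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Used memory = what the JVM has reserved minus the part that is still free.
    static long usedMb(long totalBytes, long freeBytes) {
        return (totalBytes - freeBytes) / BYTES_PER_MB;
    }

    // Builds a one-line summary in the same spirit as the log lines above.
    static String describe(Runtime runtime) {
        long total = runtime.totalMemory();
        long free = runtime.freeMemory();
        return "Total Memory: " + total / BYTES_PER_MB + " Mb, "
             + "Free Memory: " + free / BYTES_PER_MB + " Mb, "
             + "Used: " + usedMb(total, free) + " Mb";
    }

    public static void main(String[] args) {
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```

Calling describe before and after reading the file gives the two snapshots that are compared here. Note that freeMemory() also counts garbage that has not been collected yet, so the numbers are only a rough indication.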

It should be obvious by this point that keeping the entire contents of the file in memory will quickly exhaust the available memory, regardless of how much that actually is.

What’s more, we usually don’t need all of the lines in the file in memory at once. Instead, we just need to be able to iterate through each one, do some processing and throw it away. So, this is exactly what we’re going to do: iterate through the lines without holding them all in memory.

3. Streaming Through the File

Let’s now look at a solution – we’re going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

This solution will iterate through all the lines in the file, allowing each line to be processed without keeping a reference to it and, as a result, without keeping the lines in memory: (~150 Mb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
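On Java 7 or later, the same Scanner loop can be written more compactly with try-with-resources, which closes both the stream and the scanner automatically. The following is a sketch of that variation rather than the article's own code; the class name ScannerLineCounter and the choice to count lines are assumptions made only to give the loop something to do:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical wrapper around the Scanner loop shown above.
class ScannerLineCounter {

    // Counts the lines of a file while holding only the current line in memory.
    static long countLines(String path) throws IOException {
        long count = 0;
        try (InputStream in = new FileInputStream(path);
             Scanner sc = new Scanner(in, "UTF-8")) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine(); // process the line here, then let it go
                count++;
            }
            // Scanner suppresses IOExceptions, so rethrow any that occurred
            if (sc.ioException() != null) {
                throw sc.ioException();
            }
        }
        return count;
    }
}
```

Resources declared in the try header are closed in reverse order, so the scanner is closed before the underlying stream.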

4. Streaming with Apache Commons IO

The same can be achieved with the Commons IO library, by using the custom LineIterator provided by the library:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the entire file is never held in memory, this also results in fairly conservative memory consumption numbers: (~150 Mb consumed)

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
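Where adding a library is not an option, the same one-line-at-a-time pattern can also be sketched with the JDK alone, using a BufferedReader. This is an alternative that the original article does not cover and for which no memory figures were measured here; the class name BufferedLineReader and the "longest line" computation are purely illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical JDK-only counterpart to the LineIterator loop above.
class BufferedLineReader {

    // Returns the length of the longest line, reading one line at a time.
    static int longestLineLength(String path) throws IOException {
        int longest = 0;
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String line;
            // readLine() returns null once the end of the file is reached
            while ((line = reader.readLine()) != null) {
                longest = Math.max(longest, line.length());
            }
        }
        return longest;
    }
}
```

As with the Scanner and LineIterator versions, only the current line is referenced at any point, so earlier lines become eligible for garbage collection as the loop advances.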

5. Conclusion

This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with large files.

The implementation of all these examples and code snippets can be found in my GitHub project. It is an Eclipse-based project, so it should be easy to import and run as it is.
