How to remove duplicate lines in a large text file?
How would you remove duplicate lines from a file that is much too large to fit in memory? The duplicate lines are not necessarily adjacent, and say the file is 10 times bigger than RAM.
A better solution is to use HashSet to store each line of input.txt. As set ignores duplicate values, so while storing a line, check if it already present in hashset. Write it to output.txt only if not present in hashset.
Java:
// Efficient Java program to remove
// duplicates from input.txt and
// save output to output.txt import java.io.*;
import java.util.HashSet; public class FileOperation
{
public static void main(String[] args) throws IOException
{
// PrintWriter object for output.txt
PrintWriter pw = new PrintWriter("output.txt"); // BufferedReader object for input.txt
BufferedReader br = new BufferedReader(new FileReader("input.txt")); String line = br.readLine(); // set store unique values
HashSet<String> hs = new HashSet<String>(); // loop for each line of input.txt
while(line != null)
{
// write only if not
// present in hashset
if(hs.add(line))
pw.println(line); line = br.readLine(); } pw.flush(); // closing resources
br.close();
pw.close(); System.out.println("File operation performed successfully");
}
}
How to remove duplicate lines in a large text file?的更多相关文章
- notepad++ remove duplicate line
To remove duplicate lines just press Ctrl + F, select the “Replace” tab and in the “Find” field, pla ...
- Compare, sort, and delete duplicate lines in Notepad ++
Compare, sort, and delete duplicate lines in Notepad ++ Organize Lines: Since version 6.5.2 the app ...
- [LeetCode] Remove Duplicate Letters 移除重复字母
Given a string which contains only lowercase letters, remove duplicate letters so that every letter ...
- 316. Remove Duplicate Letters
Given a string which contains only lowercase letters, remove duplicate letters so that every letter ...
- Remove Duplicate Letters I & II
Remove Duplicate Letters I Given a string which contains only lowercase letters, remove duplicate le ...
- LeetCode Remove Duplicate Letters
原题链接在这里:https://leetcode.com/problems/remove-duplicate-letters/ 题目: Given a string which contains on ...
- leetcode@ [316] Remove Duplicate Letters (Stack & Greedy)
https://leetcode.com/problems/remove-duplicate-letters/ Given a string which contains only lowercase ...
- Remove Duplicate Letters
316. Remove Duplicate Letters Total Accepted: 2367 Total Submissions: 12388 Difficulty: Medium Given ...
- [Swift]LeetCode316. 去除重复字母 | Remove Duplicate Letters
Given a string which contains only lowercase letters, remove duplicate letters so that every letter ...
随机推荐
- Python3学习笔记36-PEP8代码规范
在使用PyCharm时,最右边会有波浪线警告提示代码不符合PEP8代码规范.记录一下犯的错和解决方式 PEP8是风格错误,而不是编码错误.只是为了让代码更具有阅读性. 1)block comment ...
- Appium&Java自动化实现移动端几种典型动作
一.Appium4.0 Pinch&Zoom /* * @FileName Pinch_Zoom: Pinch_Zoom * @author davieyang * @create 2018- ...
- GDB调试器教程
启动和退出GDBGDB(GNU Project Debugger)几乎适用于所有类Unix系统,小巧方便且不失功能强大,Linux/Unix程序员经常用它来调试程序. 总的来说有几下几种方法启动GDB ...
- 鼠标点击自定义文字展现特效JS代码
JS特效使用方法 只需将如下JS代码放到</body>之前就好了 var a_idx = 0; jQuery(document).ready(function($) { $("b ...
- Spring实战(第4版)
第1部分 Spring的核心 Spring的两个核心:依赖注入(dependency injection,DI)和面向切面编程(aspec-oriented programming,AOP) POJO ...
- linux 没有界面内容显示不全解决办法
1.管道 管道简单理解就是,使用管道意味着第一个命令的输出会作为第二个命令的输入,第二个命令的输出又会作为第三个命令的输入,依此类推.利用Linux所提供的管道符“|”将两个命令隔开,管道符左边命令的 ...
- 【2-sat】8.14B. 黑心老板
2-sat 只写过板子 题目大意 有一个长度为$k$取值只有01的序列,现在$n$个人每人下注三个位置,请构造一个序列使每个人最多猜对一个位置 $k\le 5000,n \le 10000$ 题目分析 ...
- Windows下安装配置Apache+PHP+Mysql环境
1.下载相关安装包 Apache下载: http://archive.apache.org/dist/httpd/binaries/win32/ ,选择httpd-2.2.25-win32-x86-n ...
- json注解及序列化
一.json框架 市面上的json框架常用的有 jackson.gson.fastjson.大家比较推崇的是fastjson,但是springmvc默认集成的是 jackson. 在一个项目中建议一个 ...
- Hbuilder + MUI 的简单案例
话不多说 直接上代码 项目结构: index.html 的代码 <!DOCTYPE html><html> <head> <meta ch ...