How would you remove duplicate lines from a file that is  much too large to fit in memory? The duplicate lines are not necessarily adjacent, and say the file is 10 times bigger than RAM.

A better solution is to use HashSet to store each line of input.txt. As set ignores duplicate values, so while storing a line, check if it already present in hashset. Write it to output.txt only if not present in hashset.

Java:

// Efficient Java program to remove
// duplicates from input.txt and
// save output to output.txt import java.io.*;
import java.util.HashSet; public class FileOperation
{
public static void main(String[] args) throws IOException
{
// PrintWriter object for output.txt
PrintWriter pw = new PrintWriter("output.txt"); // BufferedReader object for input.txt
BufferedReader br = new BufferedReader(new FileReader("input.txt")); String line = br.readLine(); // set store unique values
HashSet<String> hs = new HashSet<String>(); // loop for each line of input.txt
while(line != null)
{
// write only if not
// present in hashset
if(hs.add(line))
pw.println(line); line = br.readLine(); } pw.flush(); // closing resources
br.close();
pw.close(); System.out.println("File operation performed successfully");
}
}

  

How to remove duplicate lines in a large text file?的更多相关文章

  1. notepad++ remove duplicate line

    To remove duplicate lines just press Ctrl + F, select the “Replace” tab and in the “Find” field, pla ...

  2. Compare, sort, and delete duplicate lines in Notepad ++

    Compare, sort, and delete duplicate lines in Notepad ++ Organize Lines: Since version 6.5.2 the app ...

  3. [LeetCode] Remove Duplicate Letters 移除重复字母

    Given a string which contains only lowercase letters, remove duplicate letters so that every letter ...

  4. 316. Remove Duplicate Letters

    Given a string which contains only lowercase letters, remove duplicate letters so that every letter ...

  5. Remove Duplicate Letters I & II

    Remove Duplicate Letters I Given a string which contains only lowercase letters, remove duplicate le ...

  6. LeetCode Remove Duplicate Letters

    原题链接在这里:https://leetcode.com/problems/remove-duplicate-letters/ 题目: Given a string which contains on ...

  7. leetcode@ [316] Remove Duplicate Letters (Stack & Greedy)

    https://leetcode.com/problems/remove-duplicate-letters/ Given a string which contains only lowercase ...

  8. Remove Duplicate Letters

    316. Remove Duplicate Letters Total Accepted: 2367 Total Submissions: 12388 Difficulty: Medium Given ...

  9. [Swift]LeetCode316. 去除重复字母 | Remove Duplicate Letters

    Given a string which contains only lowercase letters, remove duplicate letters so that every letter ...

随机推荐

  1. Python&Selenium借助HTMLTestRunner生成自动化测试报告

    一.摘要 本篇博文介绍Python和Selenium进行自动化测试时,借助著名的HTMLTestRunner生成自动化测试报告 HTMLTestRunner.py百度很多,版本也很多,自行搜索下载放到 ...

  2. Appium Desired Capabilities-Android Only

    Android Only These Capabilities are available only on Android-baseddrivers (like UiAutomator2for exa ...

  3. Go语言关于Type Assertions的疑问

    我在"The Go Programming Language Specification"中读到了关于x.(T)这样的语法可以对变量是否符合某一type或interface进行判断 ...

  4. 用idea操作svn

    使用SVN前提必须安装好服务端和客户端,或者知道服务端的url才能对服务器中的文件进行操作. 服务端:SVN service 客户端:TortoiseSVN 提交 第一步:确认SVN 服务器是否开启 ...

  5. Force git to overwrite local files on pull 使用pull强制覆盖本地文件 转载自:http://snowdream.blog.51cto.com/3027865/1102441

    How do I force an overwrite of local files on a git pull? I think this is the right way: $ git fetch ...

  6. mongodb命令---创建数据库,插入文档,更新文档记录

    创建数据库----基本就是使用隐式创建 例如 use 你定义的数据库名, use dingsmongo  如果你使用的是studio 3T软件,那直接选中右侧的地址栏点击右键选择Add Databas ...

  7. Jquery开发&BootStrap 实现“todolist项目”

    作业题目:实现“todolist项目” 作业需求: 基础需求:85%参考链接http://www.todolist.cn/1. 将用户输入添加至待办项2. 可以对todolist进行分类(待办项和已完 ...

  8. git + idea 配置 github设置ssh免登陆方式提交拉取代码

    1.下载安装git,官网:https://git-scm.com/download/win  安装默认配置安装 git2.20版本地址百度网盘地址: 链接:https://pan.baidu.com/ ...

  9. 题解 [CF803C] Maximal GCD

    题面 解析 一开始以为这题很难的... 其实只要设\(d\)为\(a\)的最大公因数, 即\(a[i]=s[i]*d\), 因为\(n=\sum_{i=1}^{n}a[i]=\sum_{i=1}^ns ...

  10. Java xml和map,list格式的转换-摘抄

    import java.io.ByteArrayOutputStream; import java.util.ArrayList; import java.util.HashMap; import j ...