Java: checking a CSV file for duplicate lines using an ArrayList


I have a CSV file with the following content:

2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894
2017-10-29 00:00:00.0,"1010",-125529,0,0,0,420743,0,0,256,420743,256,420743
2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894
2017-10-29 00:00:00.0,"1013",-10625,0,0,-687,599098,0,0,379,599098,379,599098
2017-10-29 00:00:00.0,"1604",-1794.9,0,0,-3.99,4081.07,0,0,361,4081.07,361,4081.07

So lines 1 and 3 are duplicates. Now I want to read the file and print the duplicate lines to the console.

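I wrote this Java code, which reads the file and puts it line by line into an ArrayList. Then I create an unmodifiable copy, loop through the ArrayList, and use the unmodifiable copy in binarySearch: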

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReadValidationFile {

public static void main(String[] args) {

    List<String> validationFile = new ArrayList<>();

    try(BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"));){

        String line;
        while((line = br.readLine())!= null){
            validationFile.add(line);
        }

    } catch (FileNotFoundException e) {
        //e.printStackTrace();
        System.out.println("file not found " + e.getMessage());
    } catch (IOException e) {
        e.printStackTrace();
    }

    List<String> validationFileCopy = Collections.unmodifiableList(validationFile);

    for(String line : validationFile){
        // ComparatorLine is the asker's own Comparator<String>; its code is not shown
        int comp = Collections.binarySearch(validationFileCopy,line,new ComparatorLine());
        if (comp <= 0){
            System.out.println(line);
        }

    }
}
}
But the result I get is:

2017-10-29 00:00:00.0,"1010",-125529,0,0,0,420743,0,0,256,420743,256,420743
Can you help me see what I am doing wrong? I think my comparator is fine. What is wrong with my ArrayList?

Thanks, Peter

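Create a Set while reading the lines from the input CSV file. Whenever adding an element to the Set returns false, print the line, because it is a duplicate.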

If you need a list of all duplicate lines, create a List and add to it every line for which adding to the Set returned false.

Note:

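I simulated your file reading with static data. Note that if the data contains only digits and no letters, you do not need a case-insensitive comparison. If your data does contain letters, you still do not need a special comparator, because you can insert into the set with add(line.toLowerCase()), which ensures all lines are compared in lowercase before they are added to the set.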
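A minimal sketch of the Set approach described above, assuming the lines are already in a List<String> (for example, read as in the question). The class name DuplicateLines, the helper findDuplicates and the short sample rows are made up for illustration; matching here is exact, with no case folding.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateLines {
    // Returns every line that was already seen earlier in the input.
    // A line that occurs n times appears n - 1 times in the result.
    static List<String> findDuplicates(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            // Set.add returns false when the element is already present
            if (!seen.add(line)) {
                duplicates.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical short rows standing in for the CSV lines
        List<String> lines = List.of("a,1", "b,2", "a,1", "c,3");
        System.out.println(findDuplicates(lines)); // prints [a,1]
    }
}
```

Because a HashSet lookup is constant time on average, this stays a single linear pass, unlike searching a List for every line.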
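The other answers correctly point out that you should use a Set instead of a List. But for the sake of learning, let's look at your code and see where it goes wrong: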

public class ReadValidationFile {

public static void main(String[] args) {

    List<String> validationFile = new ArrayList<>();

    try(BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"));){
All of that can be done in a single line: List<String> validationFile = Files.readAllLines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8);
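A runnable version of that one-liner, assuming the file is UTF-8 and small enough to hold in memory. The class name ReadAllLinesDemo and the helper readLines are hypothetical; Files.readAllLines(Path, Charset) is the standard java.nio.file API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ReadAllLinesDemo {
    // Reads the whole file into memory, one String per line, decoding it as UTF-8
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "validation_small.csv" is the file name used in the question
        List<String> validationFile = readLines(Paths.get("validation_small.csv"));
        validationFile.forEach(System.out::println);
    }
}
```

Unlike the question's try/catch, this lets the IOException propagate instead of continuing with an empty list when the file is missing.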

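You could also just search validationFile itself; Collections.unmodifiableList returns a read-only view of the same list, not a separate copy, so it adds nothing here. More importantly, you are calling binarySearch, which only works on sorted lists, and your list is not sorted: the Collections.binarySearch Javadoc requires the list to be sorted according to the comparator, otherwise the result is undefined. And even on a sorted list, searching for a line in the very list it came from always finds that line itself, so a match does not tell you whether the line is a duplicate.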
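If you want to stay with a List, one alternative (not taken from the answers here) is to sort a copy first: equal lines then sit next to each other, so a duplicate is any line equal to its predecessor. The class and method names below are hypothetical, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedDuplicates {
    // Sort-based duplicate detection: after sorting, equal lines are adjacent
    static List<String> duplicatesBySorting(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines); // keep the original order intact
        Collections.sort(sorted);
        List<String> duplicates = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).equals(sorted.get(i - 1))) {
                duplicates.add(sorted.get(i));
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("b,2", "a,1", "c,3", "a,1");
        // A sorted list is also the precondition Collections.binarySearch relies on
        List<String> sorted = new ArrayList<>(lines);
        Collections.sort(sorted);
        System.out.println(Collections.binarySearch(sorted, "c,3") >= 0); // true
        System.out.println(duplicatesBySorting(lines)); // [a,1]
    }
}
```

The duplicates come out in sorted order rather than file order, and sorting costs O(n log n), so the Set approach is usually the simpler choice.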

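If you really do need to ignore case when comparing the strings (it does not seem to make any difference for the given data, since it is all digits), store each unique line after first uppercasing and then lowercasing it. This apparently cumbersome technique is necessary because lowercasing alone is not enough when dealing with non-English text. The equalsIgnoreCase method does the same thing.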

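public static void main(String[] args) throws Exception {
    // needs java.nio.file.Files, java.nio.file.Paths, java.nio.charset.StandardCharsets,
    // java.util.HashSet, java.util.Set and java.util.stream.Stream
    Set<String> uniqueLines = new HashSet<>();
    try (Stream<String> lines = Files.lines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8)) {
        // upper-then-lower folds case; only lines already in the set pass the filter
        lines.filter(line -> !uniqueLines.add(line.toUpperCase().toLowerCase()))
             .forEach(System.out::println);
    }
}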
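A small demonstration of why uppercasing before lowercasing matters, using U+0131 (LATIN SMALL LETTER DOTLESS I) as the example character. The class and method names are hypothetical, and Locale.ROOT is used (instead of the default locale in the code above) so the result does not depend on the machine's locale settings.

```java
import java.util.Locale;

public class CaseFoldDemo {
    // Lowercasing alone: U+0131 is already lowercase, so it stays as-is and never equals "i"
    static String lowerOnly(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Upper-then-lower: U+0131 -> "I" -> "i", so it now folds to the same string as "I"
    static String upperThenLower(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String dotless = "\u0131";
        String capitalI = "I";
        System.out.println(lowerOnly(dotless).equals(lowerOnly(capitalI)));           // false
        System.out.println(upperThenLower(dotless).equals(upperThenLower(capitalI))); // true
        System.out.println(dotless.equalsIgnoreCase(capitalI));                      // true
    }
}
```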

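Actually, you don't need two loops. In the first loop itself, add each line to a Set instead of a List. If add returns false for a line, also add it to a list of duplicate lines, and then print that list.

Thanks, Optional. I changed the code in the while loop to boolean add = validationFile.add(line); if (!add) { System.out.println(line); } and it works fine.

Thanks, DodgyCodeException, for taking the time to point out my mistakes. As I said in my comment above, I have now adjusted the code. I will keep your comments in mind. Regards, Peter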
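A sketch of the single-loop version suggested in the comments, assuming the input arrives through a BufferedReader as in the question. A StringReader stands in for new FileReader("validation_small.csv") so the example runs without the file; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SingleLoopDuplicates {
    // One pass: each line goes into a Set as it is read;
    // Set.add returning false means the line was seen before.
    static List<String> readDuplicates(BufferedReader br) throws IOException {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            if (!seen.add(line)) {
                System.out.println(line); // print the duplicate immediately
                duplicates.add(line);     // and keep it for later use
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,1\nb,2\na,1\n";
        try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
            System.out.println(readDuplicates(br));
        }
    }
}
```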
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ReadValidationFile {
    static List<String> validationFile = new ArrayList<>();
    static {
        validationFile.add("2017-10-29 00:00:00.0,\"1005\",-10227,0,0,0,332894,0,0,222,332894,222,332894");
        validationFile.add("2017-10-29 00:00:00.0,\"1010\",-125529,0,0,0,420743,0,0,256,420743,256,420743");
        validationFile.add("2017-10-29 00:00:00.0,\"1005\",-10227,0,0,0,332894,0,0,222,332894,222,332894");
        validationFile.add("2017-10-29 00:00:00.0,\"1013\",-10625,0,0,-687,599098,0,0,379,599098,379,599098");
        validationFile.add("2017-10-29 00:00:00.0,\"1604\",-1794.9,0,0,-3.99,4081.07,0,0,361,4081.07,361,4081.07");
    }

    public static void main(String[] args) {
        // Option 1 : unique lines only 
        Set<String> uniqueLinesOnly = new HashSet<>(validationFile);

        // Option 2 : unique lines and duplicate lines 
        Set<String> uniqueLines = new HashSet<>();
        Set<String> duplicateLines = new HashSet<>();
        for (String line : validationFile) {
            if (!uniqueLines.add(line.toLowerCase())) {
                duplicateLines.add(line.toLowerCase());
            }
        }

        // Option 3 : unique lines and duplicate lines by Java Streams
        Set<String> uniquesJava8 = new HashSet<>();
        List<String> duplicatesJava8 = validationFile
                                    .stream()
                                    .filter(element -> !uniquesJava8.add(element.toLowerCase()))
                                    .map(element -> element.toLowerCase())
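                                    .collect(Collectors.toList());

        System.out.println(uniqueLinesOnly);
        System.out.println(duplicateLines);
        System.out.println(duplicatesJava8);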
    }
}
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ReadValidationFile {
    public static void main(String[] args){       
        List<String> validationFile = new ArrayList<>();
        try(BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"));){
            String line;
            while((line = br.readLine())!= null){
                validationFile.add(line);
            }
        } catch (FileNotFoundException e) {
            //e.printStackTrace();
            System.out.println("file not found " + e.getMessage());
        } catch (IOException e) {
            e.printStackTrace();
        }
        Set<String> uniques = new HashSet<>();        
        List<String> duplicates = validationFile.stream().filter(i->!uniques.add(i)).collect(Collectors.toList());
        System.out.println(duplicates);
    }
}
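public static void main(String[] args) throws Exception {
    // Exact-match variant; needs java.nio.file.Files, java.nio.file.Paths,
    // java.nio.charset.StandardCharsets, java.util.HashSet, java.util.Set, java.util.stream.Stream
    Set<String> uniqueLines = new HashSet<>();
    try (Stream<String> lines = Files.lines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8)) {
        // Set.add returns false for a line already seen, so only repeats pass the filter
        lines.filter(line -> !uniqueLines.add(line))
             .forEach(System.out::println);
    }
}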