Java: Checking a CSV file for duplicate lines with an ArrayList
I have a CSV file with the following content:
2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894
2017-10-29 00:00:00.0,"1010",-125529,0,0,0,420743,0,0,256,420743,256,420743
2017-10-29 00:00:00.0,"1005",-10227,0,0,0,332894,0,0,222,332894,222,332894
2017-10-29 00:00:00.0,"1013",-10625,0,0,-687,599098,0,0,379,599098,379,599098
2017-10-29 00:00:00.0,"1604",-1794.9,0,0,-3.99,4081.07,0,0,361,4081.07,361,4081.07
So lines 1 and 3 are duplicates.
Now I want to read the file and print the duplicate lines to the console. I wrote this Java code, which reads the file and puts it line by line into an ArrayList. Then I create an unmodifiable copy, loop over the ArrayList, and use the unmodifiable copy of the ArrayList in binarySearch:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class ReadValidationFile {
    public static void main(String[] args) {
        List<String> validationFile = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"))) {
            String line;
            while ((line = br.readLine()) != null) {
                validationFile.add(line);
            }
        } catch (FileNotFoundException e) {
            //e.printStackTrace();
            System.out.println("file not found " + e.getMessage());
        } catch (IOException e) {
            e.printStackTrace();
        }
        List<String> validationFileCopy = Collections.unmodifiableList(validationFile);
        for (String line : validationFile) {
            int comp = Collections.binarySearch(validationFileCopy, line, new ComparatorLine());
            if (comp <= 0) {
                System.out.println(line);
            }
        }
    }
}
But the result I get is:
2017-10-29 00:00:00.0,"1010",-125529,0,0,0,420743,0,0,256,420743,256,420743
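Can you help me see what I'm doing wrong? I think my comparator is OK. What is wrong with my ArrayList?
Thanks,
Peter
Create a Set while reading the lines from the input CSV file. Whenever adding an element to the Set returns false, print the line, because it is a duplicate.
If you need a list of all the duplicate lines, create a List and add to it every line for which adding to the Set returned false.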
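A minimal sketch of this Set-based approach, assuming the lines have already been read into a List<String> (the class name SetDuplicateFinder and the sample lines are hypothetical). Set.add returns false when an equal element is already present, so every occurrence after the first one is collected:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetDuplicateFinder {

    // Returns every line whose Set.add(...) call returned false,
    // i.e. every occurrence of a line after its first one.
    static List<String> findDuplicates(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            if (!seen.add(line)) { // false: this exact line was added before
                duplicates.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened sample lines standing in for the CSV file
        List<String> lines = Arrays.asList(
                "2017-10-29 00:00:00.0,\"1005\",-10227",
                "2017-10-29 00:00:00.0,\"1010\",-125529",
                "2017-10-29 00:00:00.0,\"1005\",-10227");
        findDuplicates(lines).forEach(System.out::println);
    }
}
```

Because the check relies on String.equals and hashCode, two lines only count as duplicates if they match exactly, including quotes and whitespace.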
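Notes:
I simulated your file reading with static data.
If the data contains only digits and no letters, you do not need a case-insensitive comparison at all.
If your data does contain letters, you still do not need a special comparator: insert each line with add(line.toLowerCase()), which lower-cases every line before it is added to the Set, so all comparisons are effectively case-insensitive.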
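A sketch of the lower-casing idea from these notes (the class name CaseInsensitiveDuplicates and the sample lines are hypothetical; passing Locale.ROOT is an addition of mine so the result does not depend on the machine's default locale):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CaseInsensitiveDuplicates {

    // Lower-cases each line before adding it to the set, so lines that
    // differ only in letter case are treated as duplicates.
    static List<String> findDuplicatesIgnoringCase(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            // Locale.ROOT avoids locale-specific rules such as the Turkish dotless i
            if (!seen.add(line.toLowerCase(Locale.ROOT))) {
                duplicates.add(line); // report the line with its original spelling
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("abc,1", "ABC,1", "abd,2");
        System.out.println(findDuplicatesIgnoringCase(lines)); // [ABC,1]
    }
}
```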
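Other answers have correctly pointed out that you should use a Set rather than a List. But for the sake of learning, let's look at your code and see where it went wrong.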
public class ReadValidationFile {
    public static void main(String[] args) {
        List<String> validationFile = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"))) {
All of this can be done in a single line: List<String> validationFile = Files.readAllLines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8);
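For reference, a small runnable sketch of Files.readAllLines. It writes a temporary file first so it runs without validation_small.csv (ReadAllLinesDemo and the sample lines are hypothetical); in the real program you would pass Paths.get("validation_small.csv") instead. Note that the two-argument readAllLines takes a Charset such as StandardCharsets.UTF_8, not a String:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ReadAllLinesDemo {

    // Reads the whole file into memory, one list element per line;
    // fine for small files such as validation_small.csv.
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a temporary stand-in file so the example is self-contained
        Path tmp = Files.createTempFile("validation_small", ".csv");
        Files.write(tmp, Arrays.asList("a,1", "b,2", "a,1"), StandardCharsets.UTF_8);
        List<String> lines = readLines(tmp);
        System.out.println(lines.size() + " lines read"); // 3 lines read
        Files.delete(tmp);
    }
}
```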
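You could also just search validationFile itself. However, you are calling binarySearch, which only works on sorted lists, and your list is not sorted. See the documentation of Collections.binarySearch: the list must be sorted into ascending order according to the comparator before the call, otherwise the results are undefined.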
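To illustrate the point about sorting, here is a sketch (the class SortedDuplicateCheck is hypothetical). Even on a correctly sorted list, a binarySearch for a line taken from that same list always finds it and returns a non-negative index, so the result cannot distinguish a duplicate from a unique line. If you want to stay with a List, one alternative is to sort a copy and compare neighbours, since equal lines end up next to each other:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedDuplicateCheck {

    // Sorts a copy of the lines; equal lines are then adjacent,
    // so a duplicate is any line equal to its predecessor.
    static List<String> findDuplicatesBySorting(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        Collections.sort(sorted);
        List<String> duplicates = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).equals(sorted.get(i - 1))) {
                duplicates.add(sorted.get(i));
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("b", "a", "c", "a");
        List<String> sorted = new ArrayList<>(lines);
        Collections.sort(sorted);
        // Every line is found in the sorted list (index >= 0), duplicate or not
        for (String line : lines) {
            System.out.println(line + " -> " + Collections.binarySearch(sorted, line));
        }
        System.out.println(findDuplicatesBySorting(lines)); // [a]
    }
}
```

The trade-off is that the duplicates come out in sorted order rather than file order, and sorting costs O(n log n) compared with the single pass of the Set approach.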
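If you really do need to ignore case when comparing the strings (with the given data it makes no difference, since it is all digits), then store each unique line after first upper-casing and then lower-casing it. This apparently cumbersome technique is necessary because lower-casing alone is not enough when dealing with non-English text. The equalsIgnoreCase method does the same thing.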
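public static void main(String[] args) throws Exception {
    // Needs imports from java.nio.file, java.nio.charset, java.util and java.util.stream
    Set<String> uniqueLines = new HashSet<>();
    // Files.lines keeps the file open until the stream is closed, hence try-with-resources
    try (Stream<String> lines = Files.lines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8)) {
        lines.filter(line -> !uniqueLines.add(line.toUpperCase().toLowerCase()))
             .forEach(System.out::println);
    }
}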
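A short sketch of why upper-casing before lower-casing can matter (the class name CaseFoldDemo is hypothetical, and the characters are my own example, not taken from the answer). The MICRO SIGN (U+00B5) and GREEK SMALL LETTER MU (U+03BC) are both already lower case, so toLowerCase leaves them different, but both upper-case to GREEK CAPITAL LETTER MU (U+039C):

```java
import java.util.Locale;

class CaseFoldDemo {

    // Upper-case first, then lower-case, as suggested in the answer
    static String fold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String micro = "\u00B5"; // MICRO SIGN
        String mu = "\u03BC";    // GREEK SMALL LETTER MU
        // Lower-casing alone keeps the two strings different
        System.out.println(micro.toLowerCase(Locale.ROOT).equals(mu.toLowerCase(Locale.ROOT))); // false
        // Upper-then-lower maps both to the same string
        System.out.println(fold(micro).equals(fold(mu))); // true
        // equalsIgnoreCase also treats them as equal
        System.out.println(micro.equalsIgnoreCase(mu)); // true
    }
}
```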
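Actually, you don't need two loops. In the first loop itself, add each line to a Set instead of a List. If add returns false for a line, also add it to a list of duplicate lines, then print that list.
Thanks Optional, I changed the code in the while loop to boolean add = validationFile.add(line); if (!add) { System.out.println(line); } and it works fine.
Thanks DodgyCodeException for taking the time to point out my mistakes. As I said in my comment above, I have now adjusted the code. I'll keep your comments in mind. Regards, Peter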
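The single-loop version described in these comments might look like the sketch below (SinglePassDuplicates is hypothetical; a StringReader replaces the FileReader so the example runs on its own). The important detail is that validationFile has to be a Set here: ArrayList.add always returns true, so the check would never fire on a List.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SinglePassDuplicates {

    // One loop: each line goes straight into a Set; add() returning false
    // means the line was seen before, so it is recorded as a duplicate.
    static List<String> duplicatesFrom(Reader source) throws IOException {
        Set<String> validationFile = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                boolean add = validationFile.add(line);
                if (!add) {
                    duplicates.add(line);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new FileReader("validation_small.csv")
        String csv = "a,1\nb,2\na,1\n";
        System.out.println(duplicatesFrom(new StringReader(csv))); // [a,1]
    }
}
```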
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
public class ReadValidationFile {
    static List<String> validationFile = new ArrayList<>();
    static {
        validationFile.add("2017-10-29 00:00:00.0,\"1005\",-10227,0,0,0,332894,0,0,222,332894,222,332894");
        validationFile.add("2017-10-29 00:00:00.0,\"1010\",-125529,0,0,0,420743,0,0,256,420743,256,420743");
        validationFile.add("2017-10-29 00:00:00.0,\"1005\",-10227,0,0,0,332894,0,0,222,332894,222,332894");
        validationFile.add("2017-10-29 00:00:00.0,\"1013\",-10625,0,0,-687,599098,0,0,379,599098,379,599098");
        validationFile.add("2017-10-29 00:00:00.0,\"1604\",-1794.9,0,0,-3.99,4081.07,0,0,361,4081.07,361,4081.07");
    }
    public static void main(String[] args) {
        // Option 1 : unique lines only
        Set<String> uniqueLinesOnly = new HashSet<>(validationFile);
        System.out.println("Unique lines: " + uniqueLinesOnly);
        // Option 2 : unique lines and duplicate lines
        Set<String> uniqueLines = new HashSet<>();
        Set<String> duplicateLines = new HashSet<>();
        for (String line : validationFile) {
            if (!uniqueLines.add(line.toLowerCase())) {
                duplicateLines.add(line.toLowerCase());
            }
        }
        System.out.println("Duplicate lines: " + duplicateLines);
        // Option 3 : unique lines and duplicate lines by Java Streams
        Set<String> uniquesJava8 = new HashSet<>();
        List<String> duplicatesJava8 = validationFile
                .stream()
                .filter(element -> !uniquesJava8.add(element.toLowerCase()))
                .map(element -> element.toLowerCase())
                .collect(Collectors.toList());
        System.out.println("Duplicate lines (streams): " + duplicatesJava8);
    }
}
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
public class ReadValidationFile {
    public static void main(String[] args) {
        List<String> validationFile = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new FileReader("validation_small.csv"))) {
            String line;
            while ((line = br.readLine()) != null) {
                validationFile.add(line);
            }
        } catch (FileNotFoundException e) {
            //e.printStackTrace();
            System.out.println("file not found " + e.getMessage());
        } catch (IOException e) {
            e.printStackTrace();
        }
        Set<String> uniques = new HashSet<>();
        List<String> duplicates = validationFile.stream().filter(i -> !uniques.add(i)).collect(Collectors.toList());
        System.out.println(duplicates);
    }
}
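As a different technique from the answers above, lines can also be grouped and counted with Collectors.groupingBy, which reports each duplicated line once together with how often it occurs (CountingDuplicates and the sample lines are hypothetical). It also avoids the stateful filter lambda used above; the Stream documentation warns that stateful behavioral parameters can give nondeterministic or incorrect results, for example on parallel streams.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class CountingDuplicates {

    // Groups identical lines and counts them, then keeps only lines seen
    // at least twice. LinkedHashMap preserves first-seen order.
    static Map<String, Long> duplicateCounts(List<String> lines) {
        Map<String, Long> counts = lines.stream()
                .collect(Collectors.groupingBy(Function.identity(),
                        LinkedHashMap::new, Collectors.counting()));
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,1", "b,2", "a,1", "a,1");
        System.out.println(duplicateCounts(lines)); // {a,1=3}
    }
}
```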
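Without the upper-/lower-case normalization, the Files.lines version is simply:
public static void main(String[] args) throws Exception {
    Set<String> uniqueLines = new HashSet<>();
    try (Stream<String> lines = Files.lines(Paths.get("validation_small.csv"), StandardCharsets.UTF_8)) {
        lines.filter(line -> !uniqueLines.add(line))
             .forEach(System.out::println);
    }
}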