How to quickly read a tab-delimited file in Java?

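I have a function that reads a tab-delimited file, puts each column into a list, and returns a list of lists holding all the values of each column. This works fine for the small test file I used (1 column, 1850 rows), but I am now trying it with ~30k columns; it has been running for hours and still has not finished.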

How can I modify the code below to make it faster? I could also convert the input file if reading 30k rows and 1850 columns would be faster.

public static List<List<String>> readTabDelimited(String filepath) {
    List<List<String>> allColumns = new ArrayList<List<String>>();
    try {
        BufferedReader buf = new BufferedReader(new FileReader(filepath));
        String lineJustFetched = null;
        for (;;) {
            lineJustFetched = buf.readLine();
            if (lineJustFetched == null) {
                break;
            }
            lineJustFetched = lineJustFetched.replace("\n", "").replace("\r", "");
            for (int i = 0; i < lineJustFetched.split("\t").length; i++) {
                try {
                    allColumns.get(i).add(lineJustFetched.split("\t")[i]);
                } catch (IndexOutOfBoundsException e) {
                    List<String> newColumn = new ArrayList<String>();
                    newColumn.add(lineJustFetched.split("\t")[i]);
                    allColumns.add(newColumn);
                }
            }
        }
        buf.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return allColumns;
}

I'm not sure how often your try-catch is triggered (I assume 30k times), but throwing and catching an exception is very resource-intensive:

try {
  allColumns.get(i).add(lineJustFetched.split("\t")[i]);
 } catch (IndexOutOfBoundsException e) {
   List<String> newColumn = new ArrayList<String>();
   newColumn.add(lineJustFetched.split("\t")[i]);
   allColumns.add(newColumn);
 }

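After replacing the exception handling with a size check and splitting only once, the computation time for the input string below dropped from 1153229 ns to 354714 ns (about 3x faster):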

String lineJustFetched=" 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 5";
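
The measurement described above could be reproduced along the following lines. This is only a rough sketch: the class name SplitTimingSketch and the method names repeatedSplit and singleSplit are made up for illustration, and a single System.nanoTime() measurement is heavily affected by JIT warm-up, so the numbers it prints will not match the ones quoted and should be treated as indicative at best. A benchmark harness such as JMH would give more reliable figures.

```java
import java.util.ArrayList;
import java.util.List;

class SplitTimingSketch {

    // Sample line quoted in the answer: 19 tab-separated fields.
    static final String LINE = " 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 1 \t 2 \t 3 \t 5";

    // Approach from the question: split() is re-run for the loop bound and for
    // every access, and a missing column is detected by catching an exception.
    static List<List<String>> repeatedSplit(String line) {
        List<List<String>> allColumns = new ArrayList<>();
        for (int i = 0; i < line.split("\t").length; i++) {
            try {
                allColumns.get(i).add(line.split("\t")[i]);
            } catch (IndexOutOfBoundsException e) {
                List<String> newColumn = new ArrayList<>();
                newColumn.add(line.split("\t")[i]);
                allColumns.add(newColumn);
            }
        }
        return allColumns;
    }

    // Revised approach: split once and use a size check instead of an exception.
    static List<List<String>> singleSplit(String line) {
        List<List<String>> allColumns = new ArrayList<>();
        String[] parts = line.split("\t");
        for (int i = 0; i < parts.length; i++) {
            if (allColumns.size() > i) {
                allColumns.get(i).add(parts[i]);
            } else {
                List<String> newColumn = new ArrayList<>();
                newColumn.add(parts[i]);
                allColumns.add(newColumn);
            }
        }
        return allColumns;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        repeatedSplit(LINE);
        long t1 = System.nanoTime();
        singleSplit(LINE);
        long t2 = System.nanoTime();
        // Timings vary from run to run and between machines.
        System.out.println("repeated split: " + (t1 - t0) + " ns");
        System.out.println("single split:   " + (t2 - t1) + " ns");
    }
}
```

Both methods build the same list of columns; only the amount of repeated work differs.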

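How big is the file? More importantly, use a CSV reader. Why do you call split three times? Call it once, store the result, and reuse it. Do you need to hold the entire contents of the file in memory?

To explain what @StepTNT said: instead of splitting the line every time you need the count or the i-th item, add two new variables, String[] lineParts = lineJustFetched.split("\t"); and int count = lineParts.length;. Add them after the replace calls for \n and \r, then use them in the for loop and when reading values rather than calling split() each time.

@BoristheSpider The file is 913 megabytes; I will look into a CSV reader. StepTNT and Yazan, thanks for the tips.

I changed the code to perform the split only once:
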
String[] tempList = lineJustFetched.split("\t");
for (int i = 0; i < tempList.length; i++) {
    if (allColumns.size() > i) {
        allColumns.get(i).add(tempList[i]);
    } else {
        List<String> newColumn = new ArrayList<String>();
        newColumn.add(tempList[i]);
        allColumns.add(newColumn);
    }
}
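
For reference, here is how the whole method might look with the single split applied. It is a sketch under a few assumptions not stated in the thread: the class name TabColumnReader and method name readColumns are illustrative, the method takes a Reader instead of a file path so it can be tried on an in-memory string, I/O errors are rethrown as UncheckedIOException rather than printed, and split is called with a limit of -1 so trailing empty fields are kept (the plain split("\t") used above drops them). The replace calls are left out because BufferedReader.readLine() already strips the line terminator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TabColumnReader {

    // Reads tab-delimited text and returns one list per column.
    static List<List<String>> readColumns(Reader source) {
        List<List<String>> allColumns = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader buf = new BufferedReader(source)) {
            String line;
            while ((line = buf.readLine()) != null) {
                // Split once per line; -1 keeps trailing empty fields (assumed to be wanted).
                String[] parts = line.split("\t", -1);
                for (int i = 0; i < parts.length; i++) {
                    // Columns are created in order, so a missing column always has index == size.
                    if (i == allColumns.size()) {
                        allColumns.add(new ArrayList<>());
                    }
                    allColumns.get(i).add(parts[i]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return allColumns;
    }

    public static void main(String[] args) {
        List<List<String>> cols = readColumns(new StringReader("a\tb\tc\n1\t2\t3\n"));
        System.out.println(cols); // prints [[a, 1], [b, 2], [c, 3]]
    }
}
```

To read from disk, a reader such as new FileReader(filepath) can be passed in, as in the original code.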

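
On the question raised in the comments of whether the entire file has to be held in memory: if only some columns are actually needed, one option is to stream the file and keep just those. The sketch below assumes that use case, which the thread does not confirm; SingleColumnReader, fieldAt, and readColumn are hypothetical names. It scans for tab positions with indexOf instead of calling split, so no array of all fields is allocated for each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SingleColumnReader {

    // Returns the field at columnIndex of one tab-delimited line,
    // or null if the line has fewer fields than that.
    static String fieldAt(String line, int columnIndex) {
        int start = 0;
        for (int i = 0; i < columnIndex; i++) {
            int tab = line.indexOf('\t', start);
            if (tab < 0) {
                return null; // line is too short
            }
            start = tab + 1;
        }
        int end = line.indexOf('\t', start);
        return end < 0 ? line.substring(start) : line.substring(start, end);
    }

    // Streams the input line by line and keeps only the requested column.
    static List<String> readColumn(Reader source, int columnIndex) {
        List<String> values = new ArrayList<>();
        try (BufferedReader buf = new BufferedReader(source)) {
            String line;
            while ((line = buf.readLine()) != null) {
                String field = fieldAt(line, columnIndex);
                if (field != null) {
                    values.add(field);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }
}
```

Lines that are too short for the requested column are skipped here; whether that or an error is the right behavior depends on the data.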