Fastest way to read and write large files line by line in Java

Tags: java, performance, file-io, bufferedreader

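I have been looking for the fastest way to read and rewrite large files (0.5-1 GB) in Java with limited memory (about 64 MB). Each line in the file represents a record, so I need to get them line by line. The file is a plain text file.

I tried BufferedReader and BufferedWriter, but that does not seem to be the best option. Reading and writing a 0.5 GB file takes about 35 seconds, with no processing at all, just reading and writing. I think the bottleneck is the writing, because reading alone takes about 10 seconds.

I also tried reading into byte arrays, but searching for the line breaks in each array that was read takes even more time.

Any suggestions?
Thanks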

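I would suggest looking at the classes in the java.nio package. Non-blocking I/O may be faster for sockets, and there is an article whose benchmarks say this is true.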

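I suspect your real problem is that your hardware is limited, and that nothing you do in software will make much difference. If you have plenty of memory and CPU, more advanced tricks can help, but if you are just waiting on the hard drive because the file is not cached, they will not change much.

BTW: 500 MB in 10 seconds, or 50 MB/s, is a typical read speed for an HDD.

Try running the following to see at what file size your system can no longer cache the file efficiently.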

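import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

// The class name is arbitrary; the original snippet showed only the two methods.
public class FileSizeTest {
    public static void main(String... args) throws IOException {
        for (int mb : new int[]{50, 100, 250, 500, 1000, 2000})
            testFileSize(mb);
    }

    private static void testFileSize(int mb) throws IOException {
        File file = File.createTempFile("test", ".txt");
        file.deleteOnExit();
        // One line of 1024 'A' characters, so mb * 1024 lines is roughly mb megabytes.
        char[] chars = new char[1024];
        Arrays.fill(chars, 'A');
        String longLine = new String(chars);

        // Time the write, including close().
        long start1 = System.nanoTime();
        PrintWriter pw = new PrintWriter(new FileWriter(file));
        for (int i = 0; i < mb * 1024; i++)
            pw.println(longLine);
        pw.close();
        long time1 = System.nanoTime() - start1;
        // bytes * 1000 / nanoseconds gives millions of bytes per second.
        System.out.printf("Took %.3f seconds to write to a %d MB, file rate: %.1f MB/s%n",
                time1 / 1e9, file.length() >> 20, file.length() * 1000.0 / time1);

        // Time reading the same file back line by line, discarding the lines.
        long start2 = System.nanoTime();
        BufferedReader br = new BufferedReader(new FileReader(file));
        for (String line; (line = br.readLine()) != null; ) {
        }
        br.close();
        long time2 = System.nanoTime() - start2;
        System.out.printf("Took %.3f seconds to read to a %d MB file, rate: %.1f MB/s%n",
                time2 / 1e9, file.length() >> 20, file.length() * 1000.0 / time2);
        file.delete();
    }
}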
On a Windows machine with plenty of memory:

Took 0.395 seconds to write to a 50 MB, file rate: 133.0 MB/s
Took 0.375 seconds to read to a 50 MB file, rate: 140.0 MB/s
Took 0.669 seconds to write to a 100 MB, file rate: 156.9 MB/s
Took 0.569 seconds to read to a 100 MB file, rate: 184.6 MB/s
Took 1.585 seconds to write to a 250 MB, file rate: 165.5 MB/s
Took 1.274 seconds to read to a 250 MB file, rate: 206.0 MB/s
Took 2.513 seconds to write to a 500 MB, file rate: 208.8 MB/s
Took 2.332 seconds to read to a 500 MB file, rate: 225.1 MB/s
Took 5.094 seconds to write to a 1000 MB, file rate: 206.0 MB/s
Took 5.041 seconds to read to a 1000 MB file, rate: 208.2 MB/s
Took 11.509 seconds to write to a 2001 MB, file rate: 182.4 MB/s
Took 9.681 seconds to read to a 2001 MB file, rate: 216.8 MB/s
Took 0.376 seconds to write to a 50 MB, file rate: 139.7 MB/s
Took 0.401 seconds to read to a 50 MB file, rate: 131.1 MB/s
Took 0.517 seconds to write to a 100 MB, file rate: 203.1 MB/s
Took 0.520 seconds to read to a 100 MB file, rate: 201.9 MB/s
Took 1.344 seconds to write to a 250 MB, file rate: 195.4 MB/s
Took 1.387 seconds to read to a 250 MB file, rate: 189.4 MB/s
Took 2.368 seconds to write to a 500 MB, file rate: 221.8 MB/s
Took 2.454 seconds to read to a 500 MB file, rate: 214.1 MB/s
Took 4.985 seconds to write to a 1001 MB, file rate: 210.7 MB/s
Took 5.132 seconds to read to a 1001 MB file, rate: 204.7 MB/s
Took 10.276 seconds to write to a 2003 MB, file rate: 204.5 MB/s
Took 9.964 seconds to read to a 2003 MB file, rate: 210.9 MB/s

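The first thing I would try is to increase the buffer size of the BufferedReader and BufferedWriter. The default buffer sizes are not documented, but at least in the Oracle VM they are 8192 characters, which will not bring much of a performance advantage.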
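As a rough illustration of the buffer-size suggestion, here is a minimal sketch that passes an explicit size to both constructors. The class name `LargeBufferCopy`, the 1 Mi-character buffer size, and the UTF-8 charset are assumptions made for the example (the question does not state the file encoding), and whether a larger buffer actually helps has to be measured on the target machine.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LargeBufferCopy {
    // Assumed value for illustration: 1 Mi chars (about 2 MB of heap) per buffer.
    private static final int BUFFER_CHARS = 1 << 20;

    /** Copies a text file line by line and returns the number of lines written. */
    public static long copyLines(Path source, Path target) throws IOException {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(source), StandardCharsets.UTF_8),
                     BUFFER_CHARS);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8),
                     BUFFER_CHARS)) {
            for (String line; (line = reader.readLine()) != null; ) {
                // Per-record processing would go here.
                writer.write(line);
                writer.write('\n'); // note: line endings are normalised to LF
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        long n = copyLines(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(n + " lines copied");
    }
}
```

Only one line is held in memory at a time besides the two buffers, so this stays well inside a 64 MB heap as long as individual records are not huge.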

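If you only need to copy the file (and do not need to actually access the data), I would drop the Reader/Writer approach and work directly with InputStream and OutputStream, using a byte array as the buffer: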

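int bufferSize = 64 * 1024; // assumed value; the original snippet left bufferSize undefined
// try-with-resources closes both streams even if the copy fails part-way.
try (FileInputStream fis = new FileInputStream("d:/test.txt");
     FileOutputStream fos = new FileOutputStream("d:/test2.txt")) {
    byte[] b = new byte[bufferSize];
    int r;
    while ((r = fis.read(b)) >= 0) {
        fos.write(b, 0, r);
    }
}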
Or actually use NIO:

FileChannel in = new RandomAccessFile("d:/test.txt", "r").getChannel();
FileChannel out = new RandomAccessFile("d:/test2.txt", "rw").getChannel();
out.transferFrom(in, 0, Long.MAX_VALUE);
in.close();
out.close();
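A variant of the same NIO idea, sketched with try-with-resources and an explicit loop, since `FileChannel.transferTo` is allowed to move fewer bytes than requested. The class name `ChannelCopy` is made up for the example. The transfer does not go through a Java-side byte array, so heap size should not be the limiting factor for a plain copy.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ChannelCopy {
    /** Copies source to target channel-to-channel and returns the number of bytes copied. */
    public static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // transferTo may transfer fewer bytes than asked for, so keep going.
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(copy(Paths.get(args[0]), Paths.get(args[1])) + " bytes copied");
    }
}
```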

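However, when benchmarking the different copy methods, I see much larger differences in duration between individual runs of the benchmark than between the different implementations. I/O caching (both at the OS level and in the hard disk cache) plays a big role here, and it is very hard to say what is faster. On my hardware, copying a 1 GB text file line by line with BufferedReader and BufferedWriter takes less than 5 seconds in some runs and more than 30 seconds in others.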
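Given that run-to-run variance, a single timing says little. A minimal sketch of one way to look at the spread: repeat the measurement and report the minimum, median, and maximum. The class name `RepeatTiming` and the placeholder workload are hypothetical; the copy routine under test would be substituted in.

```java
import java.util.Arrays;

public class RepeatTiming {
    /** Runs the task the given number of times; returns the durations in nanoseconds, sorted ascending. */
    public static long[] timeRuns(int runs, Runnable task) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the file copy being measured.
        long[] t = timeRuns(5, () -> Arrays.fill(new byte[1 << 20], (byte) 1));
        System.out.printf("min %.3f s, median %.3f s, max %.3f s%n",
                t[0] / 1e9, t[t.length / 2] / 1e9, t[t.length - 1] / 1e9);
    }
}
```

If the gap between the minimum and maximum is larger than the gap between two implementations, the comparison is not telling you which implementation is faster.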

In Java 7 you can use the Files.readAllLines() and Files.write() methods. Here is an example:

List<String> readTextFile(String fileName) throws IOException {
    Path path = Paths.get(fileName);
    return Files.readAllLines(path, StandardCharsets.UTF_8);
}

void writeTextFile(List<String> strLines, String fileName) throws IOException {
    Path path = Paths.get(fileName);
    Files.write(path, strLines, StandardCharsets.UTF_8);
}

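I wrote an extensive article covering many ways of reading files in Java, tested against each other with sample files from 1 KB to 1 GB, and I found the following 3 methods to be the fastest for reading a 1 GB file: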

1) java.nio.file.Files.readAllBytes() - took less than 1 second to read the 1 GB test file

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
  public static void main(String [] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    byte [] fileBytes = Files.readAllBytes(file.toPath());
    char singleChar;
    for(byte b : fileBytes) {
      singleChar = (char) b;
      System.out.print(singleChar);
    }
  }
}
2) java.nio.file.Files.lines() - took about 3.5 seconds to read the 1 GB test file

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
  public static void main(String[] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    try (Stream<String> linesStream = Files.lines(file.toPath())) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}
3) java.io.BufferedReader - took about 4.5 seconds to read the 1 GB test file

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
  public static void main(String [] args) throws IOException {
    String fileName = "c:\\temp\\sample-10KB.txt";
    FileReader fileReader = new FileReader(fileName);

    try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
      String line;
      while((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}

It is all about OutOfMemoryError, which can be handled efficiently by iterating with the Scanner class. It reads the file line by line rather than in bulk.

The following code solves the problem:

try(FileInputStream inputStream =new FileInputStream("D:\\File\\test.txt");
  Scanner sc= new Scanner(inputStream, "UTF-8")) {
  while (sc.hasNextLine()) {
    String line = sc.nextLine();
    System.out.println(line);
  }
} catch (IOException e) {
  e.printStackTrace();
}

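See also the possible duplicate.

What encoding are you using in these files? What is your system's default charset?

I looked at nio, but it only allows reading arrays or buffers from the file. Processing such an array to extract the lines takes longer.

The article has a chart, but I cannot find anything about what was actually measured. The only case in which I have seen a performance benefit from NIO is when copying data between NIO channels using direct byte buffers. In that case, accessing the data from Java code is much slower, and the results of such a benchmark are more or less useless.

First of all, when writing to a file, closing the output stream does not ensure that all data has been physically written to the disk. It may still be lurking in memory buffers at the OS level or on the hard disk. If you read the file directly after writing that very same file, the data is most likely read from the memory buffers and not from the disk. According to this benchmark, my laptop hard disk reads and writes at close to 500 MB/s, which is probably about 10 times its actual performance.

I tried the code and it is fast. I would like to post it here, but I do not know how to format the code; do I use tags? @Peter

Great comments, jarnbjo. It would be good to see an answer from you, as you are clearly knowledgeable.

@jarnbjo In case I did not make it clear what the benchmark is doing: it tests how well your software can write to and read from the disk cache. It has no idea whether the data is written to disk, or even whether you have an HDD. If you get results lower than these, it is because your hardware has a limitation.

@user1785771 If you want to update your question, you can add the code under a heading such as "Edit: in reply to @PeterLawrey's answer…"

Thanks, but I have limited memory, so I cannot really use the FileChannel approach.

Why not? What does the available memory have to do with using a FileChannel?

Actually, I need to process the file first and then copy it.

Then why did you write that you are not processing the file (just read/write, no processing)?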
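One of the comments above points out that closing an output stream does not guarantee the data has reached the disk. For anyone who wants the timed write to include that step, here is a minimal sketch using `FileDescriptor.sync()`. The class name `DurableWrite` and the UTF-8 charset are assumptions for the example. Even after `sync()` returns, a drive's own write cache may still hold the data, and a read straight afterwards can still be served from the OS page cache.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

public class DurableWrite {
    /** Writes the lines to target and asks the OS to flush them to the storage device. */
    public static void writeAndSync(Path target, List<String> lines) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(target.toFile());
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(fos, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
            writer.flush();      // push Java-side buffers down to the OS
            fos.getFD().sync();  // ask the OS to write its buffers to the device
        }
    }
}
```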