Java: Fastest way to read a large number of ints from a binary file


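I am using Java 1.5 on an embedded Linux device and want to read a binary file containing 2 MB of int values. (Currently 4-byte big-endian, but I am free to choose the format.)

Using a DataInputStream on top of a BufferedInputStream and calling dis.readInt(), the 500 000 calls take 17 seconds, whereas reading the whole file into one big byte buffer takes 5 seconds.

How can I read the file into one large int[] faster?

The reading process should not use more than 512 KB of additional memory.

The code below, which uses nio, is no faster than the readInt() approach from java.io.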

    // Assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // The result should end up in this array
    int[] result = new int[numInts];
    int cnt = 0;

    RandomAccessFile aFile = new RandomAccessFile("filename", "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buf = ByteBuffer.allocate(512 * 1024);

    int bytesRead = inChannel.read(buf); // read into buffer

    while (bytesRead != -1) {

      buf.flip();  // make buffer ready for get()

      while (buf.hasRemaining() && cnt < numInts) {
          // probably slow here since it is called 500 000 times
          result[cnt] = buf.getInt();
          cnt++;
      }

      buf.clear(); // make buffer ready for writing
      bytesRead = inChannel.read(buf);
    }

    // Close the channel first, then the file it belongs to
    inChannel.close();
    aFile.close();

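Update: evaluation of the answers:
On the PC, memory mapping with the IntBuffer approach was the fastest in my setup.

On the embedded device, which has no JIT, java.io DataInputStream.readInt() was slightly faster (17 seconds, versus 20 seconds for the memory map with IntBuffer).
Final conclusion:
A significant speedup is easier to achieve through an algorithmic change (a smaller file for initialization).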
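You can use IntBuffer from the nio package.

You fill the buffer by calling channel.read(intBuffer).

Once the buffer is full, your intArray will contain the 500000 integers.

Edit

After realizing that channels only support ByteBuffer: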

// Assume I already know that there are 500 000 ints to read:
int numInts = 500000;
// here I want the result into
int[] result = new int[numInts];

// 4 bytes per int, direct buffer
ByteBuffer buf = ByteBuffer.allocateDirect( numInts * 4 );

// BIG_ENDIAN byte order
buf.order( ByteOrder.BIG_ENDIAN );

// Fill in the buffer
while ( buf.hasRemaining( ) )
{
   // Per EJP's suggestion check EOF condition
   if( inChannel.read( buf ) == -1 )
   {
       // Hit EOF
       throw new EOFException( );
   }
}

buf.flip( );

// Create IntBuffer view
IntBuffer intBuffer = buf.asIntBuffer( );

// result will now contain all ints read from file
intBuffer.get( result );
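
The direct buffer above holds numInts * 4 bytes (about 2 MB), which is more than the 512 KB of extra memory the question allows. The following is only a sketch of one way to keep the bulk IntBuffer.get() while bounding the buffer size: the file is read in chunks and each chunk is copied into the result array in one call. The class name ChunkedIntReader, the method readInts and the parameter bufferBytes are made up for this illustration, and no claim is made about how it performs on a JVM without JIT.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

// Hypothetical helper, not part of the original answer.
final class ChunkedIntReader {

    // Reads numInts big-endian ints using a heap buffer of at most bufferBytes bytes.
    static int[] readInts(String filename, int numInts, int bufferBytes) throws IOException {
        if (bufferBytes < 4) {
            throw new IllegalArgumentException("buffer must hold at least one int");
        }
        int[] result = new int[numInts];
        // Round down to a multiple of 4 so that no int straddles two chunks.
        ByteBuffer buf = ByteBuffer.allocate(bufferBytes & ~3);
        buf.order(ByteOrder.BIG_ENDIAN);

        FileInputStream stream = new FileInputStream(filename);
        try {
            FileChannel channel = stream.getChannel();
            int done = 0;
            while (done < numInts) {
                // Ask only for the bytes that are still needed.
                int wanted = (int) Math.min((long) buf.capacity(), (long) (numInts - done) * 4);
                buf.clear();
                buf.limit(wanted);
                // Fill the chunk completely, failing on a premature end of file.
                while (buf.hasRemaining()) {
                    if (channel.read(buf) == -1) {
                        throw new EOFException("file ended after " + done + " ints");
                    }
                }
                buf.flip();
                // One bulk copy per chunk instead of one getInt() call per value.
                int n = wanted / 4;
                buf.asIntBuffer().get(result, done, n);
                done += n;
            }
        } finally {
            stream.close();
        }
        return result;
    }
}
```

A call such as ChunkedIntReader.readInts("filename", 500000, 512 * 1024) would stay within the stated memory budget, apart from the result array itself.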

I don't know whether this will be any faster than what Alexander provided, but you could try mapping the file.

    try (FileInputStream stream = new FileInputStream(filename)) {
        FileChannel inChannel = stream.getChannel();

        ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        int[] result = new int[500000];

        buffer.order( ByteOrder.BIG_ENDIAN );
        IntBuffer intBuffer = buffer.asIntBuffer( );
        intBuffer.get(result);
    }

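I ran a fairly careful experiment comparing serialize/deserialize, DataInputStream and ObjectInputStream, all based on a ByteArrayInputStream to exclude I/O effects. For a million ints, readObject took about 20 ms and readInt about 116 ms. The serialization overhead on a million-int array was 27 bytes. This was on a 2013 MacBook Pro.

That said, object serialization is somewhat evil, and the data has to have been written out by a Java program.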
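
As a rough illustration of the serialization approach described above, the sketch below writes an int[] with writeObject and reads it back with readObject, using in-memory streams as in the experiment. The class and method names are invented for this example, and the timings quoted above are not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper, not part of the original answer.
final class IntArraySerialization {

    // Serializes the whole array with a single writeObject call.
    static byte[] write(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        try {
            out.writeObject(values);
        } finally {
            out.close();
        }
        return bytes.toByteArray();
    }

    // Deserializes the array with a single readObject call.
    static int[] read(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return (int[]) in.readObject();
        } finally {
            in.close();
        }
    }
}
```

For a file, the same calls would be wrapped around a FileOutputStream and a FileInputStream; the file then uses Java's serialization format rather than raw big-endian ints.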

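Please check the link.

@Algorithmist I checked your link, but it reads from a text file.

Berkeley has a bulk I/O JNI extension available. I haven't used it, but it may be worth a look.

Is the target machine capable of multithreading?

Yes, but I cannot imagine how that would improve the speed.

I already tried that, but I got stuck at "int bytesRead = inChannel.read(intBuffer);". This does not compile: I cannot pass intBuffer to inChannel.read(), it expects a ByteBuffer.

Because the read loop is not adequate. If it encounters a premature EOF, it will run forever. You should loop while read() returns a positive value. That tests both EOF and hasRemaining().

On the PC this is the fastest solution, but on the embedded system without JIT it takes 20 seconds, so java.io is still the fastest solution there. Interesting...

That's interesting, I had not considered the possibility of using writeObject. writeObject internally fills a byte[] using Bits.putInt() before writing. That may be faster than simply calling writeInt() a million times.

(java.nio is faster than java.io on the PC because it uses DMA to access the disk, which is not available on the embedded device.)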
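
The question notes that reading the file into a plain byte buffer took only 5 seconds, and the comment above points out that filling a byte[] in bulk can beat a million individual calls. One variant in that spirit, shown here only as a sketch, reads the file in byte[] chunks with java.io and assembles each big-endian int with shifts, so no per-value method call on a stream or buffer is needed. The names ByteArrayIntDecoder, readInts and chunkBytes are hypothetical, and whether this beats readInt() on a JVM without JIT would have to be measured on the device.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not taken from any of the answers.
final class ByteArrayIntDecoder {

    // Reads numInts big-endian ints, holding at most chunkBytes bytes of raw data at a time.
    static int[] readInts(String filename, int numInts, int chunkBytes) throws IOException {
        if (chunkBytes < 4) {
            throw new IllegalArgumentException("chunk must hold at least one int");
        }
        int[] result = new int[numInts];
        // Multiple of 4 so that every chunk contains only whole ints.
        byte[] chunk = new byte[chunkBytes & ~3];

        InputStream in = new FileInputStream(filename);
        try {
            int done = 0;
            while (done < numInts) {
                int wanted = (int) Math.min((long) chunk.length, (long) (numInts - done) * 4);
                // read() may return fewer bytes than requested, so loop until the chunk is full.
                int filled = 0;
                while (filled < wanted) {
                    int n = in.read(chunk, filled, wanted - filled);
                    if (n == -1) {
                        throw new EOFException("file ended after " + done + " ints");
                    }
                    filled += n;
                }
                // Assemble each int from four bytes, most significant byte first.
                for (int i = 0; i < wanted; i += 4) {
                    result[done++] = ((chunk[i] & 0xFF) << 24)
                            | ((chunk[i + 1] & 0xFF) << 16)
                            | ((chunk[i + 2] & 0xFF) << 8)
                            | (chunk[i + 3] & 0xFF);
                }
            }
        } finally {
            in.close();
        }
        return result;
    }
}
```

With chunkBytes set to 512 * 1024 the temporary buffer matches the memory limit stated in the question.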
