Java: How to get read progress for a large file with XMLStreamReader


java


I am using the code below to read large XML files (gigabytes in size) with XMLStreamReader inside a Hadoop RecordReader.

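import java.io.IOException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Excerpt: the superclass declaration and the remaining RecordReader methods are omitted.
public class RecordReader {
    private XMLStreamReader reader;
    private int progressCount = 0;

    public RecordReader(FileSystem fs, Path file) throws IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        FSDataInputStream fsDataInputStream = fs.open(file); // HDFS file
        try {
            reader = factory.createXMLStreamReader(fsDataInputStream);
        } catch (XMLStreamException exception) {
            throw new RuntimeException("XMLStreamException exception : ", exception);
        }
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return progressCount; // never updated, so no real progress is reported
    }
}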

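My question is how to get the read progress of the file when using XMLStreamReader, since it does not expose any start or end position from which a progress percentage could be calculated. I have looked at a related post, but I could not get FilterReader to work.

Please help me with this.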

You can wrap the InputStream by extending FilterInputStream:

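import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public interface InputStreamListener {
    void onBytesRead(long totalBytes);
}

public class PublishingInputStream extends FilterInputStream {
    private final InputStreamListener listener;
    private long totalBytes = 0;

    public PublishingInputStream(InputStream in, InputStreamListener listener) {
        super(in);
        this.listener = listener;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {          // -1 signals end of stream, not a byte
            publish(1);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates to this method,
    // so overriding it here covers both array variants without double counting.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int count = super.read(b, off, len);
        if (count > 0) {        // skip -1 (end of stream) and 0 (nothing read)
            publish(count);
        }
        return count;
    }

    private void publish(int count) {
        totalBytes += count;
        listener.onBytesRead(totalBytes);
    }
}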

Usage
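A minimal sketch of how such a wrapper might be plugged in between the raw stream and the parser. It is not the answerer's own usage example: the `UsageDemo` class, the in-memory sample document, and the `markSupported()` override are assumptions made for illustration, and compact copies of `InputStreamListener` and `PublishingInputStream` are repeated so the file compiles on its own. In the Hadoop case the `ByteArrayInputStream` would be replaced by the stream returned from `fs.open(file)`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: wires the counting stream into an XMLStreamReader.
class UsageDemo {
    // Parses the XML; returns {start elements seen, last byte total reported}.
    static long[] parse(byte[] xml) {
        AtomicLong lastReported = new AtomicLong();
        InputStream in =
            new PublishingInputStream(new ByteArrayInputStream(xml), lastReported::set);
        long elements = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    elements++;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return new long[] { elements, lastReported.get() };
    }

    public static void main(String[] args) {
        byte[] xml = "<root><a/><b/></root>".getBytes(StandardCharsets.UTF_8);
        long[] result = parse(xml);
        System.out.println("elements=" + result[0] + ", bytesReported=" + result[1]);
    }
}

// Compact copies of the answer's types, repeated so this sketch is self-contained.
interface InputStreamListener {
    void onBytesRead(long totalBytes);
}

class PublishingInputStream extends FilterInputStream {
    private final InputStreamListener listener;
    private long totalBytes = 0;

    PublishingInputStream(InputStream in, InputStreamListener listener) {
        super(in);
        this.listener = listener;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            listener.onBytesRead(++totalBytes);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int count = super.read(buf, off, len);
        if (count > 0) {
            totalBytes += count;
            listener.onBytesRead(totalBytes);
        }
        return count;
    }

    // Assumption: advertise no mark/reset support so bytes are not re-read and counted twice.
    @Override
    public boolean markSupported() {
        return false;
    }
}
```

Because the parser fills an internal buffer, the reported byte count tends to run ahead of the element currently being processed, so any progress figure derived from it is approximate.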


Do you know the full length of the stream?

No. That is not possible with StAX, because it is a pull-based streaming API, so the overall file size cannot be obtained from it.

I meant from somewhere else. If you cannot determine the total length of the data before you start streaming, you cannot track progress.

Actually, I am using org.apache.hadoop.mapreduce.RecordReader and need to report progress from it. Can you help me with that?

Then update the progress in a custom InputStreamListener. To get a percentage, you need to know the total number of bytes. InputStream.available() is not guaranteed to return the total: it returns an estimate of the number of bytes that can be read without blocking. You may still find that it works, depending on the InputStream implementation.

I have tried the .available() method, but here the total bytes read and the available bytes are always the same.
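The thread leaves open where the total comes from. One option, offered here as an assumption rather than something the thread confirms, is to take the length from file metadata instead of `available()`: on Hadoop, for example, `FileSystem.getFileStatus(path).getLen()` or `FileSplit.getLength()`. Given that total, turning the listener's running byte count into the 0.0 to 1.0 fraction that `getProgress()` is expected to return could look like the sketch below. The `ByteProgress` class and its method names are hypothetical.

```java
// Hypothetical helper: converts a running byte count into a 0.0..1.0 fraction.
class ByteProgress {
    private final long totalBytes;   // assumed to be known up front, e.g. from file metadata
    private volatile long bytesRead; // latest total reported by the stream listener

    ByteProgress(long totalBytes) {
        this.totalBytes = totalBytes;
    }

    // Intended to be called from the listener with the cumulative byte count.
    void onBytesRead(long cumulativeBytes) {
        this.bytesRead = cumulativeBytes;
    }

    // Returns 0 when the total is unknown or empty, and never more than 1.
    float getProgress() {
        if (totalBytes <= 0) {
            return 0f;
        }
        return Math.min(1.0f, (float) bytesRead / totalBytes);
    }

    public static void main(String[] args) {
        ByteProgress progress = new ByteProgress(200);
        progress.onBytesRead(50);
        System.out.println(progress.getProgress());
    }
}
```

Clamping to 1.0 guards against a byte count that slightly overshoots the expected total, and returning 0 for an unknown total avoids a division by zero.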