Java - Determining the size of an XML document


java xml


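I have some simple code that fetches an XML file from a given URL:

DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(link);

This code returns an XML document (org.w3c.dom.Document). All I need is the size of the resulting XML document. Is there an elegant way to get it without involving third-party jars?

P.S. I mean the size in KB or MB, not the number of nodes.

Maybe something like this:

document.getTextContent().getBytes().length;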
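
A caveat on the snippet above: the DOM specification defines textContent as null for Document nodes, so chaining getBytes() onto it fails. The following is only a small sketch to show that behaviour. The class name TextContentCheck and the inline XML string are assumptions standing in for the real URL.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class TextContentCheck {

  // Parses an XML string into a DOM Document (stands in for parse(link)).
  public static Document parse(String xml) throws Exception {
    byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(bytes));
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical sample document.
    Document doc = parse("<root><item id=\"1\">abc</item></root>");

    // For a Document node, textContent is defined to be null.
    System.out.println(doc.getTextContent());

    // The root element returns only the concatenated text nodes;
    // tags and attributes are not part of the result.
    System.out.println(doc.getDocumentElement().getTextContent());
  }
}
```

Even on the root element, the text content leaves out all markup, so it would not reflect the size of the document in KB or MB.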

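You can do it like this:

long start = Runtime.getRuntime().freeMemory();

Construct the XML document object, then call the same method again:

Document document = parser.getDocument();

long now = Runtime.getRuntime().freeMemory();

// Free memory shrinks as objects are allocated, so the memory used is start - now.
System.out.println("Size of Document: " + (start - now));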

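Once the XML file has been parsed into a DOM tree, the source document (as a string) no longer exists. All you have is a tree of nodes built from that document, so the size of the source document can no longer be determined exactly from the DOM Document.

You could serialize the DOM back out to text; but that is a very roundabout way to get the size, and the result still won't exactly match the size of the source document.

For what you are trying to do, the best approach is to download the document yourself, note its size, and then pass it to the DocumentBuilder.parse method as an InputStream.
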
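
To illustrate why measuring the DOM after parsing only approximates the source size, here is a rough sketch that writes a Document back out with the JDK's javax.xml.transform.Transformer and counts the bytes. The class name SerializedSize, the method serializedSize and the sample XML are assumptions made for the example, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class SerializedSize {

  // Serializes the DOM to UTF-8 bytes in memory and returns the byte count.
  public static int serializedSize(Document doc) throws Exception {
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    transformer.transform(new DOMSource(doc), new StreamResult(out));
    return out.size();
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical input with extra whitespace inside the tags,
    // which the parser does not preserve in the DOM.
    String xml = "<root   a = '1' ><item>abc</item></root >";
    byte[] source = xml.getBytes(StandardCharsets.UTF_8);
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(source));

    System.out.println("Source bytes:     " + source.length);
    System.out.println("Serialized bytes: " + serializedSize(doc));
  }
}
```

The two numbers generally differ, because the serializer chooses its own XML declaration, quoting and whitespace inside tags. That is the mismatch described above.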
First, a crude version: load the file into a local buffer, which tells you how long the input is, then parse the XML from the buffer:
URL url = new URL("...");
ByteArrayOutputStream buffer1 = new ByteArrayOutputStream();
// Copy the whole response into memory; try-with-resources closes the stream.
try (InputStream in = new BufferedInputStream(url.openStream())) {
  int c;
  while ((c = in.read()) >= 0) {
    buffer1.write(c);
  }
}

// size() reports the byte count without copying the buffer.
System.out.println(String.format("Length in Bytes: %d", buffer1.size()));

ByteArrayInputStream buffer2 = new ByteArrayInputStream(buffer1.toByteArray());

Document doc = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder().parse(buffer2);

The drawback is the extra buffer held in RAM.
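
As a variation on the buffered version, and assuming Java 9 or later, InputStream.readAllBytes() can replace the manual copy loop. The sketch below also converts the byte count to KB, since that is the unit the question asks for. The names BufferedParse, Result and parseAndMeasure are made up for this example, and the in-memory stream stands in for url.openStream().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class BufferedParse {

  // Holds the parsed document together with the number of bytes read.
  public static final class Result {
    public final Document document;
    public final int sizeInBytes;

    Result(Document document, int sizeInBytes) {
      this.document = document;
      this.sizeInBytes = sizeInBytes;
    }
  }

  // Reads the whole stream into memory, records its length, then parses it.
  public static Result parseAndMeasure(InputStream in) throws Exception {
    byte[] data = in.readAllBytes(); // available since Java 9
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(data));
    return new Result(doc, data.length);
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical sample input instead of a real URL.
    byte[] sample = "<root><item>abc</item></root>".getBytes(StandardCharsets.UTF_8);
    Result result = parseAndMeasure(new ByteArrayInputStream(sample));
    System.out.println(String.format("Length in bytes: %d", result.sizeInBytes));
    System.out.println(String.format("Length in KB: %.2f", result.sizeInBytes / 1024.0));
  }
}
```

The whole document is still held in memory twice for a moment (raw bytes plus DOM), so the drawback mentioned above applies here as well.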
Second, a more elegant version: wrap the input stream in a custom java.io.FilterInputStream that counts the bytes passing through it:
URL url = new URL("...");
CountInputStream in = new CountInputStream(url.openStream());
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
System.out.println(String.format("Bytes: %d", in.getCount()));

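Here is CountInputStream. It overrides read() and read(byte[], int, int) to delegate to the superclass and count the bytes returned. read(byte[]) is deliberately not overridden: FilterInputStream implements it by calling read(b, 0, b.length), so counting there as well would count those bytes twice.
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CountInputStream extends FilterInputStream {

  // Total number of bytes read through this stream so far.
  private long count = 0L;

  public CountInputStream(InputStream in) {
    super(in);
  }

  @Override
  public int read() throws IOException {
    final int c = super.read();
    if (c >= 0) { // -1 signals end of stream and must not be counted
      count++;
    }
    return c;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    final int bytesRead = super.read(b, off, len);
    if (bytesRead > 0) {
      count += bytesRead;
    }
    return bytesRead;
  }

  public long getCount() {
    return count;
  }
}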
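
To sanity-check the counting approach without a network connection, here is a small self-contained sketch. It uses an in-memory byte array as a stand-in for url.openStream(), and a nested CountingStream class that is a hypothetical minimal counter written for this demo. The second helper reads only through read(byte[]), to show that FilterInputStream routes that call through read(byte[], int, int), so counting in the three-argument method is enough.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class CountingDemo {

  // Hypothetical minimal counter: counts in read() and read(byte[], int, int) only.
  static class CountingStream extends FilterInputStream {
    private long count;

    CountingStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      int c = super.read();
      if (c >= 0) {
        count++;
      }
      return c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      int n = super.read(b, off, len);
      if (n > 0) {
        count += n;
      }
      return n;
    }

    long getCount() {
      return count;
    }
  }

  // Parses the stream and returns how many bytes the parser pulled through it.
  public static long countWhileParsing(InputStream source) throws Exception {
    CountingStream in = new CountingStream(source);
    DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    return in.getCount();
  }

  // Drains the data using read(byte[]) only and returns the counted bytes.
  public static long countViaArrayRead(byte[] data) throws IOException {
    CountingStream in = new CountingStream(new ByteArrayInputStream(data));
    byte[] buf = new byte[16];
    while (in.read(buf) >= 0) {
      // nothing to do; the stream counts as a side effect
    }
    return in.getCount();
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical sample input.
    byte[] data = "<root><item>abc</item></root>".getBytes(StandardCharsets.UTF_8);
    System.out.println("Input length:         " + data.length);
    System.out.println("Counted via parser:   " + countWhileParsing(new ByteArrayInputStream(data)));
    System.out.println("Counted via read(b):  " + countViaArrayRead(data));
  }
}
```

Note that skip() is not counted in this sketch; if a consumer skips bytes, the total would come out lower than the real stream length.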


Size in KB, or as a number of nodes?

No, getTextContent returns null even though the document is populated.

This won't work: many objects (such as DOM nodes) allocate memory, not just a single string holding the document content.