Java: marshalling XML with large base64 content

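I have an XML schema that contains base64 binary data. Not surprisingly, I get an OutOfMemoryError if the binary file is large enough. I managed to generate the affected Java classes so that they use DataHandler instead of byte[], but JAXB still seems to do the marshalling in RAM. The schema cannot be changed and is quite complex, so building the XML by hand is not an option. My only idea is to insert a placeholder instead of the large binary and replace it afterwards, but I believe there is a better solution.

Thanks for any hints.

Example schema: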

<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/"
    xmlns:xmime="http://www.w3.org/2005/05/xmlmime"
    xmlns:tns="http://example.com/"
    elementFormDefault="qualified">
    <element name="Document">
        <complexType>
            <sequence>
                <element
                    name="text"
                    type="base64Binary"
                    xmime:expectedContentTypes="anything/else" />
            </sequence>
        </complexType>
    </element>
</schema>
Generated class:

package com.example.gen;

import javax.activation.DataHandler;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlMimeType;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
    "text"
})
@XmlRootElement(name = "Document")
public class Document {
    @XmlElement(required = true)
    @XmlMimeType("anything/else")
    protected DataHandler text;
    public DataHandler getText() {
        return text;
    }
    public void setText(DataHandler value) {
        this.text = value;
    }
}
Example code:

File bigFile = new File("./temp/bigFile.bin");
File outFile = new File("./temp/bigXML.xml");

Document document = new Document();
DataHandler bigDocDH = new DataHandler(new FileDataSource(bigFile));
document.setText(bigDocDH);

JAXBContext jaxbContext = JAXBContext.newInstance("com.example.gen");
Marshaller marshaller = jaxbContext.createMarshaller();

// try-with-resources closes the file even if marshalling fails
try (OutputStream outputStream = new FileOutputStream(outFile)) {
    marshaller.marshal(document, outputStream);
}

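OK, I found a solution that works for me. First, I replace the DataHandler that points to the large file with a DataHandler whose content is a small byte array.

After that, I implemented an XMLStreamWriterWrapper that delegates all methods to another XMLStreamWriter. When the content of the DataHandler with the small placeholder content is written to the XMLStreamWriterWrapper, I drop that data and stream the original data to this position instead.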
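The answer only shows parts of the wrapper, so the delegation itself is not visible. As a rough illustration of the "forward everything, intercept one call" idea, here is a minimal sketch that uses a java.lang.reflect.Proxy instead of a hand-written wrapper class. This is a swapped-in technique, not the author's implementation: the class name DelegatingWriterSketch, the method intercepting, and the plain-string replacement are all assumptions made for the example.

```java
import java.io.StringWriter;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class DelegatingWriterSketch {

    /**
     * Returns a writer that forwards every call to {@code delegate}, except
     * writeCharacters(String) calls whose text equals {@code token}; those
     * write {@code replacement} instead. (Hypothetical helper for illustration.)
     */
    public static XMLStreamWriter intercepting(XMLStreamWriter delegate, String token, String replacement) {
        return (XMLStreamWriter) Proxy.newProxyInstance(
                DelegatingWriterSketch.class.getClassLoader(),
                new Class<?>[] {XMLStreamWriter.class},
                (proxy, method, args) -> {
                    boolean isTokenWrite = method.getName().equals("writeCharacters")
                            && args != null && args.length == 1 && token.equals(args[0]);
                    if (isTokenWrite) {
                        // Placeholder found: emit the replacement instead.
                        delegate.writeCharacters(replacement);
                        return null;
                    }
                    try {
                        // Everything else goes straight to the real writer.
                        return method.invoke(delegate, args);
                    } catch (InvocationTargetException e) {
                        // Rethrow the delegate's own exception, e.g. XMLStreamException.
                        throw e.getCause();
                    }
                });
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = intercepting(
                XMLOutputFactory.newFactory().createXMLStreamWriter(out), "TOKEN", "payload");
        writer.writeStartElement("text");
        writer.writeCharacters("TOKEN");
        writer.writeEndElement();
        writer.flush();
        System.out.println(out);
    }
}
```

A hand-written wrapper, as in the answer, avoids reflection and makes it easier to write raw bytes to the underlying OutputStream, which the proxy sketch does not attempt.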

Constructor and factory:

/**
 * Constructor.
 * 
 * @param outputStream
 *            {@link #outputStream}
 * @param binaryData
 *            {@link #binaryData}
 * @param token
 *            the search token.
 * @throws XMLStreamException
 *             In case the XMLStreamWriter cannot be constructed.
 */
private XMLStreamWriterWrapper(OutputStream outputStream, DataHandler binaryData, String token) throws XMLStreamException {
    this.xmlStreamWriter = XMLOutputFactory.newFactory().createXMLStreamWriter(outputStream);

    // ensure the OutputStream is buffered. otherwise encoding of large data
    // takes hours.
    if (outputStream instanceof BufferedOutputStream) {
        this.outputStream = outputStream;
    } else {
        this.outputStream = new BufferedOutputStream(outputStream);
    }
    this.binaryData = binaryData;
    // The marshaller writes the placeholder bytes base64-encoded, so the
    // token to look for is the base64 form of the given string.
    this.tokenAsString = Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    this.token = this.tokenAsString.toCharArray();
}

/**
 * Factory method to create the {@link XMLStreamWriterWrapper}.
 * 
 * @param outputStream
 *            The OutputStream where to marshal the xml to.
 * @param binaryData
 *            The binary data which shall be streamed to the xml.
 * @param token
 *            The token which acts as placeholder for the binary data.
 * @return The {@link XMLStreamWriterWrapper}
 * @throws XMLStreamException
 *             In case the XMLStreamWriter could not be constructed.
 */
public static XMLStreamWriterWrapper newInstance(OutputStream outputStream, DataHandler binaryData, String token) throws XMLStreamException {
    return new XMLStreamWriterWrapper(outputStream, binaryData, token);
}
writeCharacters implementation:

/*
 * (non-Javadoc)
 * 
 * @see javax.xml.stream.XMLStreamWriter#writeCharacters(java.lang.String)
 */
@Override
public void writeCharacters(String text) throws XMLStreamException {
    if (this.tokenAsString.equals(text)) {
        writeCharacters(text.toCharArray(), 0, text.length());
    } else {
        xmlStreamWriter.writeCharacters(text);
    }
}

/*
 * (non-Javadoc)
 * 
 * @see javax.xml.stream.XMLStreamWriter#writeCharacters(char[], int, int)
 */
@Override
public void writeCharacters(char[] text, int start, int len) throws XMLStreamException {
    // Compare exactly the range the caller passed in, honouring the offset.
    char[] range = Arrays.copyOfRange(text, start, start + len);
    if (Arrays.equals(range, token)) {
        LOGGER.debug("Found replace token. Start streaming binary data.");
        // force the XMLStreamWriter to close the start tag.
        xmlStreamWriter.writeCharacters("");
        try {
            // flush the content of the streams.
            xmlStreamWriter.flush();
            outputStream.flush();
            // Closing the encoder writes the final base64 padding, but it would
            // also close the underlying stream, so shield outputStream from close().
            OutputStream shield = new FilterOutputStream(outputStream) {
                @Override
                public void write(byte[] b, int off, int len) throws IOException {
                    out.write(b, off, len);
                }

                @Override
                public void close() throws IOException {
                    flush();
                }
            };
            // do base64 encoding.
            try (OutputStream wrap = Base64.getMimeEncoder().wrap(shield)) {
                this.binaryData.writeTo(wrap);
            }
        } catch (IOException e) {
            throw new XMLStreamException(e);
        }
        LOGGER.debug("Successfully inserted binary data.");
    } else {
        xmlStreamWriter.writeCharacters(text, start, len);
    }
}
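One detail of the streaming step is easy to miss: the stream returned by Base64.Encoder.wrap only writes the last partial group and its padding when it is closed, and closing it also closes the stream underneath. The following minimal sketch shows one way to handle that, assuming a hypothetical helper encodeTo and a small shielding FilterOutputStream; it is a standalone illustration, not part of the wrapper above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64StreamingSketch {

    /** Streams {@code in} to {@code target} as base64 and leaves {@code target} open. */
    public static void encodeTo(InputStream in, OutputStream target) throws IOException {
        // Shield: close() only flushes, so the caller can keep writing afterwards.
        OutputStream shield = new FilterOutputStream(target) {
            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                out.write(b, off, len); // bulk write instead of byte-by-byte
            }

            @Override
            public void close() throws IOException {
                flush();
            }
        };
        // Closing the encoder emits the final group including '=' padding.
        try (OutputStream encoder = Base64.getEncoder().wrap(shield)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                encoder.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write("<text>".getBytes(StandardCharsets.UTF_8));
        // Two input bytes do not fill a 3-byte group, so padding is required.
        encodeTo(new ByteArrayInputStream("ab".getBytes(StandardCharsets.UTF_8)), sink);
        // The sink is still usable after encoding.
        sink.write("</text>".getBytes(StandardCharsets.UTF_8));
        System.out.println(sink.toString("UTF-8"));
    }
}
```

Only one buffer of the input is held in memory at a time, which is the property the placeholder approach relies on.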
Usage example:

//Original file DataHandler
DataHandler bigDocDH = new DataHandler(new FileDataSource(bigFile));

Document document = new Document();
String replaceToken = UUID.randomUUID().toString();
//DataHandler with content replaced by the XMLStreamWriterWrapper
DataHandler tokenDH = new DataHandler(new ByteArrayDataSource(replaceToken.getBytes(StandardCharsets.UTF_8), bigDocDH.getContentType()));
document.setText(tokenDH);

try (OutputStream outStream = new FileOutputStream(outFile)) {
    XMLStreamWriter streamWriter = XMLStreamWriterWrapper.newInstance(outStream, bigDocDH, replaceToken);
    marshaller.marshal(document, streamWriter);
}

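Could you add some classes or an example?

I have added an example.

Have you already tried increasing -Xms1024m and -Xmx1024m?

Increasing the heap is only a workaround and does not solve my problem. I have no influence on the size of the binary data, and several serializations may run concurrently. For example, for 10 parallel threads each with 100 MB of binary data (about 130 MB base64-encoded), I would need 1300 MB of heap.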
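The estimate in the last comment can be checked with a few lines of arithmetic. This is a minimal sketch, assuming unwrapped base64 (4 output characters per 3 input bytes, rounded up) and counting only one byte per encoded character; the class name Base64HeapEstimate is made up for the example, and the real heap use of a marshaller may well be higher.

```java
public class Base64HeapEstimate {

    /** Length of the unwrapped base64 text for {@code rawBytes} bytes of input: 4 * ceil(n / 3). */
    public static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        long raw = 100L * 1024 * 1024; // 100 MB payload, taken as binary megabytes (assumption)
        int threads = 10;              // concurrent serializations, as in the comment above
        long perThread = encodedLength(raw);
        double mib = 1024.0 * 1024.0;
        System.out.printf("encoded size per thread: %.1f MiB%n", perThread / mib);
        System.out.printf("encoded size for %d threads: %.1f MiB%n", threads, threads * perThread / mib);
    }
}
```

The result lands slightly above the rounded 130 MB and 1300 MB figures quoted in the comment, which supports the point that a fixed heap size does not scale with concurrent requests.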