Java Docx4j: differences between two Word documents

java ms-word

I need to check the differences between two Word docx files. I am using docx4j. First, I had to change the SmartXMLFormatter constructor:

    public SmartXMLFormatter(Writer w) throws IOException {
        this.xml = new XMLWriterNSImpl(w, false);
        if (this.writeXMLDeclaration) {
            this.xml.xmlDecl();
            this.writeXMLDeclaration = false;
        }

        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "w");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2010/wordml", "w14");
        this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2012/wordml", "w15");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "r");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", "wp");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/main", "a");
        this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/picture", "pic");

        this.xml.setPrefixMapping(Constants.BASE_NS_URI, "dfx");
        this.xml.setPrefixMapping(Constants.DELETE_NS_URI, "del");
        this.xml.setPrefixMapping(Constants.INSERT_NS_URI, "ins");
    }

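After this change, everything works fine as long as the documents contain no Russian letters. But when I diff two docx documents that contain Russian characters, I get the following exception (the German message says that the prefix "w14" for attribute "w14:paraId", associated with element type "w:p", is not bound):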

    org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
    at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Exception in thread "main" javax.xml.bind.UnmarshalException
     - with linked exception:
    [org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
    at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    ... 7 more

Can anyone help me?

Here is the main code:

    public class CompareDocumentsUsingDriver {

    public static JAXBContext context = org.docx4j.jaxb.Context.jc;

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        System.setProperty("file.encoding", "UTF-8");

        String newerfilepath = "B.docx";
        String olderfilepath = "A.docx";

        // 1. Load the Packages
        WordprocessingMLPackage newerPackage = WordprocessingMLPackage
                .load(new java.io.File(newerfilepath));
        WordprocessingMLPackage olderPackage = WordprocessingMLPackage
                .load(new java.io.File(olderfilepath));

        Body newerBody = ((Document) newerPackage.getMainDocumentPart()
                .getJaxbElement()).getBody();
        Body olderBody = ((Document) olderPackage.getMainDocumentPart()
                .getJaxbElement()).getBody();

        System.out.println("Differencing..");

        // 2. Do the differencing
        StringWriter sw = new StringWriter();

        Docx4jDriver.diff(XmlUtils.marshaltoW3CDomDocument(newerBody)
                .getDocumentElement(),
                XmlUtils.marshaltoW3CDomDocument(olderBody)
                        .getDocumentElement(), sw);
        // The signature which takes Reader objects appears to be broken

        // 3. Get the result

        String contentStr = sw.toString();
        System.out.println("Result: \n\n " + contentStr);

        Body newBody = (Body) XmlUtils.unwrap(XmlUtils.unmarshalString(contentStr));


        // In the general case, you need to handle relationships. Not done here!

        // RelationshipsPart rp =
        // newerPackage.getMainDocumentPart().getRelationshipsPart();
        // handleRels(pd, rp);
        newerPackage.setFontMapper(new IdentityPlusMapper());
        newerPackage.save(new java.io.File("COMPARED.docx"));

    }

    /**
     * In the general case, you need to handle relationships. Although not
     * necessary in this simple example, we do it anyway for the purposes of
     * illustration.
     */
    private static void handleRels(Differencer pd, RelationshipsPart rp) {
        // Since we are going to add rels appropriate to the docs being
        // compared, for neatness and to avoid duplication
        // (duplication of internal part names is fatal in Word,
        // and export xslt makes images internal, though it does avoid
        // duplicating
        // a part ),
        // remove any existing rels which point to images
        List<Relationship> relsToRemove = new ArrayList<Relationship>();
        for (Relationship r : rp.getRelationships().getRelationship()) {
            // Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
            if (r.getType().equals(Namespaces.IMAGE)) {
                relsToRemove.add(r);
            }
        }
        for (Relationship r : relsToRemove) {
            rp.removeRelationship(r);
        }

        // Now add the rels we composed
        List<Relationship> newRels = pd.getComposedRels();
        for (Relationship nr : newRels) {
            rp.addRelationship(nr);
        }
    }

}

Best regards,

Tim

Edit:

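OK, I solved it myself by adding the code below to Docx4jDriver (sorry, I'm new to Stack Overflow). Now, when I run it on the documents, everything runs perfectly, but when the files contain differences, the entire content is generated, and I am told that there is a failure. I guess nobody can help me...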

    public static void openResult(String nodename,  Writer out) throws IOException {
        // In general, we need to avoid writing directly to Writer out...
        // since it can happen before formatter output gets there

        // namespaces not properly declared:
        // 4 options:
        // 1:
        // OpenElementEvent containerOpen = new OpenElementEventNSImpl(xml1.getNamespaceURI(), rootNodeName);
        // formatter.format(containerOpen);
        // // AttributeEvent wNS = new AttributeEventNSImpl("http://www.w3.org/2000/xmlns/" , "w",
        // //       "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        // // formatter.format(wNS);
        // but AttributeEvent is too late in the process to set the mapping.
        // so you can comment that out.
        // But you still have to add w: and the other namespaces in
        // SmartXMLFormatter constructor. So may as well do 2.:
        // 2: stick all known namespaces on our root element above
        // 3: fix SmartXMLFormatter
        // Go with option 2 .. since this is clear
        out.append("<" + nodename
                + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""  // w: namespace
                + " xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\""
                + " xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\""
                + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
                + " xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\""
                + " xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\""
                + " xmlns:w10=\"urn:schemas-microsoft-com:office:word\""
                + " xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\""
                + " xmlns:dfx=\"" + Constants.BASE_NS_URI + "\""  // Add these, since SmartXMLFormatter only writes them on the first fragment
                + " xmlns:del=\"" + Constants.DELETE_NS_URI + "\""
                + " xmlns:ins=\"" + Constants.INSERT_NS_URI + "\""
                        + " >" );
    }
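
As a minimal sketch of why option 2 helps, the "prefix not bound" error can be reproduced with the JDK's own parser, without docx4j. The class name `NamespaceBindingDemo`, the helpers `wrap` and `parses`, and the sample `w14:paraId` value are hypothetical and only for illustration; the two namespace URIs are the ones used in the code above. The sketch wraps the same `w:p` fragment in a root element once without and once with the `xmlns:w14` declaration, and reports whether a namespace-aware parse succeeds.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceBindingDemo {

    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14_NS =
            "http://schemas.microsoft.com/office/word/2010/wordml";

    /** Wraps a fragment in a w:body root, optionally declaring the w14 prefix. */
    public static String wrap(String fragment, boolean declareW14) {
        StringBuilder sb = new StringBuilder("<w:body xmlns:w=\"" + W_NS + "\"");
        if (declareW14) {
            sb.append(" xmlns:w14=\"").append(W14_NS).append("\"");
        }
        return sb.append(">").append(fragment).append("</w:body>").toString();
    }

    /** Returns true if the string is well-formed, namespace-aware XML. */
    public static boolean parses(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and keeps them off stderr.
        db.setErrorHandler(new DefaultHandler());
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // An unbound prefix such as w14 is reported as a fatal parse error.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paragraph carrying a w14 attribute, as Word 2010+ writes it.
        String p = "<w:p w14:paraId=\"0A1B2C3D\"><w:r><w:t>text</w:t></w:r></w:p>";
        System.out.println("w14 undeclared: " + parses(wrap(p, false)));
        System.out.println("w14 declared:   " + parses(wrap(p, true)));
    }
}
```

The first parse should fail and the second succeed, which is the same situation as in the stack trace: the diff output contains a `w14:paraId` attribute, so the root element written by `openResult` has to declare `w14` before `XmlUtils.unmarshalString` can read the result.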