Java Docx4j: differences between two Word documents

I need to check the differences between two Word docx files. I am using docx4j. First, I had to change the SmartXMLFormatter constructor:

    public SmartXMLFormatter(Writer w) throws IOException {
    this.xml = new XMLWriterNSImpl(w, false);
    if (this.writeXMLDeclaration) {
      this.xml.xmlDecl();
      this.writeXMLDeclaration = false;
    }

    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "w");
    this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2010/wordml", "w14");
    this.xml.setPrefixMapping("http://schemas.microsoft.com/office/word/2012/wordml", "w15");
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/officeDocument/2006/relationships", "r");
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing", "wp");
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/main", "a");
    this.xml.setPrefixMapping("http://schemas.openxmlformats.org/drawingml/2006/picture", "pic");

    this.xml.setPrefixMapping(Constants.BASE_NS_URI, "dfx");
    this.xml.setPrefixMapping(Constants.DELETE_NS_URI, "del");
    this.xml.setPrefixMapping(Constants.INSERT_NS_URI, "ins");
  }
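
After changing the code, everything worked fine as long as the documents contained no Russian letters. But when I diff two docx documents that contain Russian characters, I get the following exception: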

    org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
    at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Exception in thread "main" javax.xml.bind.UnmarshalException
     - with linked exception:
    [org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.]
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(Unknown Source)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(Unknown Source)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:381)
    at org.docx4j.XmlUtils.unmarshalString(XmlUtils.java:361)
    at docx4jDiff.CompareDocumentsUsingDriver.main(CompareDocumentsUsingDriver.java:88)
    Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10510; Präfix "w14" für Attribut "w14:paraId", das mit Elementtyp "w:p" verknüpft ist, ist nicht gebunden.
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    ... 7 more
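
The German message says that the prefix "w14" of attribute "w14:paraId" on element "w:p" is not bound, i.e. the diff output uses `w14:` without an `xmlns:w14` declaration in scope. The following is a minimal sketch that reproduces this class of error with the JDK's own parser only; the class name `UnboundPrefixDemo` and the two sample strings are made up for illustration and are not part of docx4j.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class UnboundPrefixDemo {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String W14 = "http://schemas.microsoft.com/office/word/2010/wordml";

    // Hypothetical sample: w14: is used on w:p but never declared.
    static final String MISSING_W14 =
            "<w:body xmlns:w=\"" + W + "\"><w:p w14:paraId=\"1A2B3C4D\"/></w:body>";

    // Same content, with xmlns:w14 declared on the root element.
    static final String DECLARED_W14 =
            "<w:body xmlns:w=\"" + W + "\" xmlns:w14=\"" + W14 + "\">"
            + "<w:p w14:paraId=\"1A2B3C4D\"/></w:body>";

    /** Returns true if the string parses with a namespace-aware parser. */
    static boolean parses(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // prefixes are only checked when this is on
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Rethrow instead of letting the default handler print to stderr.
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // e.g. an unbound prefix
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("undeclared w14 parses: " + parses(MISSING_W14));
        System.out.println("declared w14 parses:   " + parses(DECLARED_W14));
    }
}
```

If this reading is right, the fix has to happen where the diff result's root element is written, not in the code that unmarshals it.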
Can anyone help me?

Here is the main code:

    public class CompareDocumentsUsingDriver {

    public static JAXBContext context = org.docx4j.jaxb.Context.jc;

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        System.setProperty("file.encoding", "UTF-8");

        String newerfilepath = "B.docx";
        String olderfilepath = "A.docx";

        // 1. Load the Packages
        WordprocessingMLPackage newerPackage = WordprocessingMLPackage
                .load(new java.io.File(newerfilepath));
        WordprocessingMLPackage olderPackage = WordprocessingMLPackage
                .load(new java.io.File(olderfilepath));

        Body newerBody = ((Document) newerPackage.getMainDocumentPart()
                .getJaxbElement()).getBody();
        Body olderBody = ((Document) olderPackage.getMainDocumentPart()
                .getJaxbElement()).getBody();

        System.out.println("Differencing..");

        // 2. Do the differencing
        StringWriter sw = new StringWriter();

        Docx4jDriver.diff(XmlUtils.marshaltoW3CDomDocument(newerBody)
                .getDocumentElement(),
                XmlUtils.marshaltoW3CDomDocument(olderBody)
                        .getDocumentElement(), sw);
        // The signature which takes Reader objects appears to be broken

        // 3. Get the result

        String contentStr = sw.toString();
        System.out.println("Result: \n\n " + contentStr);

        Body newBody = (Body) XmlUtils.unwrap(XmlUtils.unmarshalString(contentStr));


        // In the general case, you need to handle relationships. Not done here!

        // RelationshipsPart rp =
        // newerPackage.getMainDocumentPart().getRelationshipsPart();
        // handleRels(pd, rp);
        newerPackage.setFontMapper(new IdentityPlusMapper());
        newerPackage.save(new java.io.File("COMPARED.docx"));

    }

    /**
     * In the general case, you need to handle relationships. Although not
     * necessary in this simple example, we do it anyway for the purposes of
     * illustration.
     */
    private static void handleRels(Differencer pd, RelationshipsPart rp) {
        // Since we are going to add rels appropriate to the docs being
        // compared, for neatness and to avoid duplication
        // (duplication of internal part names is fatal in Word,
        // and export xslt makes images internal, though it does avoid
        // duplicating
        // a part ),
        // remove any existing rels which point to images
        List<Relationship> relsToRemove = new ArrayList<Relationship>();
        for (Relationship r : rp.getRelationships().getRelationship()) {
            // Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
            if (r.getType().equals(Namespaces.IMAGE)) {
                relsToRemove.add(r);
            }
        }
        for (Relationship r : relsToRemove) {
            rp.removeRelationship(r);
        }

        // Now add the rels we composed
        List<Relationship> newRels = pd.getComposedRels();
        for (Relationship nr : newRels) {
            rp.addRelationship(nr);
        }
    }

}
Best regards,

Tim

Edit:

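OK, I solved it myself by adding the following code to Docx4jDriver (sorry, I am new to Stack Overflow ;)). Now, when I run it on two documents, everything runs perfectly. But when the files contain some differences, the whole output is still generated, yet I am told that it failed... I guess no one can help me...

    public static void openResult(String nodename,  Writer out) throws IOException {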
        // In general, we need to avoid writing directly to Writer out...
        // since it can happen before formatter output gets there

        // namespaces not properly declared:
        // 4 options:
        // 1:
        // OpenElementEvent containerOpen = new OpenElementEventNSImpl(xml1.getNamespaceURI(), rootNodeName);
        // formatter.format(containerOpen);
        // // AttributeEvent wNS = new AttributeEventNSImpl("http://www.w3.org/2000/xmlns/" , "w",
        // //       "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        // // formatter.format(wNS);
        // but AttributeEvent is too late in the process to set the mapping.
        // so you can comment that out.
        // But you still have to add w: and the other namespaces in
        // SmartXMLFormatter constructor. So may as well do 2.:
        // 2: stick all known namespaces on our root element above
        // 3: fix SmartXMLFormatter
        // Go with option 2 .. since this is clear
        out.append("<" + nodename
                + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""  // w: namespace
                + " xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\""
                + " xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\""
                + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
                + " xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\""
                + " xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\""
                + " xmlns:w10=\"urn:schemas-microsoft-com:office:word\""
                + " xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\""
                + " xmlns:dfx=\"" + Constants.BASE_NS_URI + "\""  // Add these, since SmartXMLFormatter only writes them on the first fragment
                + " xmlns:del=\"" + Constants.DELETE_NS_URI + "\""
                + " xmlns:ins=\"" + Constants.INSERT_NS_URI + "\""  // insert namespace, matching the "ins" mapping in the SmartXMLFormatter constructor
                        + " >" );
    }
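
Option 2 above (putting every known namespace on the root element) can also be sketched without the long string concatenation, by building the open tag from one prefix-to-URI map. This is only an illustration under assumptions: `RootNamespaceDemo`, `prefixes()` and `openTag()` are hypothetical names, and the three `urn:example:` URIs are placeholders standing in for `Constants.BASE_NS_URI`, `Constants.DELETE_NS_URI` and `Constants.INSERT_NS_URI`.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class RootNamespaceDemo {

    /** One place that lists every prefix the diff output may use. */
    static Map<String, String> prefixes() {
        Map<String, String> ns = new LinkedHashMap<>();
        ns.put("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        ns.put("w14", "http://schemas.microsoft.com/office/word/2010/wordml");
        ns.put("w15", "http://schemas.microsoft.com/office/word/2012/wordml");
        // Placeholders (assumption): stand-ins for the diff library's Constants.
        ns.put("dfx", "urn:example:diff");
        ns.put("del", "urn:example:diff:delete");
        ns.put("ins", "urn:example:diff:insert"); // distinct from dfx on purpose
        return ns;
    }

    /** Builds "<nodename xmlns:p1="..." xmlns:p2="..." ...>". */
    static String openTag(String nodename, Map<String, String> ns) {
        StringBuilder sb = new StringBuilder("<").append(nodename);
        for (Map.Entry<String, String> e : ns.entrySet()) {
            sb.append(" xmlns:").append(e.getKey())
              .append("=\"").append(e.getValue()).append('"');
        }
        return sb.append('>').toString();
    }

    /** Namespace-aware parse, so unbound prefixes are rejected. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> ns = prefixes();
        String xml = openTag("w:body", ns)
                + "<w:p w14:paraId=\"1A2B3C4D\"><ins:r/></w:p></w:body>";
        Document doc = parse(xml);
        Element p = (Element) doc.getDocumentElement().getFirstChild();
        // Reads the attribute back through its namespace URI, not its prefix.
        System.out.println(p.getAttributeNS(ns.get("w14"), "paraId"));
    }
}
```

Keeping the map as the single source of prefixes makes it easier to see when two prefixes accidentally share one URI, or when the root element and the formatter disagree about a mapping.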