Java Apache POI: how to retrieve the "rect" object inside a "pict" object


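I am trying to convert a Word document to HTML using Apache POI. The document has a horizontal line after a paragraph. The OOXML for the horizontal line looks like this: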

          <w:p w14:paraId="721E1052" w14:textId="05637367" w:rsidR="002D1248" w:rsidRPr="00BB3E82" w:rsidRDefault="00B3113F" w:rsidP="00797596">
            <w:pPr>
              <w:rPr>
                <w:rFonts w:eastAsia="Times New Roman" w:cs="Courier New"/>
                <w:snapToGrid w:val="0"/>
                <w:color w:val="000000"/>
                <w:lang w:eastAsia="fi-FI"/>
              </w:rPr>
            </w:pPr>
            <w:r>
              <w:rPr>
                <w:rFonts w:eastAsia="Times New Roman" w:cs="Courier New"/>
                <w:snapToGrid w:val="0"/>
                <w:color w:val="000000"/>
                <w:lang w:eastAsia="fi-FI"/>
              </w:rPr>
              <w:pict w14:anchorId="534EEFD0">
                <v:rect id="_x0000_i1025" style="width:0;height:1.5pt" o:hralign="center" o:hrstd="t" o:hr="t" fillcolor="#a0a0a0" stroked="f"/>
              </w:pict>
            </w:r>
          </w:p>

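I want to emit an HR tag in the HTML for this horizontal line. However, I cannot retrieve the "rect" element inside the "pict" element. This is what I have tried so far: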

List<org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPicture> pics = run.getCTR().getPictList();
        if(pics!=null) {
            log.debug("Size of pics = "+pics.size());
            for (org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPicture pic : pics) {
                Node picNode = pic.getDomNode();
                // Attempt to parse the w:pict DOM node as a VML group
                CTGroup ctGroup = CTGroup.Factory.parse(picNode);
                if(ctGroup!=null) {
                    log.debug("Size of rects= "+ctGroup.getRectList().size());
                }
            }
        }

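The code above prints "Size of pics = 1" and "Size of rects= 0".
I don't know why this happens. I would appreciate any help in understanding how to retrieve the "rect" object. Thanks.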

You cannot parse a com.microsoft.schemas.vml.CTGroup element from the DOM node of an org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPicture.

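But all ooxml-schemas objects inherit from org.apache.xmlbeans.XmlObject, so they can select child elements by element URI and element local name. What we need to know is that the namespace URI for com.microsoft.schemas.vml is "urn:schemas-microsoft-com:vml".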

For example:

import java.io.FileInputStream;
import java.util.List;

import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.XmlObject;

public class WordReadCTPictureContent {

 public static void main(String[] args) throws Exception {

  String inFilePath = "./HRBetweenParagraphs.docx";

  XWPFDocument document = new XWPFDocument(new FileInputStream(inFilePath));

  for (XWPFParagraph paragraph : document.getParagraphs()) {
   for (XWPFRun run : paragraph.getRuns()) {

    List<org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPicture> pics = run.getCTR().getPictList();
    System.out.println("Size of pics = " + pics.size());
    for (org.openxmlformats.schemas.wordprocessingml.x2006.main.CTPicture pic : pics) {
     //select com.microsoft.schemas.vml.CTRect children by elementUri and elementLocalName
     XmlObject[] rects = pic.selectChildren("urn:schemas-microsoft-com:vml", "rect");
     System.out.println("Count of rects = " + rects.length);
     for (XmlObject obj : rects) {
      com.microsoft.schemas.vml.CTRect rect = (com.microsoft.schemas.vml.CTRect)obj;
      //now we can work with found com.microsoft.schemas.vml.CTRect
      System.out.println("Id of found rect = " + rect.getId());
     }

    }

   }
  }

  document.close();
 }

}
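
To get from the found rect to the HR tag the question asks for, the same namespace-based lookup can also be done on a plain DOM node. The following is a minimal sketch that uses only JDK classes, not a definitive implementation. The class name VmlRectToHr and the helpers toHtml and parsePict are hypothetical names for illustration. It assumes that a rect with o:hr="t" marks a horizontal rule, as in the XML from the question, and that in a real converter the node returned by pic.getDomNode() could be passed to toHtml as an org.w3c.dom.Element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of Apache POI.
class VmlRectToHr {

    static final String VML_NS = "urn:schemas-microsoft-com:vml";
    static final String OFFICE_NS = "urn:schemas-microsoft-com:office:office";

    // Appends "<hr/>" for every v:rect below the given element that carries o:hr="t".
    static String toHtml(Element pict) {
        StringBuilder html = new StringBuilder();
        NodeList rects = pict.getElementsByTagNameNS(VML_NS, "rect");
        for (int i = 0; i < rects.getLength(); i++) {
            Element rect = (Element) rects.item(i);
            // getAttributeNS returns an empty string when the attribute is absent
            if ("t".equals(rect.getAttributeNS(OFFICE_NS, "hr"))) {
                html.append("<hr/>");
            }
        }
        return html.toString();
    }

    // Parses an XML string with namespace awareness and returns its root element.
    static Element parsePict(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups above
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // The w:pict fragment from the question, with its namespace declarations added
        String xml = "<w:pict"
                + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
                + " xmlns:o=\"urn:schemas-microsoft-com:office:office\">"
                + "<v:rect id=\"_x0000_i1025\" style=\"width:0;height:1.5pt\""
                + " o:hralign=\"center\" o:hrstd=\"t\" o:hr=\"t\""
                + " fillcolor=\"#a0a0a0\" stroked=\"f\"/>"
                + "</w:pict>";
        System.out.println(toHtml(parsePict(xml)));
    }
}
```

Checking the o:hr attribute rather than only the element name keeps ordinary VML rectangles from being turned into horizontal rules.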

Got it. Thank you very much, Axel. As always, your answer is very helpful and spot on. Thanks again!