Docx4J: converting Chinese text to PDF


I have the following code to convert a docx file to a PDF file. My docx contains text boxes and Chinese characters.

String myFilePath = "testing.docx";

File docxFile = new File("testing.docx");
WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage.load(docxFile);

Mapper identifierFontMapper = new IdentityPlusMapper();
wordprocessingMLPackage.setFontMapper(identifierFontMapper);

Mapper bestMatchingMapper = new BestMatchingMapper();
wordprocessingMLPackage.setFontMapper(bestMatchingMapper);

Docx4J.toPDF(wordprocessingMLPackage, new FileOutputStream(myFilePath + ".pdf"));
With this code I can produce a PDF file, but the Chinese characters come out as ####.

Is there any way to fix this?


Here is my document.xml

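Assuming you have docx4j-export-FO on your classpath, so that you are using the XSL FO export, you should be able to see which characters are missing glyphs (turn on debug logging for org.docx4j.fonts) and map a suitable font.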

For example, see

Edit, 29 September

I see:

WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font Calibri is not mapped to a physical font!
WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font SimHei is not mapped to a physical font!
WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font Arial is not mapped to a physical font!
WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font Wingdings is not mapped to a physical font!
WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - Document font 華康中黑體 is not mapped to a physical font!

WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "ZapfDingbats,normal,700" not found. Substituting with "ZapfDingbats,normal,400".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Font "Calibri,normal,700" not found. Substituting with "any,normal,700".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "这" (0x8fd9) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "些" (0x4e9b) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "都" (0x90fd) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "只" (0x53ea) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "是" (0x662f) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "测" (0x6d4b) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "试" (0x8bd5) not available in font "Times-Bold".
WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - Glyph "而" (0x800c) not available in font "Times-Bold".
Note the 'Glyph X not available in font Y' messages. So I need something like this:

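    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);

    // Placeholder: substitute the name of a Chinese font installed on your OS.
    // PhysicalFonts.get returns null if no font is known under that name.
    fontMapper.put("Times-Bold", PhysicalFonts.get("<some Chinese font installed on my OS>"));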
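
The warnings quoted above already name the fonts that need attention: the 'Document font ... is not mapped' lines list fonts docx4j could not resolve, and the 'Glyph ... not available in font ...' lines name the font FOP actually fell back to. As a rough sketch, and assuming the log has been captured as plain text lines in the format shown above, those names could be collected with standard Java alone. The class and method names here are hypothetical and are not part of docx4j or FOP.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of docx4j): extracts font names from captured warnings. */
final class FontWarningScanner {

    // Matches FOP warnings such as: Glyph "X" (0x8fd9) not available in font "Times-Bold".
    private static final Pattern MISSING_GLYPH = Pattern.compile(
            "Glyph \".+?\" \\(0x[0-9a-fA-F]+\\) not available in font \"(.+?)\"");

    // Matches docx4j warnings such as: Document font SimHei is not mapped to a physical font!
    private static final Pattern UNMAPPED_FONT = Pattern.compile(
            "Document font (.+?) is not mapped to a physical font!");

    /** Fonts that were used for rendering but lack at least one glyph. */
    static Set<String> fontsMissingGlyphs(List<String> logLines) {
        return collect(MISSING_GLYPH, logLines);
    }

    /** Document fonts for which no physical font was found. */
    static Set<String> unmappedDocumentFonts(List<String> logLines) {
        return collect(UNMAPPED_FONT, logLines);
    }

    // Collects capture group 1 of every matching line, keeping first-seen order.
    private static Set<String> collect(Pattern pattern, List<String> logLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "WARN org.docx4j.fonts.fop.util.FopConfigUtil .declareFonts line 123 - "
                        + "Document font SimHei is not mapped to a physical font!",
                "WARN org.apache.fop.apps.FOUserAgent .processEvent line 94 - "
                        + "Glyph \"\u8fd9\" (0x8fd9) not available in font \"Times-Bold\".");
        System.out.println("Unmapped document fonts: " + unmappedDocumentFonts(log));
        System.out.println("Fonts missing glyphs:    " + fontsMissingGlyphs(log));
    }
}
```

Each name returned is a candidate key for fontMapper.put(...); which physical font to map it to still has to be chosen by hand.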

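As you said, is docx4j-export-FO a dependency, for example docx4j-export-fo-3.3.6.jar? An internal dependency rather than one that is not on the classpath? For the Chinese characters, what I get from the log is: [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-75) Document font ????? is not mapped to a physical font! I believe this ???? is the font.

For the text boxes, do you mean that the text boxes cannot be displayed in the PDF file?

Please add the XML to your question (unzip the docx, then look in word/document.xml) so that we can see which fonts are specified. Also, please ask a second question for your text box problem (and add the XML there). See also

I have attached a link to my document.xml, please take a look.

The best solution is to make sure the required fonts are installed on the computer. If you can't do that, then you must provide a mapping to a font which is installed (and which contains the glyphs).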
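
To pick a mapping target that really contains the glyphs, one option is to ask the JVM which installed fonts can display a sample of the document's Chinese text. The sketch below uses only java.awt and is an assumption-laden illustration: the class and method names are hypothetical, the family names AWT reports are not guaranteed to match the names under which docx4j's PhysicalFonts registers the same fonts, and logical fonts such as Dialog may report coverage through fallback. Treat the output as a list of candidates to try.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of docx4j): lists fonts able to display a sample string. */
final class ChineseFontCandidates {

    /** Returns the distinct family names of the given fonts that cover every character in sample. */
    static List<String> familiesCovering(String sample, Font[] fonts) {
        List<String> families = new ArrayList<>();
        for (Font font : fonts) {
            // canDisplayUpTo returns -1 when the font can display the whole string.
            boolean coversAll = font.canDisplayUpTo(sample) == -1;
            if (coversAll && !families.contains(font.getFamily())) {
                families.add(font.getFamily());
            }
        }
        return families;
    }

    public static void main(String[] args) {
        // The characters reported as missing in the log above.
        String sample = "\u8fd9\u4e9b\u90fd\u53ea\u662f\u6d4b\u8bd5\u800c";
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (String family : familiesCovering(sample, installed)) {
            System.out.println(family);
        }
    }
}
```

If the list comes back empty, no font visible to the JVM covers those characters, which points back to the first recommendation: install a suitable font on the machine.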