How do I convert Word documents to PDF in Java?

java pdf ms-word

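How can I convert a Word document to PDF when the document contains various kinds of content, such as tables? When I tried iText, the converted PDF looked different from the original document. Is there an open-source API/library I can use, rather than calling out to an executable?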

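This is quite a hard task, and harder still if you want perfect results (impossible without using Word). As such, I believe the number of open-source APIs that do it all for you in pure Java is zero (update: I was wrong, see below).

Your basic options are as follows:

Script MS Office using JNI, a C# web service, etc. (the only option for 100% perfect results)

Script Open Office using the available APIs (90+% perfect)

Use Apache POI and iText (a very large job, and it will never be perfect)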

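Update 2016-02-11: here is a cut-down copy of my blog post on this subject, which outlines existing products that support Word-to-PDF in Java.

As far as I know, three products can render Office documents:

Irregularly maintained, pure Java, open source. Ties together a number of libraries to perform the conversion.

Actively developed, pure Java, open source. A Java API that merges XML documents created with MS Office (docx), OpenOffice (odt), or LibreOffice (odt) with a Java model to generate reports and, if needed, convert them to another format (PDF, XHTML, ...).

Closed source, pure Java. Snowbound appears to be a 100% Java solution and costs over $2,500. Its evaluation download contains samples describing how to convert documents.

OpenOffice API: open source, not pure Java; requires OpenOffice to be installed. OpenOffice is a native office suite that supports a Java API. It can read Office documents and write PDF documents. The SDK contains a document conversion example (examples/java/DocumentHandling/DocumentConverter.java). To write PDFs, you need to pass the "writer_pdf_Export" filter rather than the "MS Word 97" one. Or you can use a wrapper API.

Dead as of 2016-02-11.

Uses Apache POI to read the Word document and iText to write the PDF. Completely free, 100% Java, but it has some limitations.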

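I agree with the posters who list OpenOffice as a high-fidelity import/export facility for Word/PDF documents with a Java API; it also works across platforms. The OpenOffice import/export filters are quite powerful and preserve most formatting when converting to various formats, including PDF. Libraries layered on top of it add value by making life easier than learning the OpenOffice API directly, which can be challenging because of the style of the UNO API and its crash-related bugs.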

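You can use JODConverter for this. It converts documents between different office formats, for example:

Microsoft Office to OpenDocument, and vice versa

Any format to PDF

and many more conversions are supported

It can also convert MS Office 2007 documents to PDF, along with almost all other formats.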

unoconv is a Python tool that works on UNIX. I invoke it from Java through the shell on UNIX, and it works perfectly for me. Both JODConverter and unoconv are said to use OpenOffice/LibreOffice.

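docx4j/docxreport, POI, and PDFBox are all good, but they lose some formatting during conversion.

There is also a lightweight solution designed specifically for converting documents to PDF.

Why?

I wanted a simple program that can convert Microsoft Office documents to PDF without depending on LibreOffice or proprietary solutions. Seeing how the code and libraries for converting each individual format are scattered around the web, I decided to combine all those solutions into a single program. Along the way I came across code for ODT as well, so I decided to add ODT support.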

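You can use the Cloudmersive native Java library. It is free for up to 50,000 conversions per month and, in my experience, has much higher fidelity than other approaches such as those based on iText or Apache POI. The documents actually look the same as they do in Microsoft Word, which for me is the key. Incidentally, it can also convert XLSX and PPTX, as well as the legacy DOC, XLS, and PPT formats, to PDF.

The code looks like this. First add the imports:

import java.io.File;

import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.ConvertDocumentApi;

Then convert the file: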

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/input.docx"); // File to perform the operation on.
try {
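  byte[] result = apiInstance.convertDocumentDocxToPdf(inputFile);
  // result holds the PDF bytes; printing the array itself would only show its identity hash
  System.out.println("Received " + result.length + " bytes of PDF");
} catch (ApiException e) {
  System.err.println("Exception when calling ConvertDocumentApi#convertDocumentDocxToPdf");
  e.printStackTrace();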
}

You can get an API key for free from the portal.
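The call above returns the PDF as a byte array, so the caller still has to persist it. A minimal sketch of that step follows, using only the JDK; the class name PdfBytes and its methods are hypothetical helpers for illustration and are not part of the Cloudmersive client. It checks that the data starts with the "%PDF-" header before writing, on the assumption that a failed conversion could return something other than a PDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any conversion library.
final class PdfBytes {

    // Every PDF file begins with this ASCII header.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the data starts with the PDF header. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Writes the bytes to target, refusing data that is clearly not a PDF. */
    static Path save(byte[] data, Path target) throws IOException {
        if (!looksLikePdf(data)) {
            throw new IOException("Data does not start with %PDF-; not writing " + target);
        }
        return Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the byte[] returned by a converter.
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        Path out = save(sample, Files.createTempFile("converted", ".pdf"));
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

In the snippet above, this would be used as PdfBytes.save(result, Paths.get("output.pdf")) inside the try block, with IOException handled alongside ApiException.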

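Calling Office Word through JACOB is a 100% perfect solution. However, it only supports Windows, because Office Word must be installed.

Download the JACOB archive (the latest version is 1.19)

Add jacob.jar to your project classpath

Add jacob-1.19-x86.dll or jacob-1.19-x64.dll (depending on whether your JDK is 32-bit or 64-bit) to …\Java\jdk1.x.x_xxx\jre\bin

Use the JACOB API to call Office Word and convert doc/docx to PDF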

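import java.io.File;

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.ComThread;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

// Assumes the enclosing class defines an SLF4J-style `logger` field.
public void convertDocx2pdf(String docxFilePath) {
    File source = new File(docxFilePath).getAbsoluteFile();
    if (!source.isFile()) {
        return; // missing file or a directory: nothing to convert
    }

    // Replace the extension (.doc or .docx) with .pdf, next to the source file.
    String name = source.getName();
    int dot = name.lastIndexOf('.');
    File target = new File(source.getParentFile(), (dot > 0 ? name.substring(0, dot) : name) + ".pdf");

    ActiveXComponent app = null;
    long start = System.currentTimeMillis();
    try {
        ComThread.InitMTA(true);
        app = new ActiveXComponent("Word.Application");
        Dispatch documents = app.getProperty("Documents").toDispatch();
        // Open(FileName, ConfirmConversions = false, ReadOnly = true)
        Dispatch document = Dispatch.call(documents, "Open", source.getPath(), false, true).toDispatch();
        if (target.exists()) {
            target.delete();
        }
        // 17 is Word's wdFormatPDF constant
        Dispatch.call(document, "SaveAs", target.getPath(), 17);
        Dispatch.call(document, "Close", false); // close without saving changes
        long end = System.currentTimeMillis();
        logger.info("Convert finished: " + (end - start) + "ms");
    } catch (Exception e) {
        logger.error(e.getLocalizedMessage(), e);
        throw new RuntimeException("PDF conversion failed.", e);
    } finally {
        if (app != null) {
            app.invoke("Quit", new Variant[] {});
        }
        ComThread.Release();
    }
}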
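Because the Word automation route only works on Windows, code that has to run on several platforms might pick a backend at runtime. The following is a rough sketch of such a guard, assuming the soffice command line from OpenOffice/LibreOffice mentioned in this thread is the fallback elsewhere. The class ConverterChoice and the Backend enum are hypothetical names, and the check only looks at the operating system name; it does not verify that Word or soffice is actually installed.

```java
import java.util.Locale;

// Hypothetical helper for choosing a conversion backend by platform.
final class ConverterChoice {

    enum Backend { WORD_VIA_JACOB, SOFFICE }

    /** Picks Word automation on Windows and soffice everywhere else. */
    static Backend backendFor(String osName) {
        boolean windows = osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? Backend.WORD_VIA_JACOB : Backend.SOFFICE;
    }

    public static void main(String[] args) {
        // os.name is a standard JVM system property, e.g. "Windows 10" or "Linux".
        System.out.println(backendFor(System.getProperty("os.name")));
    }
}
```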

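It's already 2019, and I can't believe there still isn't a simple, convenient way in the Java world to convert the most popular document format, Micro$oft Word, to Adobe PDF.

I tried almost every method mentioned above, and found that the best and only way to satisfy my requirements is to use OpenOffice or LibreOffice. Actually I'm not sure exactly how they differ; both seem to provide the soffice command line.

My requirements are:

It must run on Linux, more specifically CentOS, not Windows, so we cannot install Microsoft Office on it

It must support Chinese characters, so ISO-8859-1 character encoding is not an option; it must support Unicode

The first thing that came to mind was doc-to-pdf-converter, but it lacks maintenance: its last update was 4 years ago, and I won't use an unmaintained solution. Xdocreport seems a promising choice, but it can only convert docx, not the binary doc files that are mandatory for me.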

Runtime.getRuntime().exec("soffice --convert-to pdf -outdir . /path/some.doc");
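The one-liner above starts soffice but does not wait for it or check whether it succeeded. A slightly fuller sketch using ProcessBuilder is shown below. It assumes soffice is on the PATH and that the output file is named after the input with a .pdf extension; the class SofficeConverter, its method names, and the timeout parameter are illustrative choices, not anything prescribed by LibreOffice.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around the soffice command line.
final class SofficeConverter {

    /** Builds the command as separate arguments, so paths with spaces need no quoting. */
    static List<String> buildCommand(Path input, Path outDir) {
        return Arrays.asList(
                "soffice",                 // assumed to be on the PATH
                "--headless",              // run without a GUI
                "--convert-to", "pdf",
                "--outdir", outDir.toString(),
                input.toString());
    }

    /** The path where the converted file is assumed to appear: <input base name>.pdf in outDir. */
    static Path expectedOutput(Path input, Path outDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".pdf");
    }

    /** Runs soffice, waits for it to finish, and fails on timeout or non-zero exit. */
    static Path convert(Path input, Path outDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO() // forward soffice output so its pipes cannot fill up and block
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("soffice timed out after " + timeoutSeconds + "s");
        }
        if (process.exitValue() != 0) {
            throw new IOException("soffice exited with code " + process.exitValue());
        }
        return expectedOutput(input, outDir);
    }

    public static void main(String[] args) {
        Path input = Paths.get(args.length > 0 ? args[0] : "/path/some.doc");
        Path outDir = Paths.get(args.length > 1 ? args[1] : ".");
        // Only prints the command; call convert(...) to actually run soffice.
        System.out.println(String.join(" ", buildCommand(input, outDir)));
    }
}
```

Passing the arguments as a list rather than one string avoids the whitespace splitting that Runtime.exec(String) performs, which matters once file names contain spaces.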

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>8.0.0</version>
</dependency>

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class DocToPDF {

    public static void main(String[] args) {
        String inputFilePath = "D:\\Workspace\\New\\Sample.docx";
        String outputFilePath = "D:\\Workspace\\New\\Sample.pdf";

        // try-with-resources closes the streams even if the conversion fails
        try (InputStream in = new FileInputStream(inputFilePath)) {
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(in);
            try (OutputStream out = new FileOutputStream(outputFilePath)) {
                Docx4J.toPDF(wordMLPackage, out);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

import com.spire.doc.Document;
import com.spire.doc.FileFormat;
import com.spire.doc.ToPdfParameterList;

public class WordToPDF {
    public static void main(String[] args) {

        // Create the Document object.
        Document doc = new Document();

        // Load the file from disk.
        doc.loadFromFile("Sample.docx");

        // Create an instance of ToPdfParameterList.
        ToPdfParameterList ppl = new ToPdfParameterList();

        // Embed full fonts in the resulting PDF.
        ppl.isEmbeddedAllFonts(true);

        // Set setDisableLink to true to remove the hyperlink effect from the resulting PDF,
        // or to false to preserve it.
        ppl.setDisableLink(true);

        // Set the output image quality to 40% of the original image. 80% is the default setting.
        doc.setJPEGQuality(40);

        // Save to file. Note that ppl is configured above but is not passed to this call.
        doc.saveToFile("output/ToPDF.pdf", FileFormat.PDF);
    }
}