Arabic translation is incorrect in Java PDF iText


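I am generating a PDF file from an HTML string, but the content of the generated PDF does not match the HTML. The content in the PDF is some random content. I read about the problem on Google, and the suggestion was to use Unicode notation like

%u0627%u0646%u0627%20%u0627%u0633%u0645%u0649%20%u0639%u0628%u062F%u0627%u0644%u0644%u0647

But when I put this into my HTML, it is printed as is.

Related question: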

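package com.example.demo;

import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.styledxmlparser.css.media.MediaDeviceDescription;
import com.itextpdf.styledxmlparser.css.media.MediaType;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;
import com.itextpdf.layout.font.FontProvider;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) throws IOException {
        SpringApplication.run(DemoApplication.class, args);
        String htmlSource = getContent();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        ConverterProperties converterProperties = new ConverterProperties();
        FontProvider dfp = new DefaultFontProvider(true, false, false);
        dfp.addFont("/Library/Fonts/Arial.ttf");
        converterProperties.setFontProvider(dfp);
        converterProperties.setMediaDeviceDescription(new MediaDeviceDescription(MediaType.PRINT));
        HtmlConverter.convertToPdf(htmlSource, outputStream, converterProperties);
        byte[] bytes = outputStream.toByteArray();
        File pdfFile = new File("java19.pdf");
        FileOutputStream fos = new FileOutputStream(pdfFile);
        fos.write(bytes);
        fos.flush();
        fos.close();
    }

    private static String getContent() {
        return "<!DOCTYPE html>\n" +
                "<html lang=\"en\">\n" +
                "\n" +
                "<head>\n" +
                "    <meta charset=\"UTF-8\">\n" +
                "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n" +
                "    <meta http-equiv=\"X-UA-Compatible\" content=\"ie=edge\">\n" +
                "    <title>Document</title>\n" +
                "    <style>\n" +
                "      @page {\n" +
                "        margin: 0;\n" +
                "        font-family: arial;\n" +
                "      }\n" +
                "    </style>\n" +
                "</head>\n" +
                "\n" +
                "<body\n" +
                "    style=\"margin: 0;padding: 0;font-family: arial, sans-serif;font-size: 14px;line-height: 125%;width: 100%;-ms-text-size-adjust: 100%;-webkit-text-size-adjust: 100%;color: #222222;\">\n" +
                "    <table cellpadding=\"0\" cellspacing=\"0\" width=\"100%\" style=\"background: white; direction: rtl;\">\n" +
                "        <tbody>\n" +
                "            <tr>\n" +
                "                <td style=\"padding: 0 35px;\">\n" +
                "                    <p> انا اسمى عبدالله\n" +
                "                    </p>\n" +
                "                </td>\n" +
                "            </tr>\n" +
                "        </tbody>\n" +
                "    </table>\n" +
                "\n" +
                "</body>\n" +
                "\n" +
                "</html>";
    }
}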

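Make sure the font supports the required characters. If you include additional fonts through the Maven resources directory during the build, check that the font files are not filtered (property substitution), because filtering corrupts binary files.

Also check that the source file and the compiler use the same encoding, for example UTF-8. I sometimes verify this by including a character that is only available in Unicode and not in the other classic code pages.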
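To illustrate why a mismatch matters, here is a minimal sketch using only the JDK. It simulates text that is saved as UTF-8 but read back as ISO-8859-1, which is roughly what happens when the compiler assumes a different encoding than the editor used. The class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Simulates text written to disk in one charset and read back in another.
    public static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // U+0627 (ARABIC LETTER ALEF), written as an escape so this file stays ASCII-only.
        String alef = "\u0627";

        String sameCharset = reinterpret(alef, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatch = reinterpret(alef, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(sameCharset.equals(alef)); // true: round trip is lossless
        System.out.println(mismatch.equals(alef));    // false: the letter was misread
        System.out.println(mismatch.length());        // 2: one letter became two Latin-1 characters
    }
}
```

A single Arabic letter turning into two unrelated Latin characters is the typical signature of this kind of problem.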

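I tried to reproduce the problem. When running the sample code, I get the following warning in the log:

Cannot find pdfCalligraph module, which was implicitly required by one of the layout properties

Alexey Subach has already mentioned this, and it can lead to the following problems:

Wrong text direction (I am no Arabic expert, but the text is aligned to the right)
Incorrect character combination (for details, see this document)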
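As a rough illustration of the first point, the JDK's own java.text.Bidi class can tell whether a string needs right-to-left handling at all. This is only a sketch with the standard library. It is not how pdfCalligraph works internally, and it does not cover the joining of letters (the second point). The class and method names are made up for this example.

```java
import java.text.Bidi;

public class BidiCheckDemo {

    // True when the text contains right-to-left characters and therefore
    // needs bidirectional analysis before it can be laid out.
    public static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    // True when the base direction of the paragraph, derived from its
    // first strongly directional character, is right-to-left.
    public static boolean isRtlParagraph(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        // First two words of the Arabic sentence from the question, as escapes.
        String arabic = "\u0627\u0646\u0627 \u0627\u0633\u0645\u0649";

        System.out.println(needsBidi(arabic));      // true
        System.out.println(needsBidi("Document"));  // false
        System.out.println(isRtlParagraph(arabic)); // true
    }
}
```

A layout engine that skips this analysis draws the characters in storage order, which is one way Arabic text ends up looking wrong even though every character is present.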

This is the output I get without pdfCalligraph, created with the linked code base:

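So, to make everything work as well as a browser does for Arabic HTML, you also need:

A commercial license
Code that loads the license file (otherwise you will get a LicenseFileNotLoadedException)
The pdfCalligraph dependency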

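Your question is tagged iText 7, but depending on your needs there may be alternatives such as Apache FOP, which reportedly handles Arabic ligatures. It may require rework, though, because it is based on XSL-FO. In theory, you could generate the XSL-FO with whatever template mechanism you already use (JSP, JSF, Thymeleaf and so on) and convert the XSL-FO to PDF on the fly during the request (in a web application) with something like a ServletFilter.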

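Without seeing the erroneous output, it is hard to determine exactly what the problem is. But your "random content" sounds like an encoding problem.

Since the Arabic content is included directly in the source code, you have to be careful with the encoding. For example, with ISO-8859-1, the Arabic text in the resulting PDF output comes out wrong.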

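Using Unicode escape sequences (\uXXXX) does indeed avoid these encoding problems. Replace

"                    <p> انا اسمى عبدالله\n" +

with

"                    <p>\u0627\u0646\u0627 \u0627\u0633\u0645\u0649 \u0639\u0628\u062F\u0627\u0644\u0644\u0647" +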

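The question you linked is 5 years old and relates to iText 5. You are using iText 7 + pdfHTML, so the linked question probably does not apply to you. Please attach the resulting PDF. Are you using pdfCalligraph? Please check this thread.

I am using the Arial font, which supports Arabic. My PDF file has content, but it differs from the HTML file. When I use Unicode escape sequences (such as \u0627\u0646), they are printed as is in the PDF.

Are you using \u0627, and not the %u0627 you mentioned in your question?

The Arabic text is printed, but the translation is wrong, even though I wrote the code in Unicode format as you suggested.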
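The distinction raised in the comments matters. The %uXXXX notation is the non-standard output format of JavaScript's legacy escape() function. Neither HTML nor the Java compiler interprets it, which is why it shows up literally in the PDF. If all you have is such a string, a minimal sketch for turning it back into real characters could look like this (class and method names are made up for this example; two-digit sequences such as %20 are treated as single Latin-1 characters):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PercentUDecoder {

    // Matches either %uXXXX (four hex digits) or %XX (two hex digits).
    private static final Pattern ESCAPE =
            Pattern.compile("%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})");

    // Replaces every %uXXXX or %XX sequence with the character it encodes
    // and copies everything else unchanged.
    public static String decode(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(encoded, last, m.start());
            String hex = m.group(1) != null ? m.group(1) : m.group(2);
            out.append((char) Integer.parseInt(hex, 16));
            last = m.end();
        }
        out.append(encoded, last, encoded.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("%u0627%u0646%u0627%20%u0627%u0633%u0645%u0649");
        // Compare against the same text written with Java escapes.
        System.out.println(decoded.equals("\u0627\u0646\u0627 \u0627\u0633\u0645\u0649")); // true
    }
}
```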

Complete example with the Unicode escape sequences in place (loading the license file is needed for pdfCalligraph):

public static void main(String[] args) throws IOException {
    // Needed for pdfCalligraph
    LicenseKey.loadLicenseFile("all-products.xml");

    File pdfFile = new File("java19.pdf");
    OutputStream outputStream = new FileOutputStream(pdfFile);
    String htmlSource = getContent();
    ConverterProperties converterProperties = new ConverterProperties();
    FontProvider dfp = new DefaultFontProvider(true, false, false);
    dfp.addFont("/Library/Fonts/Arial.ttf");
    converterProperties.setFontProvider(dfp);
    converterProperties.setMediaDeviceDescription(new MediaDeviceDescription(MediaType.PRINT));
    HtmlConverter.convertToPdf(htmlSource, outputStream, converterProperties);
}

private static String getContent() {
    return "<!DOCTYPE html>\n" +
            "<html lang=\"en\">\n" +
            "\n" +
            "<head>\n" +
            "    <meta charset=\"UTF-8\">\n" +
            "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n" +
            "    <meta http-equiv=\"X-UA-Compatible\" content=\"ie=edge\">\n" +
            "    <title>Document</title>\n" +
            "    <style>\n" +
            "      @page {\n" +
            "        margin: 0;\n" +
            "        font-family: arial;\n" +
            "      }\n" +
            "    </style>\n" +
            "</head>\n" +
            "\n" +
            "<body\n" +
            "    style=\"margin: 0;padding: 0;font-family: arial, sans-serif;font-size: 14px;line-height: 125%;width: 100%;-ms-text-size-adjust: 100%;-webkit-text-size-adjust: 100%;color: #222222;\">\n" +
            "    <table cellpadding=\"0\" cellspacing=\"0\" width=\"100%\" style=\"background: white; direction: rtl;\">\n" +
            "        <tbody>\n" +
            "            <tr>\n" +
            "                <td style=\"padding: 0 35px;\">\n" +
            // Arabic content
            // "                    <p> انا اسمى عبدالله\n" +
            // Arabic content with Unicode escape sequences
            "                    <p>\u0627\u0646\u0627 \u0627\u0633\u0645\u0649 \u0639\u0628\u062F\u0627\u0644\u0644\u0647" +
            "                    </p>\n" +
            "                </td>\n" +
            "            </tr>\n" +
            "        </tbody>\n" +
            "    </table>\n" +
            "\n" +
            "</body>\n" +
            "\n" +
            "</html>";
}
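
The escaped literal in the code above does not have to be typed by hand. As a minimal sketch, assuming you have the Arabic text available as a correctly decoded String at runtime, the following helper produces the ASCII-only form that can be pasted into a Java source file. The class and method names are made up for this example.

```java
public class UnicodeEscaper {

    // Returns the text with every non-ASCII UTF-16 code unit replaced by its
    // Java-style backslash-u escape; ASCII characters are kept unchanged.
    public static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input is written with escapes so this file stays ASCII-only;
        // in practice it would come from a file or form read with the right charset.
        String alefNoon = "\u0627\u0646";
        System.out.println(escapeNonAscii("<p>" + alefNoon + "</p>"));
    }
}
```

Because the result contains only ASCII characters, it survives any mismatch between the editor's and the compiler's encoding.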