Java: Writing Arabic with PDFBox so that the characters are correct and the presentation forms are not separated

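I am trying to generate a PDF that contains Arabic text using Apache PDFBox, but the text comes out as separated characters. The Arabic string I pass in is stored as a sequence of general ("official") Unicode characters, and each of those is rendered as the isolated form of the Arabic letter.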

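Here is an example:
Target text to be written to the PDF (the expected output in the PDF file) -> جملة بالعربي
What I get in the PDF file -> the same letters, each drawn in its separated (isolated) form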

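I tried several approaches without success. Here are some of them:
1. Converting the string to a stream of bits and trying to extract the right values
2. Treating the string as a sequence of bytes in UTF-8 and UTF-16 and extracting the values from it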

One approach looked promising: reading the Unicode value of each character. But it returns the general "official" Unicode value. This is what I mean:

System.out.println( Integer.toHexString( (int)(new String("كلمة").charAt(1))) );

The output is 644, but the expected output is fee0: this character is in the middle of the word, so I should get the medial form, U+FEE0.
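The relationship between the two values can be checked with the standard library alone. The sketch below is an illustration added for clarity, not part of the original question; the class and method names are made up for the example. It shows that a Java String stores the base letter U+0644, and that the medial presentation form U+FEE0 is a separate code point that compatibility normalization (NFKC) folds back to that base letter.

```java
import java.text.Normalizer;

public class BaseVersusPresentationForm {

    /** Returns the code point that NFKC normalization maps a single char to. */
    static int compatibilityBase(char c) {
        String normalized = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC);
        return normalized.codePointAt(0);
    }

    public static void main(String[] args) {
        // The word from the question: kaf, lam, meem, teh marbuta.
        String word = "\u0643\u0644\u0645\u0629";

        // A String holds the base letter, whatever its position in the word.
        System.out.println(Integer.toHexString(word.charAt(1)));              // 644

        // The medial form of lam folds back to the same base letter.
        System.out.println(Integer.toHexString(compatibilityBase('\uFEE0'))); // 644
    }
}
```

So nothing is wrong with the string itself. Choosing the contextual form is a separate shaping step that has to run before the text is handed to PDFBox.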

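So what I want is a way to produce the correct contextual Unicode value, not just the official one.

The leftmost column of the first table on the linked page shows the general Unicode values.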
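To make the idea of such a table concrete, here is a deliberately tiny, hand-rolled shaper. It is only a sketch under loud assumptions: it knows just the four letters of the example word, ignores ligatures, diacritics, and every other letter, and the class name TinyArabicShaper is invented for this illustration. It is not how a real shaping library is implemented; it only shows how a base letter plus its neighbours selects one of the isolated, final, initial, or medial code points. It needs Java 9 or later because of Map.of.

```java
import java.util.Map;

public class TinyArabicShaper {

    // Presentation forms in the order {isolated, final, initial, medial}.
    // A letter that never joins to the following letter has only the first two.
    private static final Map<Character, char[]> FORMS = Map.of(
        '\u0643', new char[] {'\uFED9', '\uFEDA', '\uFEDB', '\uFEDC'}, // kaf
        '\u0644', new char[] {'\uFEDD', '\uFEDE', '\uFEDF', '\uFEE0'}, // lam
        '\u0645', new char[] {'\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4'}, // meem
        '\u0629', new char[] {'\uFE93', '\uFE94'}                      // teh marbuta
    );

    /** True if the letter can connect to the letter that follows it. */
    private static boolean joinsToNext(char c) {
        char[] forms = FORMS.get(c);
        return forms != null && forms.length == 4;
    }

    /** True if the letter can connect to the letter that precedes it. */
    private static boolean joinsToPrevious(char c) {
        return FORMS.containsKey(c);
    }

    /** Replaces each known base letter by the presentation form its neighbours call for. */
    static String shape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char[] forms = FORMS.get(c);
            if (forms == null) {          // unknown character: copy it unchanged
                out.append(c);
                continue;
            }
            boolean joinPrev = i > 0 && joinsToNext(text.charAt(i - 1));
            boolean joinNext = forms.length == 4
                    && i + 1 < text.length()
                    && joinsToPrevious(text.charAt(i + 1));
            int index = joinPrev ? (joinNext ? 3 : 1) : (joinNext ? 2 : 0);
            out.append(forms[index]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String shaped = shape("\u0643\u0644\u0645\u0629");
        System.out.println(Integer.toHexString(shaped.charAt(1))); // fee0
    }
}
```

For the example word the second character becomes U+FEE0, the value the question expects.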

Note: the sample code in this answer may be outdated. The original post linked to a working example here.

First, I would like to thank the people who showed me the library that makes writing Arabic with Apache PDFBox possible.

This answer is divided into two parts:

Downloading the library and installing it

How to use the library

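Downloading the library and installing it

We are going to use the ICU library.
ICU stands for International Components for Unicode. It is a mature, widely used set of C/C++ and Java libraries that provides Unicode and globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.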

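To download the library, go to the ICU download page.
Choose the latest version of ICU4J.

You will be taken to another page with a box that contains direct links to the required components. Download these three files: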

icu4j-docs.jar

icu4j-src.jar

icu4j.jar

The following instructions explain how to create and add the library in the NetBeans IDE:

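Go to the toolbar and click "Tools"

Select "Libraries"

At the bottom left you will find the "New Library" button; use it to create your library

Go to the library you just created in the "Libraries" list

Click it and add the jar files as follows:

Add icu4j.jar under Classpath

Add icu4j-src.jar under Sources

Add icu4j-docs.jar under Javadoc

Look at the opened projects at the far right

Expand the project in which you want to use the library

Right-click the "Libraries" folder and choose "Add Library"

Finally, select the library you just created. You are now ready to use the library; just import what you need, like this: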

import com.ibm.icu.What_You_Want_To_Import;
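If the import does not resolve, a quick way to confirm whether the jar really ended up on the classpath is to try loading one of its classes by name. This is a small optional sketch using only the standard library; the class name Icu4jCheck and the helper isOnClasspath are made up for the example.

```java
public class Icu4jCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class without running its static initializers.
            Class.forName(className, false, Icu4jCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String shaper = "com.ibm.icu.text.ArabicShaping";
        System.out.println(shaper
                + (isOnClasspath(shaper) ? " found" : " NOT found; check the library setup"));
    }
}
```

Run it from inside the project that should use the library, so that it sees the same classpath as your own code.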

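How to use the library

By using the ArabicShaping class and reversing the string, we can write a correctly joined line of Arabic.
Here is the code; note the comments in it.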

import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class Main {
    public static void main(String[] args) throws IOException, ArabicShapingException {
        // Path to a TrueType font that contains Arabic glyphs.
        File fontFile = new File("Arabic Font File of format.ttf");
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);
            try (PDPageContentStream writer = new PDPageContentStream(doc, page)) {
                writer.beginText();
                writer.setFont(PDType0Font.load(doc, fontFile), 20);
                writer.newLineAtOffset(0, 700);
                // The trick is in the showText call below, but a few notes first:
                // We have to reverse the string because PDFBox writes from the left, while Arabic is an RTL language.
                // The output will be fine except that every line is justified to the left (not hard to resolve).
                // So the Arabic text has to be written to the PDF line by line, like this:
                String s = "جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
                // shape() throws ArabicShapingException.
                String shaped = new ArabicShaping(ArabicShaping.LETTERS_SHAPE).shape(s);
                // Reverse the number runs first so that they read correctly once the whole line is reversed.
                writer.showText(new StringBuilder(reverseNumbersInString(shaped)).reverse().toString());
                writer.endText();
            }
            doc.save(new File("File_Test.pdf"));
        }
    }

    // reverseNumbersInString(String) and isInt(String) are the helper functions
    // listed at the end of this answer; paste them into this class.
}
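The original post showed the resulting PDF here.

I hope I have covered everything.
Update: after reversing the line, make sure the numbers are reversed once more so that they come out in their correct order.
Here are a couple of functions that may help: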

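// Returns true if the input parses as an integer.
public static boolean isInt(String input) {
    try {
        Integer.parseInt(input);
        return true;
    } catch (NumberFormatException e) {
        return false;
    }
}

// Reverses every number run (digits, with '.' or '-' inside the run) and leaves the rest untouched.
public static String reverseNumbersInString(String input) {
    char[] separated = input.toCharArray();
    StringBuilder result = new StringBuilder();
    int i = 0;
    while (i < separated.length) {
        if (isInt(String.valueOf(separated[i]))) {
            // Collect the whole number run.
            StringBuilder hold = new StringBuilder();
            while (i < separated.length
                    && (isInt(String.valueOf(separated[i])) || separated[i] == '.' || separated[i] == '-')) {
                hold.append(separated[i]);
                i++;
            }
            // Reverse the run so that it reads correctly after the whole line is reversed.
            result.append(hold.reverse());
        } else {
            result.append(separated[i]);
            i++;
        }
    }
    return result.toString();
}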
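The effect of the two reversals is easiest to see on plain ASCII input. The sketch below is a compact, regex-based variant written for illustration only; the class and method names are invented, and it assumes numbers consist of ASCII digits, because \d in a Java regex matches only 0-9 by default. It reverses each number run and then the whole line, so the letters end up in visual order while the numbers keep their reading order.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VisualOrderDemo {

    // A run that starts with a digit and continues with digits, '.' or '-'.
    private static final Pattern NUMBER_RUN = Pattern.compile("\\d[\\d.-]*");

    /** Reverses every number run and copies everything else unchanged. */
    static String reverseNumberRuns(String input) {
        Matcher m = NUMBER_RUN.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());                  // text before the run
            out.append(new StringBuilder(m.group()).reverse());  // the run, reversed
            last = m.end();
        }
        out.append(input, last, input.length());                 // trailing text
        return out.toString();
    }

    /** Reverses the line for left-to-right drawing while keeping numbers readable. */
    static String toVisualOrder(String logical) {
        return new StringBuilder(reverseNumberRuns(logical)).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(toVisualOrder("abc 2024 def")); // fed 2024 cba
    }
}
```

With real Arabic text the same call would be applied to the shaped string just before showText. For mixed-direction text beyond simple numbers, a full bidirectional algorithm implementation is the more robust choice.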
