Android: Error when reading JSON from a PDF file using iText

android json itext

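I have been trying to read JSON from a PDF file. I am able to write the JSON string to the PDF, but when I read the PDF back I get the following error:

Caused by: com.google.gson.stream.MalformedJsonException: Unterminated object at line 60 column 3 path $.allRoutes[0].routeData

Before writing to the file, I printed the JSON and checked it with an online JSON validator; it is valid JSON. After I write it to the PDF, it becomes invalid: I copied the JSON out of the PDF and validated it online, and validation failed with an error.

Below is the code that writes the JSON to the PDF file: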

try {
    File file = AppUtils.createFile(".pdf");
    Document document = new Document();
    document.setPageSize(PageSize.A4);
    document.addCreationDate();
    document.addAuthor("Me");
    PdfWriter.getInstance(document, new FileOutputStream(file));
    document.open();

    String jsonBody = new Gson().toJson(backUpModel);

    Gson gson = new GsonBuilder().setPrettyPrinting().create();
    JsonParser parser = new JsonParser();
    JsonElement jsonElement = parser.parse(jsonBody);
    String prettyJsonBody = gson.toJson(jsonElement);

    Log.i(Constants.TAG, "Input Json: " + prettyJsonBody);
    document.add(new Paragraph(prettyJsonBody));
    document.close();

    //Toast.makeText(BackUp.this, "Saved Succesfully", Toast.LENGTH_SHORT).show();
} catch (Exception e) {
    e.printStackTrace();
}

Below is the code that reads the PDF file:

try {
    File exportDir = new File(Environment.getExternalStorageDirectory(), Constants.TAG);
    String filePath = exportDir.getPath() + File.separator + getFileName(fileUri);
    PdfReader pdfReader = new PdfReader(filePath);
    int numberOfPages = pdfReader.getNumberOfPages();
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 1; i <= numberOfPages; i++) {
        stringBuilder.append(PdfTextExtractor.getTextFromPage(pdfReader, i));
    }
    pdfReader.close();
    String jsonBody = stringBuilder.toString();
    BackUpModel backUpModel = new Gson().fromJson(jsonBody, BackUpModel.class);
} catch (IOException e) {
    e.printStackTrace();
}

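Comparing the input JSON with the output makes it clear that you cannot faithfully extract the JSON from PDFs generated by your current code.

The problem is that line breaks are added when the string is rendered to PDF, to keep the text from running into the page margin. Each line break in the result may already have been present in the input string or may have been introduced by iText, and in general the two cases cannot be told apart.

If iText breaks lines at spaces or punctuation (colons, commas, brackets) outside of names and values, the extra line breaks do not change the meaning of the JSON object. Line breaks inside names and values are a different matter.

Even if we could assume that no name or value contains a line break (the JSON you shared does have line breaks inside values, but those may be due to the way you shared it), so that we could simply remove them all, there is a second problem: some of these line breaks were applied where the original value had a space, and others were not. When a line is broken at a space, that space is dropped and no longer appears in the final output. Again, with only the extracted output at hand, it is in general impossible to tell which case applies to a given line break.

Thus, faithful extraction is impossible.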
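The loss can be reproduced without iText. The following is a minimal sketch, not iText's actual layout algorithm: `wrap` is a hypothetical greedy word wrap that, like the rendering described above, drops the space at which it breaks a line. Two different JSON strings end up as identical wrapped text, so nothing in the extracted text reveals which one was the original.

```java
final class LineBreakLossDemo {

    /**
     * Greedy word wrap for illustration only (an assumption, not iText code).
     * It breaks at the last space that fits on the line and drops that space;
     * if no space is available it breaks in the middle of the token.
     */
    static String wrap(String text, int maxWidth) {
        StringBuilder out = new StringBuilder();
        int lineStart = 0;
        while (text.length() - lineStart > maxWidth) {
            int breakAt = text.lastIndexOf(' ', lineStart + maxWidth);
            if (breakAt <= lineStart) {
                // No usable space: hard break, nothing is dropped.
                breakAt = lineStart + maxWidth;
                out.append(text, lineStart, breakAt).append('\n');
                lineStart = breakAt;
            } else {
                // Break at the space: the space itself is not emitted.
                out.append(text, lineStart, breakAt).append('\n');
                lineStart = breakAt + 1;
            }
        }
        out.append(text.substring(lineStart));
        return out.toString();
    }

    public static void main(String[] args) {
        String withSpace = "{\"k\":\"hello world\"}";
        String withoutSpace = "{\"k\":\"helloworld\"}";
        // Both inputs wrap to the same two lines at width 11.
        System.out.println(wrap(withSpace, 11));
        System.out.println(wrap(withoutSpace, 11));
    }
}
```

Since both inputs wrap to the same text, any rule for undoing the line breaks (always insert a space, or never insert one) is wrong for one of them.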

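You therefore have to change the way you embed the JSON in the PDF. Since you did not mention why you are doing this or what alternatives you have, I cannot give a final recommendation, only some options that may or may not be compatible with your requirements: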

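1. Embed the JSON not as regular static page content but as the value of a multi-line form text field. Values in form fields can be faithfully extracted from the PDF.
2. In addition to the JSON visible in the page content, embed the JSON in a private stream object in the PDF; you can then faithfully extract the JSON from that stream object.
3. Use a font size small enough that iText does not add line breaks during rendering (but the result will most likely be too small to read without zooming in).
4. Render the JSON manually (using the low-level iText API) and somehow mark the added line breaks and the dropped spaces. During extraction you have to react to those marks.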
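The marking idea in the last option could look roughly like the following sketch. It is pure string handling and rests on several assumptions that are not part of the answer: the JSON is compact (no pretty printing, so it contains no raw line breaks), every encoded line is rendered as its own line short enough that iText does not wrap it again, and text extraction returns those lines unchanged. The class `MarkedLineCodec`, its methods, and the `|` marker are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class MarkedLineCodec {

    // Marker placed at both ends of every line so that leading and trailing
    // spaces stay visible and each line's extent is unambiguous.
    private static final char MARK = '|';

    /** Splits compact JSON into fixed-width chunks, each wrapped in markers. */
    static List<String> encode(String compactJson, int chunkWidth) {
        if (chunkWidth <= 0) {
            throw new IllegalArgumentException("chunkWidth must be positive");
        }
        if (compactJson.indexOf('\n') >= 0 || compactJson.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("JSON must not contain raw line breaks");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < compactJson.length(); i += chunkWidth) {
            int end = Math.min(i + chunkWidth, compactJson.length());
            lines.add(MARK + compactJson.substring(i, end) + MARK);
        }
        return lines;
    }

    /** Rebuilds the JSON from extracted text by stripping markers and line breaks. */
    static String decode(String extractedText) {
        StringBuilder json = new StringBuilder();
        for (String line : extractedText.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // ignore blank lines between pages or paragraphs
            }
            if (line.length() < 2
                    || line.charAt(0) != MARK
                    || line.charAt(line.length() - 1) != MARK) {
                throw new IllegalArgumentException("Line is not marked: " + line);
            }
            json.append(line, 1, line.length() - 1);
        }
        return json.toString();
    }

    public static void main(String[] args) {
        String json = "{\"k\":\"hello  world \"}";
        String extracted = String.join("\n", encode(json, 7));
        System.out.println(decode(extracted).equals(json));
    }
}
```

Because every line carries its own delimiters, a space at the start or end of a chunk is kept, and a line that was split or merged during rendering is detected rather than silently accepted.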


For example, to implement option 1 and embed the JSON as the value of a multi-line form text field, simply add it like this:
Document document = new Document();
document.setPageSize(PageSize.A4);
document.addCreationDate();
document.addAuthor("Me");
PdfWriter pdfWriter = PdfWriter.getInstance(document, new FileOutputStream(jsonPdfFile));
document.open();
pdfWriter.getAcroForm().setNeedAppearances(true);
TextField textField = new TextField(pdfWriter, document.getPageSize(), "json");
textField.setOptions(TextField.MULTILINE | TextField.READ_ONLY);
PdfFormField field = textField.getTextField();
field.setValueAsString(originalJson);
pdfWriter.addAnnotation(field);
document.close();

And then extract it again like this:
PdfReader pdfReader = new PdfReader(jsonPdfFile.getAbsolutePath());
String jsonBody = pdfReader.getAcroFields().getField("json");
pdfReader.close();

(test testJsonToPdfToJsonFormField)

I am using the current iText 5.5.14-SNAPSHOT development branch. The code should work with any 5.5.x version, though.
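"I have been trying to read JSON from pdf file" - how did you try? When you compare the text you read with the original JSON, how do they differ?

@mkl You can get both JSONs from the links above. You can validate or compare them.

@mkl I have added the code for reading the JSON from the PDF file.

Your problem is the line breaks and extra spaces that new Paragraph() adds. new Paragraph() is an abstraction that handles spaces, paragraphs, and so on by itself. You need to write the exact JSON manually.

That said, some line breaks may appear where the original JSON contained a space, and those spaces may get lost along the way.

So my question is whether that approach would also work…

Can you explain how I can do this? "Embed the JSON not as regular static page content but as the value of a multi-line form text field. Values in form fields can be faithfully extracted from the PDF."

"Can you explain…" - I added an example to my answer.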