Java Apache FileUtils readFileToString and writeStringToFile problems


java apache file encoding


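I need to parse a Java file (actually a .pdf) into a String and then write it back to a file. Between those two steps I will apply some patches to the string, but that is not important here. I wrote the following JUnit test case: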

    String f1String=FileUtils.readFileToString(f1);
    File temp=File.createTempFile("deleteme", "deleteme");
    FileUtils.writeStringToFile(temp, f1String);
    assertTrue(FileUtils.contentEquals(f1, temp));

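This test converts the file to a String and writes it back. However, the test fails. I think it may be because of the encoding, but FileUtils does not give much detail about this. Can anyone help? Thanks!

Added for further background: why do I need this? I have very large PDFs on one machine that are replicated on another. The first machine is responsible for creating those PDFs. Because the second machine has poor connectivity and the PDFs are large, I don't want to sync the whole PDFs, only the changes made to them. To create and apply patches I use Google's DiffMatchPatch library, which creates patches between two Strings. So I need to load a PDF into a String, apply the generated patch, and write it back to a file.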

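I have a few ideas:

There might be BOM (byte order mark) bytes in one of the files that are stripped when reading or added when writing. Is there a difference in file size? (If it is the BOM, the difference should be 2 or 3 bytes.)

The line breaks might not match, depending on the system the files were created on: one may use CR LF while the other uses only LF or CR. (That gives a difference of 1 byte per line break.)

According to the JavaDoc, both methods should use the JVM's default encoding, which should be the same for both operations. Still, try testing with an explicitly set encoding (the JVM's default encoding can be queried with System.getProperty("file.encoding")).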
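The size and mismatch questions raised above can be checked by comparing the two files byte by byte. What follows is only a minimal sketch using JDK classes; `RoundTripDiagnostics`, `firstMismatch`, and `report` are hypothetical names made up for illustration and are not part of Commons IO.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class RoundTripDiagnostics {

    /** Returns the index of the first differing byte, or -1 if the arrays are identical. */
    static int firstMismatch(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // No difference in the shared prefix: either equal, or one array is longer.
        return a.length == b.length ? -1 : common;
    }

    /** Prints the default charset, the size difference, and the first mismatch offset. */
    static void report(Path original, Path copy) throws IOException {
        byte[] a = Files.readAllBytes(original);
        byte[] b = Files.readAllBytes(copy);
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("size difference: " + (b.length - a.length) + " bytes");
        System.out.println("first mismatch at offset: " + firstMismatch(a, b));
    }
}
```

A size difference of 2 or 3 bytes together with a mismatch at offset 0 would be consistent with the BOM idea. A difference that grows with the number of lines would be consistent with the line-break idea.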

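A PDF is not a text file. Decoding a binary file that is not encoded text into Java characters and then re-encoding it is not symmetric. For example, if the input byte stream is invalid for the current encoding, you can be sure it will not re-encode correctly. In short, don't do that. Use bytes instead.

Ed Staub's answer points out why my solution did not work, and he suggests using bytes instead of Strings. In my case I need a String, so the final working solution I found is the following: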

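@Test
public void testFileRWAsArray() throws IOException {
    // Read the raw bytes and map each byte to exactly one char (no charset decoding).
    byte[] bytes = FileUtils.readFileToByteArray(f1);
    // StringBuilder avoids the quadratic cost of String concatenation on large PDFs.
    StringBuilder sb = new StringBuilder(bytes.length);
    for (byte b : bytes) {
        sb.append((char) b);
    }
    String f1String = sb.toString();

    // Map each char back to one byte; the narrowing cast restores the original value.
    byte[] newBytes = new byte[f1String.length()];
    for (int i = 0; i < f1String.length(); ++i) {
        newBytes[i] = (byte) f1String.charAt(i);
    }

    File temp = File.createTempFile("deleteme", "deleteme");
    FileUtils.writeByteArrayToFile(temp, newBytes);
    assertTrue(FileUtils.contentEquals(f1, temp));
}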
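On the question of whether any charset makes the conversion symmetric: ISO-8859-1 maps each of the 256 byte values to exactly one char in U+0000..U+00FF, so decoding and re-encoding with it should return the same bytes, whereas UTF-8 replaces malformed sequences while decoding. The following is a minimal sketch with JDK classes only; `Latin1RoundTrip` and its method names are hypothetical. The String it produces is not identical to the one built by the loop above, because `(char) b` sign-extends negative bytes (0xE2 becomes U+FFE2 rather than U+00E2), but both variants restore the original bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    /** Decodes bytes one-to-one into chars U+0000..U+00FF. */
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes chars U+0000..U+00FF back into single bytes. */
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Reports whether decode followed by encode with the given charset preserves the bytes. */
    static boolean roundTrips(byte[] bytes, Charset charset) {
        byte[] again = new String(bytes, charset).getBytes(charset);
        return Arrays.equals(bytes, again);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four high bytes, which do not form valid UTF-8.
        byte[] sample = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        System.out.println("ISO-8859-1 round-trips: " + roundTrips(sample, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 round-trips: " + roundTrips(sample, StandardCharsets.UTF_8));
    }
}
```

One caveat under this approach: if a patch introduced chars above U+00FF into the String, they would not survive the conversion back to bytes.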

Try the following code:
// Base64 here is presumably org.apache.commons.codec.binary.Base64 (Commons Codec).
public static String fetchBase64binaryEncodedString(String path) {
    File inboundDoc = new File(path);
    byte[] pdfData;
    try {
        pdfData = FileUtils.readFileToByteArray(inboundDoc);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    byte[] encodedPdfData = Base64.encodeBase64(pdfData);
    // Base64 output is pure ASCII, so state the charset instead of relying on the default.
    return new String(encodedPdfData, StandardCharsets.US_ASCII);
}

// How to decode it
public void testConversionPDFtoBase64() throws IOException {
    String path = "C:/Documents and Settings/kantab/Desktop/GTR_SDR/MSDOC.pdf";
    File origFile = new File(path);
    String encodedString = CreditOneMLParserUtil.fetchBase64binaryEncodedString(path);

    // Now decode it back to the original bytes
    byte[] decodeData = Base64.decodeBase64(encodedString.getBytes(StandardCharsets.US_ASCII));
    // Write to a temp file, or give the actual path to the PDF file
    File decodedfile = File.createTempFile("DECODED", ".pdf");
    FileUtils.writeByteArrayToFile(decodedfile, decodeData);
    Assert.assertTrue(FileUtils.contentEquals(origFile, decodedfile));
}
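The same Base64 idea can also be sketched without Commons Codec, using `java.util.Base64` from the JDK (Java 8 and later). This is only an illustrative variant, not the code from the answer above; the class name `Base64RoundTrip` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {

    /** Encodes arbitrary bytes as an ASCII-only Base64 String. */
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Decodes a Base64 String back into the original bytes. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four high bytes, standing in for real PDF content.
        byte[] sample = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        String text = encode(sample);
        System.out.println("bytes preserved: " + Arrays.equals(sample, decode(text)));
    }
}
```

Base64 is symmetric for any input, but the encoded String is about 4/3 the size of the original data, which may matter for very large PDFs.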

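Thanks. However, I need to convert it to a String (I have extended the question to explain why). Is there any charset that would make this symmetric? Or any other solution?

I have already found a solution and added an answer to the post.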