Java: Insert a newline after every 309 characters


Let me start by saying that I'm very new to Java.

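I have a file that contains a single line. The file is about 200 MB. I need to insert a newline after every 309 characters. I believe I have enough code to do this correctly, but I keep running into out-of-memory errors. I've tried increasing the heap space, but it didn't help.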

Is there a less memory-intensive way to handle this?

BufferedReader r = new BufferedReader(new FileReader(fileName));

String line;

while ((line=r.readLine()) != null) {
  System.out.println(line.replaceAll("(.{309})", "$1\n"));
}

Your code has two problems:

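  • You load the entire file into memory at once. Since it is a single line, you need at least 200 MB of heap space; and

  • Using a regular expression like this to add newlines is a very inefficient approach. A straightforward loop will be an order of magnitude faster.

  • Both problems are easy to fix.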

    Use a reader and a writer to load 309 characters at a time, append a newline, and write those characters out.
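    A minimal sketch of that approach, assuming the text arrives through any java.io.Reader and goes out through any java.io.Writer; the class ChunkWriter and its wrap method are hypothetical names, not from the answer. Only one width-sized buffer is held in memory, and the inner loop tops up short reads so that the newlines land exactly every width characters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class: streams text through a fixed-size buffer.
final class ChunkWriter {
    /** Copies in to out, writing '\n' after every full chunk of `width` chars. */
    static void wrap(Reader in, Writer out, int width) throws IOException {
        char[] buf = new char[width];
        while (true) {
            // Fill the buffer; a single read() call may return fewer chars than requested.
            int filled = 0;
            int n;
            while (filled < width && (n = in.read(buf, filled, width - filled)) != -1) {
                filled += n;
            }
            if (filled == 0) {
                break;                  // end of input
            }
            out.write(buf, 0, filled);
            if (filled < width) {
                break;                  // trailing partial chunk: no newline after it
            }
            out.write('\n');
        }
        out.flush();
    }
}
```

    For the question's file this would be called as ChunkWriter.wrap(reader, writer, 309) with a FileReader and a FileWriter (or their buffered wrappers).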

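    Update: I've added a test of character-by-character reading and buffered reading. Buffered reading actually adds a lot of complexity, because you have to handle the possible (but usually very rare) case where read() returns fewer characters than you asked for even though there are still characters left to read.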
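    To show why that case needs an extra loop, here is a small sketch built around a hypothetical TrickleReader that returns at most one character per read() call, which the Reader contract allows. A single read(buf, 0, 309) on it returns 1, while the hypothetical fill helper keeps reading until the buffer is full or the input ends.

```java
import java.io.IOException;
import java.io.Reader;

final class ShortReads {
    /** Hypothetical Reader that hands out at most one char per read() call. */
    static final class TrickleReader extends Reader {
        private final Reader delegate;

        TrickleReader(Reader delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            return delegate.read(cbuf, off, Math.min(len, 1));
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    /** Hypothetical helper: reads until buf is full or EOF; returns chars read (0 at EOF). */
    static int fill(Reader in, char[] buf) throws IOException {
        int filled = 0;
        while (filled < buf.length) {
            int n = in.read(buf, filled, buf.length - filled);
            if (n == -1) {
                break;                  // EOF: return whatever we have so far
            }
            filled += n;
        }
        return filled;
    }
}
```

    Only the last chunk can come back shorter than the buffer; every earlier chunk is complete no matter how the underlying reader splits its reads.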

    First, the simple version:

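    // IN_FILE, CHAR_FILE, COUNT (309), safeClose() and hash() belong to the rest of the test harness, which is not shown
    private static void charRead(boolean verifyHash) {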
      Reader in = null;
      Writer out = null;
      long start = System.nanoTime();
      long wrote = 0;
      MessageDigest md = null;
      try {
        if (verifyHash) {
          md = MessageDigest.getInstance("SHA1");
        }
        in = new BufferedReader(new FileReader(IN_FILE));
        out = new BufferedWriter(new FileWriter(CHAR_FILE));
        int count = 0;
        for (int c = in.read(); c != -1; c = in.read()) {
          if (verifyHash) {
            md.update((byte) c);
          }
          out.write(c);
          wrote++;
          if (++count >= COUNT) {
            if (verifyHash) {
              md.update((byte) '\n');
            }
            out.write("\n");
            wrote++;
            count = 0;
          }
        }
      } catch (IOException e) {
        throw new RuntimeException(e);
      } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException(e);
      } finally {
        safeClose(in);
        safeClose(out);
        long end = System.nanoTime();
        System.out.printf("Created %s size %,d in %,.3f seconds. Hash: %s%n",
            CHAR_FILE, wrote, (end - start) / 1000000000.0d, hash(md, verifyHash));
      }
    }
    
    And the "block" version:

    This gives the following results (test run on an Intel Q9450, Windows 7 64-bit, 8 GB RAM, 7200 rpm 1.5 TB drive):

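    Conclusion: SHA1 hash verification is very expensive, which is why I ran versions with and without it. Basically, after warm-up, the "efficient" version is only about twice as fast. I suspect that by then the file is effectively cached in memory.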

    If I reverse the order of the block and character reads, the results are:

    Created E:\temp\char.dat size 200,647,249 in 8.071 seconds. Hash: 0x22ce9e17e17a67e5ea6f8fe929d2ce4780e8ffa4
    Created E:\temp\char.dat size 200,647,249 in 8.087 seconds. Hash: 0x22ce9e17e17a67e5ea6f8fe929d2ce4780e8ffa4
    Created E:\temp\char.dat size 200,647,249 in 4.128 seconds. Hash: (not calculated)
    Created E:\temp\char.dat size 200,647,249 in 3.918 seconds. Hash: (not calculated)
    Created E:\temp\char.dat size 200,647,249 in 18.020 seconds. Hash: 0x22ce9e17e17a67e5ea6f8fe929d2ce4780e8ffa4
    Created E:\temp\char.dat size 200,647,249 in 17.953 seconds. Hash: 0x22ce9e17e17a67e5ea6f8fe929d2ce4780e8ffa4
    Created E:\temp\char.dat size 200,647,249 in 7.879 seconds. Hash: (not calculated)
    Created E:\temp\char.dat size 200,647,249 in 8.016 seconds. Hash: (not calculated)
    
    Interestingly, the character-by-character version takes a much bigger initial hit the first time the file is read.


    So, as usual, it's a trade-off between efficiency and simplicity.

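    Open the file, read one character at a time, and write that character wherever it needs to go. Keep a counter; each time the counter reaches the line length, write out a newline and reset the counter to zero.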
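    A sketch of that counter approach at the file level, assuming the input is UTF-8 text; CounterWrap and wrapFile are hypothetical names. Because it counts decoded characters rather than raw bytes, the inserted newline never lands inside a multi-byte UTF-8 sequence (a supplementary character, which Java stores as two chars, could still be split).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name; the UTF-8 charset is an assumption about the input file.
final class CounterWrap {
    /** Copies in to out, inserting '\n' after every `width` chars. */
    static void wrapFile(Path in, Path out, int width) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            int count = 0;
            for (int c = reader.read(); c != -1; c = reader.read()) {
                writer.write(c);
                if (++count == width) {   // counter reached the line length:
                    writer.write('\n');   // emit a newline and reset the counter
                    count = 0;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java CounterWrap input.txt output.txt
        wrapFile(Paths.get(args[0]), Paths.get(args[1]), 309);
    }
}
```

    The BufferedReader here reads the file in small blocks behind the scenes, so memory use stays constant regardless of file size even though the loop handles one character at a time.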

    Not sure how good this solution is, but you can always read character by character:

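  • Read 309 characters and write them to the file. I'm not sure whether this can be done in one go or has to be done one character at a time.
  • After writing the 309th character, write a newline to the file.
  • Repeat.
  • For example (using the site):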


    Wrap the FileReader in a BufferedReader, then keep looping, reading 309 characters each time.

    Something like this (untested):


    Don't use BufferedReader, because it keeps most of the underlying file in memory. Use FileReader directly, then use its read() method to get the data you need:

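    try (FileReader reader = new FileReader(fileName))
    {
        char[] buffer = new char[309];
        int filled = 0;
        int charsRead;
        // read() may return fewer chars than requested before EOF,
        // so keep filling until the buffer is full
        while ((charsRead = reader.read(buffer, filled, buffer.length - filled)) != -1)
        {
            filled += charsRead;
            if (filled == buffer.length)
            {
                System.out.println(new String(buffer));
                filled = 0;
            }
        }
        if (filled > 0)
        {
            // print any trailing chars
            System.out.println(new String(buffer, 0, filled));
        }
    }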
    

    Read into a byte array of length 309, then write out the bytes that were read:

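    import java.io.*;

    public class Test {
        public static void main(String[] args) throws IOException {
            byte[] chars = new byte[309];
            try (InputStream in = new FileInputStream(args[0])) {
                int filled = 0;
                int read;
                // read() may return fewer bytes than requested before EOF,
                // so fill the whole 309-byte chunk before printing a newline
                while ((read = in.read(chars, filled, chars.length - filled)) != -1) {
                    filled += read;
                    if (filled == chars.length) {
                        System.out.write(chars, 0, filled);
                        System.out.println();
                        filled = 0;
                    }
                }
                if (filled > 0) {
                    System.out.write(chars, 0, filled);   // trailing partial chunk
                    System.out.println();
                }
            }
            System.out.flush();
        }
    }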
    

    You could change your program to:

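     try (BufferedReader r = new BufferedReader(new FileReader(fileName))) {
         char[] data = new char[309];
         int n;
         while ((n = r.read(data, 0, data.length)) > 0) {
             // BufferedReader tries to fill the request, but a short read is
             // still allowed, so top up the chunk before printing it
             int m;
             while (n < data.length && (m = r.read(data, n, data.length - n)) > 0) {
                 n += m;
             }
             // print only the chars actually read; println already adds the newline
             System.out.println(new String(data, 0, n));
         }
     }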
    

    This is off the top of my head and untested.

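    You can set the buffer size of the BufferedReader to avoid reading the whole file at once.

    -1: you can't guarantee that reader.read() will fill the buffer.

    BufferedReader does not keep the whole file in memory. The problem is that if the file is a single line, readLine() will by definition read the entire file into one String.

    Just a quick note on the regex part (which isn't the best way to solve this problem anyway): group 1 is unnecessary here. You can refer to group 0 instead, e.g. replaceAll(".{309}", "$0\n").

    There must be a standard Unix utility that does this, surely? Something like columnif 309 text > out? Anyway, I think Java is too verbose for this kind of thing.

    @poly: I actually took the regex from the sed command I've been using: sed 's/\(.\{309\}\)/\1\n/g' file.txt > file_parsed.txt. We've started using the Talend ETL tool, so I'd like to be able to do this in Java. Also, thanks for the regex tip!

    Working with bytes can corrupt data in multi-byte encodings such as UTF-8 or UTF-16. This wasn't specified in the original question, but it still applies: if byte 309 is the first byte of a multi-byte character, bye-bye.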

    Results for the original order (character-by-character runs first, then block runs):

    Created E:\temp\char.dat size 200,647,249 in 29.690 seconds. Hash: 0x22ce9e17e17a67e5ea6f8fe929d2ce4780e8ffa4
    Created E:\temp\char.dat size 200,647,249 in 18.177 seconds. Hash: 0x22ce9e17e17a67e5ea6f8fe929d2ce4780e8ffa4
    Created E:\temp\char.dat size 200,647,249 in 7.911 seconds. Hash: (not calculated)
    Created E:\temp\char.dat size 200,647,249 in 7.867 seconds. Hash: (not calculated)
    Created E:\temp\char.dat size 200,647,249 in 8.018 seconds. Hash: 0x22ce9e17e17a67e5ea6f8fe929d2ce4780e8ffa4
    Created E:\temp\char.dat size 200,647,249 in 7.949 seconds. Hash: 0x22ce9e17e17a67e5ea6f8fe929d2ce4780e8ffa4
    Created E:\temp\char.dat size 200,647,249 in 3.958 seconds. Hash: (not calculated)
    Created E:\temp\char.dat size 200,647,249 in 3.909 seconds. Hash: (not calculated)
    
    
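    // needs: import java.io.*;  (outFile is a hypothetical output path)
    try (InputStream in = new BufferedInputStream(new FileInputStream(file));
         OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile))) {
        int current;
        int counter = 0;
        // available() is not a reliable end-of-file test; read() returns -1 at EOF
        while ((current = in.read()) != -1) {
            out.write(current);          // output current to file
            counter++;
            if (counter % 309 == 0) {    // comparison (==), not assignment (=)
                out.write('\n');         // output newline character
            }
        }
    }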
    
    BufferedReader r = new BufferedReader(new FileReader("yourfile.txt"), 1024);
    boolean done = false;
    char[] buffer = new char[309];
    while(!done)
    {
       int read = r.read(buffer,0,309);
       if(read > 0)
       {
         // write buffer[0..read) to the destination, appending a newline
         // (read can be less than 309 even before the end of the file)
       }
       else
       {
            done = true;
       }
    }
    r.close();
    