Java: cache a file in memory and read it in parallel

java performance file-io


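I have a program (a simple log parser) that is very slow because, in some cases, it has to scan the input file completely. So I would like to pre-cache the whole file (~100 MB) in memory and read it with multiple threads.

In the current setup I use a BufferedReader for the "main read" and a RandomAccessFile to jump to a specific offset and read what is needed.

This is what I tried: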

..
Reader reader = null;
if (cache) {
    // caching file in memory
    br = new BufferedReader(new FileReader(file));
    buffer = new StringBuilder();
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        buffer.append(line).append(CR);
    }
    br.close();
    reader = new StringReader(buffer.toString());
} else {
    reader = new FileReader(file);
}
br = new BufferedReader(reader);
for (String line = br.readLine(); line != null; line = br.readLine()) {
    offset += line.length() + 1; // the +1 is for the line separator
    matcher = Constants.PT_BEGIN_COMPOSITION.matcher(line);
    if (matcher.matches()) {
        linecount++;
        record = new Record();
        record.setCompositionCode(matcher.group(1));
        matcher = Constants.PT_PREFIX.matcher(line);
        if (matcher.matches()) {
            record.setBeginComposition(Constants.SDF_DATE.parse(matcher.group(1)));
            record.setProcessId(matcher.group(2));
            if (cache) {
                executor.submit(new PubblicationParser(buffer, offset, record));
            } else {
                executor.submit(new PubblicationParser(file, offset, record));
            }
            records.add(record);
        } else {
            br.close();
            throw new ParseException(line, 0);
        }
    }
}
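A note on the offset bookkeeping in the loop above: it counts characters plus one per line, while RandomAccessFile.seek expects a byte offset. The two agree only when every character is a single byte and the separator is a single character. The sketch below illustrates the drift; the class name OffsetDrift, the sample lines, and the choice of UTF-8 are assumptions made for the example, not part of the original program.

```java
import java.nio.charset.StandardCharsets;

public class OffsetDrift {
    // Offset as computed in the loop above: characters plus one per line.
    static long charBasedOffset(String[] lines) {
        long offset = 0;
        for (String line : lines) {
            offset += line.length() + 1;
        }
        return offset;
    }

    // Byte offset that a seek on the file would need, for a given
    // separator, assuming the file is encoded as UTF-8.
    static long byteOffset(String[] lines, String separator) {
        long offset = 0;
        for (String line : lines) {
            offset += (line + separator).getBytes(StandardCharsets.UTF_8).length;
        }
        return offset;
    }

    public static void main(String[] args) {
        // The second line contains one non-ASCII character (two bytes in UTF-8).
        String[] lines = {"begin composition", "\u00e8 un test"};
        System.out.println(charBasedOffset(lines));      // prints 28
        System.out.println(byteOffset(lines, "\n"));     // prints 29
        System.out.println(byteOffset(lines, "\r\n"));   // prints 31
    }
}
```

For pure ASCII content with "\n" separators the two numbers match, which may be why the approach works on some files. The same caveat applies to the cached path if CR is longer than one character on the platform.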

PubblicationParser has an init() method that selects which custom reader to use, RandomAccessFileReader or StringBuilderReader:

if (file != null) {
    this.logReader = new RandomAccessFileReader(file, offset);
} else if (sb != null) {
    this.logReader = new StringBuilderReader(sb, (int) offset);
}

These are my two custom readers:

//
public class StringBuilderReader implements LogReader {
    public static final String CR = System.getProperty("line.separator");
    private final StringBuilder sb;
    private int offset;

    public StringBuilderReader(StringBuilder sb, int offset) {
        super();
        this.sb = sb;
        this.offset = offset;
    }

    @Override
    public String readLine() throws IOException {
        if (offset >= sb.length()) {
            return null;
        }
        int indexOf = sb.indexOf(CR, offset);
        if (indexOf < 0) {
            indexOf = sb.length();
        }
        String substring = sb.substring(offset, indexOf);
        offset = indexOf + CR.length();
        return substring;
    }

    @Override
    public void close() throws IOException {
        // TODO Auto-generated method stub
    }
}
//
public class RandomAccessFileReader implements LogReader {
    private static final String FILEMODE_R = "r";
    private final RandomAccessFile raf;

    public RandomAccessFileReader(File file, long offset) throws IOException {
        this.raf = new RandomAccessFile(file, FILEMODE_R);
        this.raf.seek(offset);
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    @Override
    public String readLine() throws IOException {
        return raf.readLine();
    }
}


The problem is that the "cached" approach is too slow, and I want to understand why.

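You should first make sure that it really is I/O that makes the application slow, and not something else (for example, inefficient logic in the parser). A Java profiler such as JProfiler can tell you that.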
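Before reaching for a full profiler, a rough first check is to time a pass that only reads the file and does no parsing at all. The sketch below is one way to do that; the class name ReadTiming and the ISO-8859-1 charset are assumptions for the example. A single wall-clock measurement like this is only indicative, since the OS file cache and JIT warm-up both affect it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTiming {
    // Reads every line and discards it; returns the number of lines read.
    static long countLines(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // path to the log file
        long start = System.nanoTime();
        long lines = countLines(file);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines read in " + elapsedMs + " ms");
    }
}
```

If this pure-read pass takes only a small fraction of the total run time, the bottleneck is more likely in the parsing or in how the worker threads access the data.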

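If it really is I/O, you are better off using an existing, ready-made solution for loading the file into memory, which is essentially what you are trying to implement yourself.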

Take a look.
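As a minimal sketch of the general idea (not necessarily the solution this answer had in mind), the file can be read once with Files.readAllBytes and shared read-only between threads, with each task getting its own cursor at a byte offset. The names InMemoryLog and Cursor are hypothetical. Decoding as ISO-8859-1 is an assumption chosen to mirror RandomAccessFile.readLine, which maps each byte to one character.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class InMemoryLog {
    private final byte[] data; // whole file; never modified after construction

    public InMemoryLog(Path file) throws IOException {
        this(Files.readAllBytes(file));
    }

    InMemoryLog(byte[] data) {
        this.data = data;
    }

    // Returns an independent cursor starting at the given byte offset.
    // Each worker thread should use its own cursor.
    public Cursor cursorAt(int offset) {
        return new Cursor(offset);
    }

    public final class Cursor {
        private int pos;

        private Cursor(int pos) {
            this.pos = pos;
        }

        // Returns the next line without its terminator ("\n" or "\r\n"),
        // or null once the end of the data is reached.
        public String readLine() {
            if (pos >= data.length) {
                return null;
            }
            int end = pos;
            while (end < data.length && data[end] != '\n') {
                end++;
            }
            int lineEnd = (end > pos && data[end - 1] == '\r') ? end - 1 : end;
            String line = new String(data, pos, lineEnd - pos, StandardCharsets.ISO_8859_1);
            pos = end + 1;
            return line;
        }
    }
}
```

Because the offsets here are byte offsets, the same value could be passed either to cursorAt or to RandomAccessFile.seek. An int offset limits this sketch to files under 2 GB, which is enough for the ~100 MB case described in the question.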

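Any ideas about what? What is your question? You need: (0) a clear problem statement, (1) the code you are trying, (2) what you expect it to do, (3) what it actually does. "Any ideas" is not an appropriate question for this site. This is not a discussion forum.

Sorry, my bad! I have posted what I tried to do, without success.

I am sure there is no unnecessary I/O, because in some cases (files with no exceptions, or only a few) the logical block that "PubblicationParser" has to read is small (30/50 lines), so RAF is the best solution (12k blocks in 30''). If I try to read the same block with a BufferedReader, the elapsed time grows as processing goes on, because (I think) it does unnecessary caching. The problem is when I have a file with a lot of exceptions (and stack traces): the logical block can be 2,500 lines or more, and in that case it takes a lot of time. I will try reading your link, thanks!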