Java: fastest way to index a large text file
Tags: java, android, indexing, full-text-indexing

I want to index a large text file of about 1 GB, so I store the newline positions in a second file in order to access the text file later. This is my code:
while (true) {
    raf.seek(currentPos);
    byte[] bytes = new byte[1000000];
    raf.read(bytes, 0, bytes.length);
    for (int i = 0; i < bytes.length; i++) {
        if (bytes[i] == 10) {
            rafw.writeInt(currentPos + i);
        }
    }
    currentPos = currentPos + sizeOfPacket;
    if (currentPos > raf.length()) {
        sizeOfPacket = (int) raf.length() - currentPos;
    } else if (currentPos == raf.length()) {
        break;
    }
    bytesCounter = bytesCounter + 1000000;
    //Log.d("DicData", "Percentage=" + currentPos + " " + raf.length());
    int progress = (int) (bytesCounter * 100.0 / folderSize + 0.5);
    iDicIndexingListener.onTotalIndexingProgress(progress < 100 ? progress : 100);
}
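Here I check every byte of the file for the value 10, which is the newline character \n. My big problem is that this process takes far too long, about 15 minutes. Is there a faster way to do this? Thanks.

You can use the Scanner class to read through the file once and index the newline positions: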
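File file = null;
// init file here
long newLineIndex = 0;
int lineSepLength = System.lineSeparator().length(); // 1 for \r or \n, 2 for \r\n, depending on the OS
Scanner sc = new Scanner(file);
while (sc.hasNextLine()) {
    // accumulate, so that newLineIndex is the running offset of the next line start;
    // length() counts chars, which matches bytes only for single-byte encodings
    newLineIndex += sc.nextLine().length() + lineSepLength;
    // persist newLineIndex
}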
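The program below writes and then reads a 1 GB file with 1 million lines. On my machine each of the two steps runs in less than 10 seconds. I suspect your performance bottleneck is somewhere else.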
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.List;

public class Test {
    public static void main(String[] args) throws Exception {
        File file = new File("test.txt");
        System.out.println("writing 1 GB file with 1 mio. lines...");
        try (FileOutputStream fos = new FileOutputStream(file)) {
            for (int i = 0; i < 1024 * 1024; i++) {
                // each line: 1023 zero bytes followed by '\n' (10), i.e. 1 KB per line
                fos.write(new byte[1023]);
                fos.write(10);
                if (i % 1024 == 0) {
                    System.out.println(i / 1024 + " MB...");
                }
            }
        }
        System.out.println("done.");
        System.out.println("reading line positions...");
        List<Long> lineStartPositions = new ArrayList<>();
        lineStartPositions.add(0L);
        long positionInFile = -1;
        byte[] buffer = new byte[1024 * 1024];
        try (FileInputStream fis = new FileInputStream(file)) {
            int read;
            while ((read = fis.read(buffer)) != -1) {
                System.out.println("processing MB: " + positionInFile / 1024 / 1024);
                // only the first `read` bytes of the buffer are valid
                for (int i = 0; i < read; i++) {
                    positionInFile++;
                    if (buffer[i] == 10) {
                        lineStartPositions.add(positionInFile + 1);
                    }
                }
            }
            // remove the last line index in case the last byte of the file was a newline
            if (lineStartPositions.get(lineStartPositions.size() - 1) >= file.length()) {
                lineStartPositions.remove(lineStartPositions.size() - 1);
            }
        }
        System.out.println("found: " + lineStartPositions.size());
        System.out.println("expected: " + 1024 * 1024);
    }
}
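
One possible explanation for the 15 minutes, offered as an assumption rather than a measured result: the question's loop calls writeInt on a RandomAccessFile for every newline, and RandomAccessFile does no buffering of its own, so each offset can turn into separate small writes to the operating system. The sketch below keeps the same byte scan but sends the index through a BufferedOutputStream, and shows how a line could later be fetched through the index. The names LineIndexer, buildIndex and readLine are hypothetical, the index format (one 8-byte long per line start, instead of the question's int) is my own choice so that files over 2 GB do not overflow, and UTF-8 text is assumed.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the question or the answers.
final class LineIndexer {

    /** Writes the byte offset of every line start to indexFile; returns the line count. */
    public static long buildIndex(File textFile, File indexFile) throws IOException {
        long lines = 0;
        long position = 0;          // byte offset of buffer[i] within the file
        boolean atLineStart = true; // true until the first byte of a line is seen
        byte[] buffer = new byte[1024 * 1024];
        try (InputStream in = new FileInputStream(textFile);
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(new FileOutputStream(indexFile), 1 << 16))) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) { // scan only the bytes actually read
                    if (atLineStart) {
                        out.writeLong(position); // buffered, so no OS call per offset
                        lines++;
                        atLineStart = false;
                    }
                    if (buffer[i] == '\n') {
                        atLineStart = true;
                    }
                    position++;
                }
            }
        }
        return lines;
    }

    /** Returns line lineNumber (0-based) without its line terminator, assuming UTF-8. */
    public static String readLine(File textFile, File indexFile, long lineNumber)
            throws IOException {
        try (RandomAccessFile index = new RandomAccessFile(indexFile, "r");
             RandomAccessFile text = new RandomAccessFile(textFile, "r")) {
            long entries = index.length() / Long.BYTES;
            index.seek(lineNumber * Long.BYTES);
            long start = index.readLong();
            // the next index entry marks the end; the last line ends at end of file
            long end = (lineNumber + 1 < entries) ? index.readLong() : text.length();
            byte[] line = new byte[(int) (end - start)];
            text.seek(start);
            text.readFully(line);
            int len = line.length;
            if (len > 0 && line[len - 1] == '\n') len--;
            if (len > 0 && line[len - 1] == '\r') len--;
            return new String(line, 0, len, StandardCharsets.UTF_8);
        }
    }
}
```

A call such as LineIndexer.buildIndex(new File("test.txt"), new File("test.idx")) would build the index once, and LineIndexer.readLine(..., 41) would then seek directly to the 42nd line. Because a line start is only recorded when a byte actually follows the newline, a trailing newline at the end of the file does not produce an extra empty entry.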
Maybe this helps? string.getBytes().length != string.length(), so a character count is not necessarily a byte offset.
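
To make the comment above concrete, here is a small sketch of the difference. It matters for any index built from nextLine().length(), because seeking in a file works on byte offsets. The class and method names (LengthDemo, charCount, byteCount) are hypothetical, and UTF-8 is assumed as the file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class illustrating chars versus encoded bytes.
final class LengthDemo {

    /** Number of UTF-16 code units, which is what String.length() reports. */
    public static int charCount(String s) {
        return s.length();
    }

    /** Number of bytes the string occupies when encoded as UTF-8. */
    public static int byteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String line = "na\u00efve caf\u00e9"; // "naïve café"
        // prints "10 chars, 12 bytes": ï and é take two bytes each in UTF-8
        System.out.println(charCount(line) + " chars, " + byteCount(line) + " bytes");
    }
}
```

For pure ASCII text the two counts agree, which may be why the mismatch goes unnoticed until the file contains accented or non-Latin characters.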