为什么java DirectoryStream的执行速度如此之慢?
我已经用流做了一些测试,特别是nio包的DirectoryStreams。我只是尝试获取一个目录中所有文件的列表,按上次修改的日期和大小排序 old File.listFiles()的JavaDoc对Files中的方法声明了: 注意,Files类定义了newDirectoryStream方法 打开一个目录,并在 目录在处理非常大的数据时,这可能会使用较少的资源 目录 我在下面多次运行代码(下面的前三次): 首次运行:为什么java DirectoryStream的执行速度如此之慢?,java,performance,stream,nio,Java,Performance,Stream,Nio,我已经用流做了一些测试,特别是nio包的DirectoryStreams。我只是尝试获取一个目录中所有文件的列表,按上次修改的日期和大小排序 old File.listFiles()的JavaDoc对Files中的方法声明了: 注意,Files类定义了newDirectoryStream方法 打开一个目录,并在 目录在处理非常大的数据时,这可能会使用较少的资源 目录 我在下面多次运行代码(下面的前三次): 首次运行: Run time of Arrays.sort: 1516 Run time
Run time of Arrays.sort: 1516
Run time of Stream.sorted as Array: 2912
Run time of Stream.sorted as List: 2875
第二轮:
Run time of Arrays.sort: 1557
Run time of Stream.sorted as Array: 2978
Run time of Stream.sorted as List: 2937
第三次运行:
Run time of Arrays.sort: 1563
Run time of Stream.sorted as Array: 2919
Run time of Stream.sorted as List: 2896
我的问题是:为什么流的性能如此糟糕
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class FileSorter {
// This sorts from old to young and from big to small
Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
int sorter = 0;
try {
FileTime lm1 = Files.getLastModifiedTime(o1);
FileTime lm2 = Files.getLastModifiedTime(o2);
if (lm2.compareTo(lm1) == 0) {
Long s1 = Files.size(o1);
Long s2 = Files.size(o2);
sorter = s2.compareTo(s1);
} else {
sorter = lm1.compareTo(lm2);
}
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
return sorter;
};
public String[] getSortedFileListAsArray(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).toArray(String[]::new);
}
public List<String> getSortedFileListAsList(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).collect(Collectors.
toList());
}
public String[] sortByDateAndSize(File[] fileList) {
Arrays.sort(fileList, (File o1, File o2) -> {
int r = Long.compare(o1.lastModified(), o2.lastModified());
if (r != 0) {
return r;
}
return Long.compare(o1.length(), o2.length());
});
String[] fileNames = new String[fileList.length];
for (int i = 0; i < fileNames.length; i++) {
fileNames[i] = fileList[i].getName();
}
return fileNames;
}
public static void main(String[] args) throws IOException {
// File (io package)
File f = new File("C:\\Windows\\system32");
// Path (nio package)
Path dir = Paths.get("C:\\Windows\\system32");
FileSorter fs = new FileSorter();
long before = System.currentTimeMillis();
String[] names = fs.sortByDateAndSize(f.listFiles());
long after = System.currentTimeMillis();
System.out.println("Run time of Arrays.sort: " + ((after - before)));
long before2 = System.currentTimeMillis();
String[] names2 = fs.getSortedFileListAsArray(dir);
long after2 = System.currentTimeMillis();
System.out.
println("Run time of Stream.sorted as Array: " + ((after2 - before2)));
long before3 = System.currentTimeMillis();
List<String> names3 = fs.getSortedFileListAsList(dir);
long after3 = System.currentTimeMillis();
System.out.
println("Run time of Stream.sorted as List: " + ((after3 - before3)));
}
}
更新2
在对Peter的解决方案进行了一些研究之后,我可以说,使用for ex.Files.getLastModified读取文件属性肯定是一项艰巨的任务。仅将比较器中的该部分更改为:
Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
File f1 = o1.toFile();
File f2 = o2.toFile();
long lm1 = f1.lastModified();
long lm2 = f2.lastModified();
int cmp = Long.compare(lm2, lm1);
if (cmp == 0) {
cmp = Long.compare(f2.length(), f1.length());
}
return cmp;
};
但是正如您所看到的,缓存对象是最好的方法。正如jtahlborn提到的,这是一种稳定的排序
更新3(我找到的最佳解决方案)
经过进一步的研究,我发现Files.lastModified和Files.size方法在同一件事情上都做了大量的工作:属性。因此,我制作了三个版本的PathInfo类进行测试:
After doing all hundred times
Mean performance of Peters solution: 432.26
Mean performance of old File solution: 343.11
Mean performance of read attribute object once solution: 255.66
最佳解决方案的PathInfo构造函数中的代码:
public PathInfo(Path path) {
try {
// read the whole attributes once
BasicFileAttributes bfa = Files.readAttributes(path, BasicFileAttributes.class);
fileName = path.getFileName().toString();
modified = bfa.lastModifiedTime().toMillis();
size = bfa.size();
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
}
我的结果:从不读取属性两次,在对象中缓存会提高性能。Files.list()是一个O(N)操作,而排序是O(N log N)。更可能的情况是,排序中的操作很重要。考虑到这些比较结果并不相同,这是最有可能的解释。在C:/Windows/System32下有许多文件具有相同的修改日期,这意味着需要经常检查文件大小
为了表明大部分时间并没有花在FIles.list(dir)流中,我优化了比较,以便每个文件只获取一次有关文件的数据
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class FileSorter {
// This sorts from old to young and from big to small
Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
int sorter = 0;
try {
FileTime lm1 = Files.getLastModifiedTime(o1);
FileTime lm2 = Files.getLastModifiedTime(o2);
if (lm2.compareTo(lm1) == 0) {
Long s1 = Files.size(o1);
Long s2 = Files.size(o2);
sorter = s2.compareTo(s1);
} else {
sorter = lm1.compareTo(lm2);
}
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
return sorter;
};
public String[] getSortedFileListAsArray(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).toArray(String[]::new);
}
public List<String> getSortedFileListAsList(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).collect(Collectors.
toList());
}
public String[] sortByDateAndSize(File[] fileList) {
Arrays.sort(fileList, (File o1, File o2) -> {
int r = Long.compare(o1.lastModified(), o2.lastModified());
if (r != 0) {
return r;
}
return Long.compare(o1.length(), o2.length());
});
String[] fileNames = new String[fileList.length];
for (int i = 0; i < fileNames.length; i++) {
fileNames[i] = fileList[i].getName();
}
return fileNames;
}
public List<String> getSortedFile(Path dir) throws IOException {
return Files.list(dir).map(PathInfo::new).sorted().map(p -> p.getFileName()).collect(Collectors.toList());
}
static class PathInfo implements Comparable<PathInfo> {
private final String fileName;
private final long modified;
private final long size;
public PathInfo(Path path) {
try {
fileName = path.getFileName().toString();
modified = Files.getLastModifiedTime(path).toMillis();
size = Files.size(path);
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
}
@Override
public int compareTo(PathInfo o) {
int cmp = Long.compare(modified, o.modified);
if (cmp == 0)
cmp = Long.compare(size, o.size);
return cmp;
}
public String getFileName() {
return fileName;
}
}
public static void main(String[] args) throws IOException {
// File (io package)
File f = new File("C:\\Windows\\system32");
// Path (nio package)
Path dir = Paths.get("C:\\Windows\\system32");
FileSorter fs = new FileSorter();
long before = System.currentTimeMillis();
String[] names = fs.sortByDateAndSize(f.listFiles());
long after = System.currentTimeMillis();
System.out.println("Run time of Arrays.sort: " + ((after - before)));
long before2 = System.currentTimeMillis();
String[] names2 = fs.getSortedFileListAsArray(dir);
long after2 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as Array: " + ((after2 - before2)));
long before3 = System.currentTimeMillis();
List<String> names3 = fs.getSortedFileListAsList(dir);
long after3 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as List: " + ((after3 - before3)));
long before4 = System.currentTimeMillis();
List<String> names4 = fs.getSortedFile(dir);
long after4 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as List with caching: " + ((after4 - before4)));
}
}
如您所见,大约85%的时间用于反复获取文件的修改日期和大小。在基于路径的比较器中运行lm2 compareTo lm1两次。不是世界末日,但可能影响时代。还有,为什么要用Long来表示文件大小(而不是Long)。@jtahlborn你说得对,这可能是问题的一部分。@jtahlborn我已经更新了我的答案/解决方案,所以这可能是解决我的问题的“fastet”解决方案,而无需并行流?!更不用说,如果文件在排序时被触摸,这将提供一个稳定的排序!很好的解释!谢谢你,以为这是我的Comparator@jtahlborn比如javadocexplained@NwDx流提供了一种很好的方法来生成要排序的缓存对象。这减少了重复的工作。@PeterLawrey我已经更新了我的答案/解决方案,所以这可能是解决我问题的“快速”解决方案,而不是并行流?!
public PathInfo(Path path) {
try {
// read the whole attributes once
BasicFileAttributes bfa = Files.readAttributes(path, BasicFileAttributes.class);
fileName = path.getFileName().toString();
modified = bfa.lastModifiedTime().toMillis();
size = bfa.size();
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
}
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class FileSorter {
// This sorts from old to young and from big to small
Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
int sorter = 0;
try {
FileTime lm1 = Files.getLastModifiedTime(o1);
FileTime lm2 = Files.getLastModifiedTime(o2);
if (lm2.compareTo(lm1) == 0) {
Long s1 = Files.size(o1);
Long s2 = Files.size(o2);
sorter = s2.compareTo(s1);
} else {
sorter = lm1.compareTo(lm2);
}
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
return sorter;
};
public String[] getSortedFileListAsArray(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).toArray(String[]::new);
}
public List<String> getSortedFileListAsList(Path dir) throws IOException {
Stream<Path> stream = Files.list(dir);
return stream.sorted(timeSizeComparator).
map(Path::getFileName).map(Path::toString).collect(Collectors.
toList());
}
public String[] sortByDateAndSize(File[] fileList) {
Arrays.sort(fileList, (File o1, File o2) -> {
int r = Long.compare(o1.lastModified(), o2.lastModified());
if (r != 0) {
return r;
}
return Long.compare(o1.length(), o2.length());
});
String[] fileNames = new String[fileList.length];
for (int i = 0; i < fileNames.length; i++) {
fileNames[i] = fileList[i].getName();
}
return fileNames;
}
public List<String> getSortedFile(Path dir) throws IOException {
return Files.list(dir).map(PathInfo::new).sorted().map(p -> p.getFileName()).collect(Collectors.toList());
}
static class PathInfo implements Comparable<PathInfo> {
private final String fileName;
private final long modified;
private final long size;
public PathInfo(Path path) {
try {
fileName = path.getFileName().toString();
modified = Files.getLastModifiedTime(path).toMillis();
size = Files.size(path);
} catch (IOException ex) {
throw new UncheckedIOException(ex);
}
}
@Override
public int compareTo(PathInfo o) {
int cmp = Long.compare(modified, o.modified);
if (cmp == 0)
cmp = Long.compare(size, o.size);
return cmp;
}
public String getFileName() {
return fileName;
}
}
public static void main(String[] args) throws IOException {
// File (io package)
File f = new File("C:\\Windows\\system32");
// Path (nio package)
Path dir = Paths.get("C:\\Windows\\system32");
FileSorter fs = new FileSorter();
long before = System.currentTimeMillis();
String[] names = fs.sortByDateAndSize(f.listFiles());
long after = System.currentTimeMillis();
System.out.println("Run time of Arrays.sort: " + ((after - before)));
long before2 = System.currentTimeMillis();
String[] names2 = fs.getSortedFileListAsArray(dir);
long after2 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as Array: " + ((after2 - before2)));
long before3 = System.currentTimeMillis();
List<String> names3 = fs.getSortedFileListAsList(dir);
long after3 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as List: " + ((after3 - before3)));
long before4 = System.currentTimeMillis();
List<String> names4 = fs.getSortedFile(dir);
long after4 = System.currentTimeMillis();
System.out.println("Run time of Stream.sorted as List with caching: " + ((after4 - before4)));
}
}
Run time of Arrays.sort: 1980
Run time of Stream.sorted as Array: 1295
Run time of Stream.sorted as List: 1228
Run time of Stream.sorted as List with caching: 185