C++: How can I optimize building a directory listing from a list of paths?

Tags: c++, string, performance, optimization, fuse

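While writing a FUSE file system, I use an unordered_map as a cache. It is initialized at startup with all files and directories, to reduce reads from the hard disk.

To serve the readdir() callback, I wrote the following loop: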

const int sp = path == "/" ? 0 : path.size();
for (auto it = stat_cache.cbegin(); it != stat_cache.cend(); it++)
{
    if (it->first.size() > sp)
    {
        int ls = it->first.find_last_of('/');
        if (it->first.find(path, 0) == 0 && ls == sp)
            filler(buf, it->first.substr(ls + 1).c_str(), const_cast<struct stat*>(&it->second), 0, FUSE_FILL_DIR_PLUS);
    }
}

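However, this is surprisingly slow, especially on file systems with more than half a million objects in the cache. Valgrind/Callgrind specifically blames the std::string::find_last_of() and std::string::find() calls.

To speed up the loop I already added the if (it->first.size() > sp) check, but the performance gain was minimal at best.

I also tried to speed up the routine by parallelizing the loop into four chunks, but that failed in unordered_map::cbegin(). I no longer have the actual code, but I believe it looked like this: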

const int sp = path == "/" ? 0 : path.size();
ThreadPool<4> tpool;
ulong cq = stat_cache.size()/4;
for (int i = 0; i < 4; i++)
{
    tpool.addTask([&] () {
        auto it = stat_cache.cbegin();
        std::next(it, i * cq);
        for (int j = 0; j < cq && it != stat_cache.cend(); j++, it++)
        {
            if (it->first.size() > sp)
            {
                int ls = it->first.find_last_of('/');
                if (it->first.find(path, 0) == 0 && ls == sp)
                    filler(buf, it->first.substr(ls + 1).c_str(), const_cast<struct stat*>(&it->second), 0, FUSE_FILL_DIR_PLUS);
            }
        }
    });
}
tpool.joinAll();


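I also tried splitting the work by map buckets, for which unordered_map offers a convenient overload, but that still failed.

Again, I am currently working with the first (non-parallel) code and would like help with that version, since the parallelized one does not work. I only included the parallel attempt for completeness, noob-bashing, and proof of effort.

Are there any other options for optimizing this loop?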

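One small thing to do here is to change the if from:

if (it->first.find(path, 0) == 0 && ls == sp)

to simply:

if (ls == sp && it->first.find(path, 0) == 0)

Comparing two integers is obviously much faster than searching for a substring. I can't guarantee that it will change the performance, but skipping many unnecessary std::string::find calls may help. The compiler might already do this; I would check the disassembly.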
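The reordering above can be taken one step further. The following is only a sketch, not code from the thread: is_direct_child is a hypothetical helper, and it assumes absolute paths without a trailing slash, as in the question. It runs the cheap checks first and replaces find(path, 0) == 0 with a bounded compare(), because find() keeps scanning the rest of the string when the prefix does not match.

```cpp
#include <string>

// Hypothetical helper (not from the original post): returns true when `entry`
// is an immediate child of directory `dir`.
// Assumes `dir` is "/" or an absolute path without a trailing slash.
bool is_direct_child(const std::string& entry, const std::string& dir)
{
    const std::size_t sp = (dir == "/") ? 0 : dir.size();
    if (entry.size() <= sp + 1) return false;  // need at least "/x" after the prefix
    if (entry[sp] != '/') return false;        // cheap one-character check first
    // Bounded prefix test: touches at most sp characters, whereas find()
    // would scan the whole string whenever the prefix does not match.
    if (entry.compare(0, sp, dir, 0, sp) != 0) return false;
    // Any further '/' means the entry lives in a deeper subdirectory.
    return entry.find('/', sp + 1) == std::string::npos;
}
```

Inside the loop, the condition would then read is_direct_child(it->first, path). This still visits every cache entry, so it lowers the constant factor only; the loop stays O(N).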

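Also, since file paths are unique anyway, I would use a std::vector instead: better cache locality, fewer memory allocations, and so on. Remember to reserve the size first.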
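A vector can still serve lookups by key if it is sorted once and then searched with a binary search. The sketch below shows one way this might look; Meta, Entry, build_cache and lookup are illustrative names, and Meta is a stand-in for struct stat so that the example compiles without FUSE headers.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for struct stat so the sketch compiles anywhere.
struct Meta { long size; };

using Entry = std::pair<std::string, Meta>;

// Builds the cache once: reserve, append, then sort by path.
std::vector<Entry> build_cache(const std::vector<std::string>& paths)
{
    std::vector<Entry> cache;
    cache.reserve(paths.size());  // one allocation for the element array
    for (const auto& p : paths) cache.push_back({p, Meta{0}});
    std::sort(cache.begin(), cache.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
    return cache;
}

// Binary search by path in O(lg N); returns nullptr when the path is absent.
const Meta* lookup(const std::vector<Entry>& cache, const std::string& path)
{
    auto it = std::lower_bound(cache.begin(), cache.end(), path,
        [](const Entry& e, const std::string& key) { return e.first < key; });
    return (it != cache.end() && it->first == path) ? &it->second : nullptr;
}
```

A getattr() handler could call lookup() for a single path. Whether this beats unordered_map in practice depends on the workload and would need to be benchmarked.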

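The real problem is that

for (auto it = stat_cache.cbegin(); it != stat_cache.cend(); it++)

effectively throws away one of unordered_map's biggest strengths and exposes one of its weaknesses. Not only do you lose its O(1) lookup, you may also have to search through the whole map to find an entry, which makes the routine O(N) with a very large constant K (if not an extra factor of N, i.e. O(N^2)).

The fastest solution would be O(1) for the lookup (in a lucky unordered_map), O(strlen(target)) for a bucket scheme, or O(lg N) for a binary search. Then follow a list of children kept with the struct stat to list the children in O(#children).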
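One way to get the O(1) lookup plus O(#children) listing described above is a second index, built once at startup, that maps each directory to the names of its immediate children. This is a sketch under assumptions: DirIndex and build_dir_index are hypothetical names, and the paths are assumed to be absolute without trailing slashes.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index: directory path -> names of its immediate children.
using DirIndex = std::unordered_map<std::string, std::vector<std::string>>;

// Splits each path at its last '/' and files the name under its parent.
// Assumes absolute paths without a trailing slash, as in the question.
DirIndex build_dir_index(const std::vector<std::string>& paths)
{
    DirIndex index;
    for (const auto& p : paths) {
        const std::size_t ls = p.find_last_of('/');
        // Skip the root itself and anything without a name after the slash.
        if (ls == std::string::npos || ls + 1 >= p.size()) continue;
        const std::string parent = (ls == 0) ? "/" : p.substr(0, ls);
        index[parent].push_back(p.substr(ls + 1));
    }
    return index;
}
```

readdir() would then look up the directory once and call filler for each stored name, fetching the struct stat from the existing cache. The string scanning happens once per path at startup instead of once per path on every readdir() call.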

Another simple modification is it->first.find_last_of('/', sp).

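I'm not sure std::vector is a good choice, because the cache gets a lot of random accesses by string key, for example from getattr() calls. With a vector I would have to scan the whole thing until I find the entry, so it would no longer be the O(1) of unordered_map. Right?

@Cobra_Fast You can add all the paths, sort them, and then use a binary search to find a specific key. As always, there is no guarantee of a performance gain; until you benchmark both suggestions, this is all theory.

That is probably because it searches [0; pos] rather than [pos; npos], if I interpret the manual correctly.

@Cobra_Fast, oops, I confused it with find_first_of. Anyway, you can still skip ahead by sp before searching for the last '/' (for example with something like boost::string_ref).

If the code is working and you are only asking to optimize it for speed or efficiency, it belongs on a code review site instead.

Don't use unordered_map. Use map instead, and call upper_bound with the directory name to find the first entry in O(lg N); all the other entries are then adjacent. Final cost of reading a directory: O(k + lg N).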
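The ordered-map approach might be sketched as follows. The names are illustrative (list_children is hypothetical, and Meta stands in for struct stat), and the sketch uses lower_bound on the directory name plus a trailing slash. One caveat not spelled out above: the adjacent range holds every descendant, not only the immediate children, so the sketch jumps over nested subtrees with a further lower_bound call instead of walking through them.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for struct stat.
struct Meta { long size; };

// Lists the immediate children of `dir` from a path-sorted map.
// Assumes `dir` is "/" or an absolute path without a trailing slash.
std::vector<std::string> list_children(const std::map<std::string, Meta>& cache,
                                       const std::string& dir)
{
    const std::string prefix = (dir == "/") ? "/" : dir + "/";
    std::vector<std::string> names;
    auto it = cache.lower_bound(prefix);  // first key >= "dir/"
    while (it != cache.end() &&
           it->first.compare(0, prefix.size(), prefix) == 0) {
        const std::size_t slash = it->first.find('/', prefix.size());
        if (slash == std::string::npos) {
            // Immediate child; ignore a key equal to the prefix itself ("/").
            if (it->first.size() > prefix.size())
                names.push_back(it->first.substr(prefix.size()));
            ++it;
        } else {
            // Nested entry: jump past the whole subtree. '0' follows '/' in
            // ASCII, so "dir/sub0" sorts after every "dir/sub/..." key.
            std::string past = it->first.substr(0, slash);
            past.push_back(static_cast<char>('/' + 1));
            it = cache.lower_bound(past);
        }
    }
    return names;
}
```

With the jumps, each child costs at most one extra O(lg N) search, so a listing is roughly O(k lg N) in the worst case. A plain scan of the adjacent range without the jumps is simpler, but its cost grows with the size of the whole subtree.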
