Searching files for strings from an array with JavaScript fs while avoiding poor performance

javascript node.js big-o

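I am building a tool that cleans up a JSON file of localization strings by removing the strings that are no longer used in the source code.

First, I parse the localization file into an array containing all the ids that are (or are no longer) used in the source code to look up the string value in the right language.

So I have an array that looks something like this: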

const ids = ['home.title', 'home.description', 'menu.contact', 'menu.social'];

And so on, you get the idea.

I use Node.js fs with a promisified readFile, plus glob, to search the .js source files like this:

const jsFiles = await globbing('./**/*.js', {cwd: directory, ignore: './**/*test.js'});
const results = jsFiles.map(async file => {
  const filePath = path.join(directory, file);
  return readFile(filePath, 'utf8').then((data) => {
      // handle match here
  }).catch(console.log);
});

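I also have Ramda available for fancy list/collection functions, but no other libraries.

So I can loop through the ids array and, for each item, scan the entire source code for a match with the function above. But scanning the entire source code ids.length times seems excessive. The ids array holds about 400 ids, and the source code consists of hundreds of large files.

To avoid O(M*N), is there a way to match the whole array against the whole source code and discard the array items that do not match? Or what would be the best practice here?

Current solution: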

const cleanLocal = async () => {
  const localIdList = Object.keys(await getLocalMap());
  const matches = [];
  localIdList.map(async id => {
    const directory = path.join(__dirname, '..');
    const jsFiles = await globbing('./**/*.js', {cwd: directory, ignore: './**/*test.js'});
    jsFiles.map(async file => {
      const filePath = path.join(directory, file);
      return readFile(filePath, 'utf8').then((data) => {
        if (data.indexOf(id) >= 0) {
          console.log(id);
          matches.push(id);
        }
      }).catch(console.log);
    });
  });
};

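In this case, you cannot avoid O(M*N) complexity.

However, to improve performance, you can switch the order of the operations: loop over the files first, then over the array. Looping over the files is an expensive IO operation, while looping over the array is a fast in-memory operation.

In your code (taking M as the number of ids and N as the number of files), you have M in-memory operations and M*N IO (file system) operations.

If you loop over the files first, you will have N IO operations and M*N in-memory operations.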
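To make the difference in IO counts concrete, here is a minimal sketch of the files-first order. The function name `findUsedIds`, the injected `readText` reader, and the fake file contents are all assumptions made for illustration; in a real tool `readText` could be something like `(p) => fs.promises.readFile(p, 'utf8')`.

```javascript
// Sketch: read each file exactly once (N reads), then test every id
// against the in-memory contents (M*N substring checks).
// `readText` is a hypothetical injected async reader, not part of the original code.
async function findUsedIds(filePaths, ids, readText) {
  const used = new Set();
  for (const filePath of filePaths) {
    const data = await readText(filePath); // one IO operation per file
    for (const id of ids) {
      if (data.includes(id)) used.add(id); // cheap in-memory check
    }
  }
  // Keep the original id order in the result.
  return ids.filter((id) => used.has(id));
}

// Usage with an in-memory stand-in for the file system (made-up contents).
const fakeFiles = {
  'a.js': "t('home.title')",
  'b.js': "t('menu.contact'); t('home.title')",
};
let reads = 0;
const fakeRead = async (filePath) => {
  reads += 1; // count IO operations
  return fakeFiles[filePath];
};

const sampleIds = ['home.title', 'home.description', 'menu.contact', 'menu.social'];
const usedPromise = findUsedIds(Object.keys(fakeFiles), sampleIds, fakeRead);
```

Because the files are awaited one at a time, only one file's contents needs to be held in memory at once, at the cost of not reading files in parallel.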

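Since O(M*N) cannot be avoided in this case, the only optimization I could make was to loop over the source files once and then loop over the ids for each file, as @mihai suggested.

The final result looks like this: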

const cleanLocal = async () => {
  const localIdList = Object.keys(await getLocalMap()); // ids' array
  const matches = [];
  const directory = path.join(__dirname, '..');
  const jsFiles = await globbing('./**/*.js', {cwd: directory, ignore: './**/*test.js'}); // list of files to scan
  const results = jsFiles.map(async file => {
    const filePath = path.join(directory, file);
    return readFile(filePath, 'utf8').then((data) => {
      localIdList.map(id => {
        if (R.contains(id, data)) { // R = ramda.js
          matches.push(id);
        }
      });
    }).catch(console.log);
  });
  await Promise.all(results);
  console.log('matches: ' + R.uniq(matches).length);
  console.log('in local.json: ' + localIdList.length);
};

Please let me know if there are other ways to optimize this function.
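One alternative raised in the comments on the question is to extract candidate keys from each file and intersect them with the ids from the translation file. The sketch below is only an illustration under a strong assumption: that ids appear in the source as quoted, dot-separated string literals such as `'home.title'`. The regex and the names `extractKeys` and `partitionIds` are hypothetical, and keys that are built dynamically would be missed.

```javascript
// Assumption: ids show up in source as quoted, dot-separated literals.
const QUOTED_DOTTED_KEY = /['"`]([\w-]+(?:\.[\w-]+)+)['"`]/g;

// Collect every quoted dotted literal found in one source string.
function extractKeys(source) {
  const keys = new Set();
  for (const match of source.matchAll(QUOTED_DOTTED_KEY)) {
    keys.add(match[1]);
  }
  return keys;
}

// Split ids into those seen in any source and those never seen.
function partitionIds(ids, sources) {
  const seen = new Set();
  for (const source of sources) {
    for (const key of extractKeys(source)) seen.add(key);
  }
  return {
    matched: ids.filter((id) => seen.has(id)),
    unmatched: ids.filter((id) => !seen.has(id)),
  };
}

// Made-up file contents; the second one contains `home.title` as a
// property access, which is not treated as a key here.
const sampleSources = [
  "const title = t('home.title');",
  'home = {}; home.title = "blah"; t("menu.social");',
];
const partitioned = partitionIds(
  ['home.title', 'home.description', 'menu.contact', 'menu.social'],
  sampleSources
);
```

With this approach each file is scanned once by the regex and each id costs a single set lookup, so the work no longer grows with the product of ids and files. The trade-off is that the result is only as good as the assumed key pattern.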

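I'm not sure, but I think questions about best practices are not a good fit for StackOverflow; there is even a close reason for them (primarily opinion-based). What do you want to do with the matched values? The approach changes accordingly.

I want to end up with two arrays, one with the matched ids and one with the unmatched ids.

How do you parse the files to match the translation keys? Are you just doing a direct search? That could produce false positives, such as

home = {}; home.title = "blah"

which is not a key but matches one. If that is acceptable, you can try to optimize the matching by hand. Or are you looking only for real usages of translation keys? In that case you could try to extract all keys from all files and intersect them with the keys in the translation file. Also, this seems to be a tool you won't be running all the time. It is either a one-off or a periodic scan. If I am right about that, I think poor performance is fine; optimizing a one-off task is not worth it.

OK, that seems logical. But (please correct me if this is crazy) what if I scan the source code only once and store the entire source in a single string? Then the IO operations would be reduced to one, but the in-memory operations would be heavier, because the matching would run against a very large string, possibly even too large for memory. Would this be beneficial, or even possible?

@RasmusPuls Well, you load every file from disk either way. You can also load one -> process -> load another -> process, and so on, so that you don't take up a lot of memory at once. Scanning the entire source code makes no difference either: you still have to go through all the files once and then do M*N in-memory operations.

OK, thanks for sharing your knowledge. I have posted a suggested answer; do you agree with the code I wrote based on your answer?

Some reasonable cleanup is possible: make the outer map a reduce and the inner one a filter. The outer loop can simply concatenate the matches. These changes probably won't alter performance significantly, but they can make the code easier to read.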
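The reduce/filter cleanup from the last comment could look roughly like the sketch below. It assumes the file contents have already been read into an array of strings; the name `collectMatches` and the sample data are made up for illustration, and plain JavaScript is used where the answer uses Ramda (`R.uniq` would serve equally well for the de-duplication step).

```javascript
// Outer reduce concatenates the matches; inner filter picks the ids
// that occur in one file's contents.
function collectMatches(contents, ids) {
  return contents.reduce(
    (matches, data) => matches.concat(ids.filter((id) => data.includes(id))),
    []
  );
}

// Made-up file contents, already loaded into memory.
const loadedContents = [
  "t('home.title')",
  "t('home.title'); t('menu.contact')",
];
const localIds = ['home.title', 'home.description', 'menu.contact', 'menu.social'];

const allMatches = collectMatches(loadedContents, localIds);
const uniqueMatches = [...new Set(allMatches)]; // drop ids found in several files
const unmatchedIds = localIds.filter((id) => !uniqueMatches.includes(id));
```

This avoids pushing into a shared `matches` array from inside callbacks, and it yields both the matched and the unmatched ids.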