Algorithm: How do I find the two sentences that have the most words in common?

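Given a list of sentences, find the two sentences that have the most words in common. The shared words do not need to appear at the same position in each sentence (order does not matter).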

Thanks.

Update:

Is there a non-pairwise algorithm for this problem? The pairwise approach is very simple.

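My idea is to use an inverted index that records which sentences each word appears in. Building it requires iterating over every word of every sentence. Then create an n*n 2D array that counts how many times two sentences end up in the same bucket of the inverted index.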
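
A minimal Python sketch of this inverted-index idea, under a few assumptions that are not in the post: words are split on whitespace, compared case-insensitively, and counted once per sentence. The function name `best_pair_inverted_index` is made up for illustration, and a dictionary keyed by `(i, j)` stands in for the n*n array.

```python
from collections import defaultdict
from itertools import combinations


def best_pair_inverted_index(sentences):
    """Return (i, j, shared) for the pair sharing the most distinct words, or None."""
    # Inverted index: lowercase word -> set of indices of sentences containing it.
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            index[word].add(sentence_id)

    # pair_counts[(i, j)] = number of buckets that contain both i and j (i < j).
    pair_counts = defaultdict(int)
    for sentence_ids in index.values():
        for i, j in combinations(sorted(sentence_ids), 2):
            pair_counts[(i, j)] += 1

    if not pair_counts:
        return None  # no two sentences share any word
    (i, j), shared = max(pair_counts.items(), key=lambda item: item[1])
    return i, j, shared
```

Note that this only avoids pairwise work for sentences that share nothing. If one word occurs in every sentence, its bucket alone still produces every pair, so the worst case stays quadratic in the number of sentences.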

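First, you need a way to take two of the sentences and determine how many words they have in common. This can be done by taking the two sentences as input and building two arrays from them, each holding that sentence's words in alphabetical order. You can then walk both arrays together, advancing the one whose current word comes earlier alphabetically (so if the current words are "abacus" and "book", you move past "abacus" to the next word). If there is a match ("book" and "book"), increment the count of matching words and advance both arrays to the next word. Keep going until you reach the end of either array (since the remaining words in the other array cannot have any matches).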
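
The sorted-array walk described above could look roughly like this in Python. Treat it as a sketch: the name `count_common_words` is invented here, and lowercasing the words and dropping duplicates within a sentence are assumptions the answer does not spell out.

```python
def count_common_words(sentence_a, sentence_b):
    """Count words shared by two sentences by walking two sorted word lists."""
    # Assumption: case-insensitive, whitespace-separated, duplicates removed.
    words_a = sorted(set(sentence_a.lower().split()))
    words_b = sorted(set(sentence_b.lower().split()))

    a = b = common = 0
    while a < len(words_a) and b < len(words_b):
        if words_a[a] == words_b[b]:
            # Match: count it and advance both lists.
            common += 1
            a += 1
            b += 1
        elif words_a[a] < words_b[b]:
            a += 1  # word in A sorts earlier, so it cannot match anything left in B
        else:
            b += 1  # word in B sorts earlier
    return common
```

For example, `count_common_words("abacus book cat", "Book cat dog")` skips "abacus", then matches "book" and "cat", giving 2.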

Once you have implemented this algorithm, you will need a loop like the following:

for (i = 0; i < sentenceCount - 1; i++) {
    for (j = i + 1; j < sentenceCount; j++) {
        // compare sentence i with sentence j here
    }
}

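Inside the loop, you call the function that counts the common words for the sentences at indices i and j. You keep track of the highest number of common words seen so far, along with the two sentences that produced it. If a new pair has more words in common, you store that count and the two sentences that produced it. At the end, you will have the two sentences you want.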
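
Putting the loop and the bookkeeping together might look like the sketch below. `most_similar_pair` and `shared_word_count` are hypothetical names, and the helper uses a plain set intersection as a stand-in so the example runs on its own; any function that takes two sentences and returns a count could be passed as `count` instead.

```python
from itertools import combinations


def shared_word_count(sentence_a, sentence_b):
    # Hypothetical helper: distinct words in common, ignoring case.
    return len(set(sentence_a.lower().split()) & set(sentence_b.lower().split()))


def most_similar_pair(sentences, count=shared_word_count):
    """Return (sentence_1, sentence_2, shared) for the best pair, or None."""
    best = None
    best_count = -1
    # combinations() yields every index pair with i < j exactly once,
    # which mirrors the nested for loops above.
    for i, j in combinations(range(len(sentences)), 2):
        current = count(sentences[i], sentences[j])
        if current > best_count:
            best_count = current
            best = (sentences[i], sentences[j], current)
    return best
```

With fewer than two sentences there is no pair to compare, so the function returns `None`. On ties, the first pair found is kept because the comparison is strict.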
Assume you have an array of sentences:
String[] sentences

Create some variables with default values to keep track of the two sentences that share the most words:
sentence1Index = -1
sentence2Index = -1
maxCount = -1

Run a nested loop over the sentence array:
for i : 0 -> sentences.length
    for j : 0 -> sentences.length

Make sure you are not comparing a sentence with itself:
  if i != j

Split the strings on spaces (this generally gives you each word, assuming it is acceptable for some symbols to be counted as words):
  String[] words1 = sentences[i].splitAt(" ")
  String[] words2 = sentences[j].splitAt(" ")

Create a temporary count for this comparison:
  tempCount = 0

Loop over the two word arrays (obtained from the two sentences being compared):
  for a : 0 -> words1.length
      for b : 0 -> words2.length

If the words are the same, increment tempCount:
          if words1[a] equal-to-ignore-case words2[b]
              tempCount++

After comparing the words, if tempCount is greater than the current maxCount, update all the values you are tracking:
  if tempCount > maxCount
      sentence1Index = i
      sentence2Index = j
      maxCount = tempCount

Return a newly created array containing the two sentences:

if sentence1Index != -1 and sentence2Index != -1
    String[] retArray = { sentences[sentence1Index], sentences[sentence2Index] }
    return retArray

return null

The complete pseudocode:

String[] sentences
sentence1Index = -1
sentence2Index = -1
maxCount = -1

for i : 0 -> sentences.length
    for j : 0 -> sentences.length
      if i != j
          String[] words1 = sentences[i].splitAt(" ")
          String[] words2 = sentences[j].splitAt(" ")
          tempCount = 0
          for a : 0 -> words1.length
              for b : 0 -> words2.length
                  if words1[a] equal-to-ignore-case words2[b]
                      tempCount++
          if tempCount > maxCount
              sentence1Index = i
              sentence2Index = j
              maxCount = tempCount

if sentence1Index != -1 and sentence2Index != -1
    String[] retArray = { sentences[sentence1Index], sentences[sentence2Index] }
    return retArray

return null

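You would probably get better answers if you show your own attempt.

@axiom I have added my idea. I did not mention it at first because I felt it was not efficient enough.

Your approach is brute force. What I am looking for is a more efficient method.

@city Before this post you did not mention what you had tried, or say what you had done.

Sorry about that. I was a bit lazy :)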