Algorithm: How do I find the two sentences that have the most words in common?

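Given a list of sentences, find the two sentences that have the most words in common. The shared words do not need to be at the same position in each sentence (order does not matter).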

Thanks.

Update:

Is there a non-pairwise algorithm for this problem? I ask because the pairwise approach is very straightforward.


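My idea is to use an inverted index that stores where each word appears. Building it requires traversing every word in every sentence. Then create an n*n 2D array that counts how many times two sentences appear in the same bucket of the inverted index.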
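The inverted-index idea above could be sketched roughly as follows. This is only an illustration under some assumptions: words are split on whitespace and compared case-insensitively, each word counts once per sentence, and a dictionary keyed by index pairs stands in for the n*n array. The function name `most_shared_pair_inverted` is made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def most_shared_pair_inverted(sentences):
    """Return (i, j, count) for the pair sharing the most distinct words, or None."""
    # Inverted index: lower-cased word -> set of indices of sentences containing it.
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            index[word].add(i)

    # pair_counts[(i, j)] = number of buckets in which sentences i and j both appear.
    pair_counts = defaultdict(int)
    for bucket in index.values():
        for i, j in combinations(sorted(bucket), 2):
            pair_counts[(i, j)] += 1

    if not pair_counts:
        return None  # fewer than two sentences, or no word shared by any pair
    (i, j), count = max(pair_counts.items(), key=lambda item: item[1])
    return i, j, count
```

Note that a word occurring in every sentence still produces a bucket with all n indices, so the worst case remains quadratic in the number of sentences. The gain is that pairs with nothing in common are never visited.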

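First, you need a method that takes two of the sentences and determines how many words they have in common. This can be done by taking the two given sentences as input and creating from them two arrays that contain their words in alphabetical order. You can then walk through both arrays, advancing whichever one currently holds the alphabetically earlier word (so if the current comparison is "abacus" against "book", you move the array holding "abacus" on to its next word). If there is a match ("book" and "book"), increment the count of matching words and advance both arrays to the next word. Continue until you reach the end of either array (because the remaining words in the other array cannot have any matches).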
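A minimal sketch of that sorted two-pointer comparison, assuming whitespace splitting and case-insensitive matching (the name `count_common_sorted` is hypothetical):

```python
def count_common_sorted(sentence_a, sentence_b):
    """Count matching words by walking two alphabetically sorted word arrays."""
    words_a = sorted(sentence_a.lower().split())
    words_b = sorted(sentence_b.lower().split())
    a = b = count = 0
    # Stop as soon as either array is exhausted: the rest cannot match anything.
    while a < len(words_a) and b < len(words_b):
        if words_a[a] == words_b[b]:
            count += 1      # match: count it and advance both arrays
            a += 1
            b += 1
        elif words_a[a] < words_b[b]:
            a += 1          # word in A is alphabetically earlier: advance A
        else:
            b += 1          # word in B is alphabetically earlier: advance B
    return count
```

With this scheme a word that is repeated in both sentences is counted once per matched occurrence. If each distinct word should count only once, the arrays would need to be de-duplicated first.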

Once you have implemented this algorithm, you will need a loop like the following:

for (i = 0; i < sentenceCount - 1; i++) {
    for (j = i+1; j < sentenceCount; j++) {
    }
}

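Inside the loop, you call the function that counts the common words for the sentences at indices i and j. You keep track of the highest number of common words seen so far, along with the two sentences that produced it. Whenever a new pair has more words in common, you store that count and the two sentences that produced it. At the end, you have the two sentences you are looking for.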
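Putting the loop and the bookkeeping together might look like the sketch below. The counting function here is a simple set-based stand-in chosen only to keep the example self-contained, and both function names are hypothetical.

```python
def count_common(sentence_a, sentence_b):
    # Stand-in counter (an assumption): distinct shared words, ignoring case.
    return len(set(sentence_a.lower().split()) & set(sentence_b.lower().split()))


def best_pair(sentences):
    """Return ((sentence_i, sentence_j), count) for the best pair seen."""
    best_count = -1
    best = None
    # j starts at i + 1 so that each unordered pair is compared exactly once.
    for i in range(len(sentences) - 1):
        for j in range(i + 1, len(sentences)):
            count = count_common(sentences[i], sentences[j])
            if count > best_count:
                best_count = count
                best = (sentences[i], sentences[j])
    return best, best_count
```

With fewer than two sentences the loop body never runs, so the function returns `(None, -1)`.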

Assuming you have an array of sentences:

String[] sentences
Create some variables with default values to keep track of the two sentences that have the most words in common

sentence1Index = -1
sentence2Index = -1
maxCount = -1
Perform a nested loop over the sentence array

for i : 0 -> sentences.length
    for j : 0 -> sentences.length
Make sure you are not comparing a sentence with itself

  if i != j
Split the strings on spaces (this counts some symbols as words, but will typically give you each word)

  String[] words1 = sentences[i].splitAt(" ")
  String[] words2 = sentences[j].splitAt(" ")
Create a temporary count for this run

  tempCount = 0
Loop over the two word arrays (obtained from the two sentences being compared)

  for a : 0 -> words1.length
      for b : 0 -> words2.length
If the words are the same, increment tempCount

          if words1[a] equal-to-ignore-case words2[b]
              tempCount++
After comparing the words, if tempCount is greater than the current maxCount, update all the values that track what you are looking for

  if tempCount > maxCount
      sentence1Index = i
      sentence2Index = j
      maxCount = tempCount
Return a newly created array containing the two sentences

if sentence1Index != -1 and sentence2Index != -1
    String[] retArray = { sentences[sentence1Index], sentences[sentence2Index] }
    return retArray

return null
All of the pseudocode:

String[] sentences
sentence1Index = -1
sentence2Index = -1
maxCount = -1

for i : 0 -> sentences.length
    for j : 0 -> sentences.length
      if i != j
          String[] words1 = sentences[i].splitAt(" ")
          String[] words2 = sentences[j].splitAt(" ")
          tempCount = 0
          for a : 0 -> words1.length
              for b : 0 -> words2.length
                  if words1[a] equal-to-ignore-case words2[b]
                      tempCount++
          if tempCount > maxCount
              sentence1Index = i
              sentence2Index = j
              maxCount = tempCount

if sentence1Index != -1 and sentence2Index != -1
    String[] retArray = { sentences[sentence1Index], sentences[sentence2Index] }
    return retArray

return null
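For reference, a fairly direct Python transcription of the pseudocode above could look like this. It is a sketch rather than a tuned implementation: `equal-to-ignore-case` is approximated with `str.lower()`, `splitAt(" ")` with `str.split(" ")`, and the function name `find_most_similar` is made up for the example.

```python
def find_most_similar(sentences):
    """Brute-force search: return the two sentences with the highest match count."""
    sentence1_index = -1
    sentence2_index = -1
    max_count = -1

    for i in range(len(sentences)):
        for j in range(len(sentences)):
            if i != j:  # never compare a sentence with itself
                words1 = sentences[i].split(" ")
                words2 = sentences[j].split(" ")
                temp_count = 0
                # Compare every word of one sentence with every word of the other.
                for word1 in words1:
                    for word2 in words2:
                        if word1.lower() == word2.lower():
                            temp_count += 1
                if temp_count > max_count:
                    sentence1_index = i
                    sentence2_index = j
                    max_count = temp_count

    if sentence1_index != -1 and sentence2_index != -1:
        return [sentences[sentence1_index], sentences[sentence2_index]]
    return None
```

Because every word is compared with every word, repeated words inflate the count: a word occurring twice in each sentence contributes four matches. Each pair is also visited twice, as (i, j) and as (j, i); starting the inner loop at `i + 1` would halve the work without changing the result.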

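You would probably get better answers if you showed what you have tried.
@axiom I have added my idea. I did not mention it at first because I did not think it was efficient enough.
Your approach is brute force. What I am looking for is a more efficient method.
@city Before this post you had not mentioned what you had tried... nor stated what you had done.
Sorry about that. I was a bit lazy :)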