Challenge! Complex MySQL query

mysql sql

We're writing a small search engine. The database tables are:

Documents (DocumentID, Title, Abstract, Author, ...)
InvertedIndex (DocumentID, Word, Count)
Stopwords (Word)

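InvertedIndex has one entry for every word in each document, along with the number of times that word appears. Stopwords is just a list of words I don't care about. The engine is queried with a list of terms separated by or. For example:

term1 term2
term1 or term2
term1 term2 or term3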

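...and so on. Search results are ranked by relevance, which is computed for each document using the extended Boolean model. AND-ed terms (all terms not joined by or) are multiplied together, and the OR-ed groups are summed. For example, for the query term1 term2 or term3, if those terms appear 3, 4, and 5 times in a document respectively, the document's relevance is (3*4)+5=17. Terms that exist in Stopwords are ignored.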
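To make the worked example concrete, here is a minimal sketch of the schema with one document holding those three counts. The column types, the key choices, and the sample title are assumptions for illustration only; the post gives just the column names, and the elided Documents columns are left out.

```sql
-- Hypothetical column types: the original post lists only the column names.
CREATE TABLE Documents (
    DocumentID INT PRIMARY KEY,
    Title      VARCHAR(255)
);

CREATE TABLE InvertedIndex (
    DocumentID INT NOT NULL,
    Word       VARCHAR(64) NOT NULL,
    `Count`    INT NOT NULL,          -- occurrences of Word in the document
    PRIMARY KEY (DocumentID, Word)
);

CREATE TABLE Stopwords (
    Word VARCHAR(64) PRIMARY KEY
);

-- One sample document in which term1, term2, term3 appear 3, 4 and 5 times.
INSERT INTO Documents (DocumentID, Title) VALUES (1, 'Example document');
INSERT INTO InvertedIndex (DocumentID, Word, `Count`) VALUES
    (1, 'term1', 3),
    (1, 'term2', 4),
    (1, 'term3', 5);

-- Relevance of document 1 for "term1 term2 or term3":
-- the AND-ed pair is multiplied, then the OR-ed term is added.
SELECT (3 * 4) + 5 AS relevance;  -- 17
```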

OK, now... my professor told us that the relevance of all documents can be computed in a single query. That is where I need help.

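I've written some pseudocode for the example query term1 term2 or term3. This is how I would compute the relevance of each document, but I want to do it in a single MySQL query. I include it to clarify the relevance formula.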

foreach document
    relevance = 0
    foreach term_set // where (term1 term2) would be a term_set and (term3) would be the other
        product = 1
        foreach term
            if term not in stopwords
                SELECT Count FROM InvertedIndex WHERE Word=term AND DocumentID=document
                product *= Count
        relevance += product

EXP(SUM(LOG(COALESCE(Column, 1)))) is apparently one way to compute an aggregate product.
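A small self-contained sketch of that trick, using made-up literal values rather than the real tables: the sum of the logarithms is the logarithm of the product, so exponentiating the sum gives the product back. COALESCE turns a NULL into 1, whose logarithm is 0, so it leaves the product unchanged. The result is a floating-point number, hence the ROUND. Note that this only works for strictly positive values, which should hold for occurrence counts.

```sql
-- Multiply the values 3 and 4 (plus a NULL, treated as a factor of 1).
SELECT ROUND(EXP(SUM(LOG(COALESCE(v.n, 1))))) AS product  -- 12
FROM (
    SELECT 3 AS n
    UNION ALL SELECT 4
    UNION ALL SELECT NULL
) AS v;
```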

Any help would be greatly appreciated. Sorry if this is an annoying question. It's 2 o'clock here and I may not have explained this very well.

If I understand your question, this may help you get started (but you'll have to check the syntax, since my MySQL is rusty).

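Joining Documents to InvertedIndex and filtering on the search terms gives you a list of DocumentIDs, the search terms, and the count of each search term in every document that contains it. You can use that as the starting point for aggregating over DocumentID, with a GROUP BY DocumentID, and then compute the aggregate multiplication (which I'm happy to leave to you).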
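One possible way to put the aggregation together for the example query term1 term2 or term3 is sketched below. It is not necessarily what the professor had in mind. The derived table q, its set_id column, and the aliases are hypothetical: each row assigns a search term to the AND-ed group it belongs to. The inner query multiplies the counts within each group and the outer query sums the groups per document.

```sql
SELECT per_set.DocumentID,
       SUM(per_set.product) AS relevance
FROM (
    -- One row per (document, term set): product of the counts in that set.
    SELECT ii.DocumentID,
           q.set_id,
           ROUND(EXP(SUM(LOG(ii.`Count`)))) AS product
    FROM (
        -- Hypothetical term-to-set mapping for "term1 term2 or term3".
        SELECT 1 AS set_id, 'term1' AS Word
        UNION ALL SELECT 1, 'term2'
        UNION ALL SELECT 2, 'term3'
    ) AS q
    INNER JOIN InvertedIndex AS ii ON ii.Word = q.Word
    LEFT JOIN Stopwords AS sw ON sw.Word = q.Word
    WHERE sw.Word IS NULL              -- ignore terms listed in Stopwords
    GROUP BY ii.DocumentID, q.set_id
) AS per_set
GROUP BY per_set.DocumentID
ORDER BY relevance DESC;
```

For a document in which the three terms appear 3, 4, and 5 times, this should give (3*4)+5=17. A term that is missing from a document simply contributes no factor here, which matches the COALESCE(Column, 1) idea from the question. If a missing AND-ed term should instead zero out its group, the inner query would need an extra check on how many of the group's terms were found.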

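I haven't worked with MySQL enough to know the usual way to exclude the words in the Stopwords table (in SQL Server you could use EXCEPT), but something like this might work: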

Select InvertedIndex.DocumentID, InvertedIndex.Word, InvertedIndex.`Count`
From Documents
Inner Join InvertedIndex On Documents.DocumentID = InvertedIndex.DocumentID
Where InvertedIndex.Word In ('term1', 'term2', 'term3')
And Not Exists (
    -- Correlated subquery: drop the row when its word is a stopword
    Select 1
    From Stopwords
    Where Stopwords.Word = InvertedIndex.Word
)
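The same stopword exclusion can also be written as an anti-join, which some people find easier to read. This is a sketch under the table and column names given in the question; the aliases ii and sw are just for illustration. Documents is left out because the three selected columns all come from InvertedIndex.

```sql
-- Keep only index rows for the search terms that have no match in Stopwords.
SELECT ii.DocumentID, ii.Word, ii.`Count`
FROM InvertedIndex AS ii
LEFT JOIN Stopwords AS sw ON sw.Word = ii.Word
WHERE ii.Word IN ('term1', 'term2', 'term3')
  AND sw.Word IS NULL;   -- no matching stopword row was found
```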

Good luck with your homework. Let us know how it turns out!

If your question were less open-ended, it would be easier to give a "correct" answer. You describe what you are doing, but not where you are stuck.
