
SOLR eDismax typo tolerance for phrases

How can I build a query that matches the exact phrase as well as versions of it with a few typos? I'm stuck on this, and it looks like I'm heading in the wrong direction.

For example, my edismax query contains the following:

q=apple iphone
This works, but now I need to make it more tolerant of typos. I updated my query, and now it returns the same results as before even when the user makes a typo:

q=aple~2 iphane~2
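
For reference (standard Lucene fuzzy-query semantics, not something stated in the original post): the integer after ~ is the maximum allowed edit distance, so each term matches any indexed term within that many single-character edits:

q=aple~2      matches apple (one insertion), alpe (one transposition), and so on
q=aple~1      stricter: only terms one edit away

Lucene caps the edit distance at 2, so ~2 is already the loosest possible setting.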
Next, I noticed that the exact query match no longer always appears on the first page (for example, when I actually have a product named 'aple iphane'). So I added the exact query with an 'OR' condition. Now my query looks like this:

q=(aple~2 iphane~2) OR 'aple iphane'^3
The problem is that it now returns only the exact matches and no fuzzy entries anymore. What am I doing wrong?

Here is the full query:

http://localhost:8983/solr/test/select?omitHeader=true
&q=(aple~2 iphane~2) OR 'aple iphane'^3
&start=0
&rows=30
&fl=*,score
&fq=itemType:"Product"
&defType=edismax
&qf=title_de^1000 title_de_ranked^1000 description_de^1 category_name_de^50 brand^15 merchant_name^80 uniuque_values^10000 searchable_attribute_product.name^1000 searchable_attribute_product.description.short^100 searchable_attribute_product.description.long^100 searchable_attribute_mb.book.author^500
&mm=90
&pf=title_de^2000 description_de^2
&ps=1
&qs=2
&boost=category_boost
&mm.autoRelax=true
&wt=json
&json.nl=flat
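An editorial aside before going further: it can help to check how eDismax actually parses the mixed fuzzy/exact query (a hedged suggestion; debugQuery=true is a standard Solr parameter, and the output it produces appears later in this post). Rerunning the same request with

&q=(aple~2 iphane~2) OR 'aple iphane'^3
&debugQuery=true

adds parsedquery and explain sections to the response, which show which clauses the ExtendedDismaxQParser actually generated and how each document was scored.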
Is there a mistake in my query, or is the approach I've chosen entirely wrong?

I want to match the phrase in 'title_de'; all other fields are secondary. Here is the field type from my schema:

<fieldType name="text_de_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
    </analyzer>
</fieldType>
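An editorial note on this field type, grounded in the explain output shown later in the post: the index analyzer applies NGramFilterFactory with minGramSize=2 and maxGramSize=25, so a title word such as 'samsung' is indexed as many substrings, for example:

sa, am, ms, su, un, ng, ..., msung, amsung, samsun, samsung

A fuzzy term like somsung~2 can therefore match several of these grams at once; the explain sections below score title_de:amsung, title_de:msung, title_de:samsun and title_de:samsung as separate hits, which is worth keeping in mind when reading the debug output.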
So, this query returns the correct products (I have 6 products with both words in the title_de field). Then I introduce typos into both words:

"q":"somsung majic"

and no products are found.

Then I add the fuzzy operator to both words:

"q":"somsung~2 majic~2"

and 6 products are found. Here is the debugQuery output:

"debug":{
      "rawquerystring":"somsung~2 majic~2",
      "querystring":"somsung~2 majic~2",
      "parsedquery":"(+(DisjunctionMaxQuery((title_de:somsung~2)) DisjunctionMaxQuery((title_de:majic~2)))~2 DisjunctionMaxQuery((title_de:\"somsung 2 majic 2\")))/no_coord",
      "parsedquery_toString":"+(((title_de:somsung~2) (title_de:majic~2))~2) (title_de:\"somsung 2 majic 2\")",
      "explain":{
            "69019":"\n1.3424492 = sum of:\n  1.3424492 = sum of:\n    1.1036766 = sum of:\n      0.26367697 = weight(title_de:amsung in 305456) [ClassicSimilarity], result of:\n        0.26367697 = score(doc=305456,freq=1.0), product of:\n          0.073149204 = queryWeight, product of:\n            0.6666666 = boost\n            7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              635.0 = docFreq\n              316313.0 = docCount\n            0.015219777 = queryNorm\n          3.604646 = fieldWeight in 305456, product of:\n            1.0 = tf(freq=1.0), with freq of:\n              1.0 = termFreq=1.0\n            7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              635.0 = docFreq\n              316313.0 = docCount\n            0.5 = fieldNorm(doc=305456)\n      0.2373093 = weight(title_de:msung in 305456) [ClassicSimilarity], result of:\n        0.2373093 = score(doc=305456,freq=1.0), product of:\n          0.06583429 = queryWeight, product of:\n            0.6 = boost\n            7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              635.0 = docFreq\n              316313.0 = docCount\n            0.015219777 = queryNorm\n          3.604646 = fieldWeight in 305456, product of:\n            1.0 = tf(freq=1.0), with freq of:\n              1.0 = termFreq=1.0\n            7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              635.0 = docFreq\n              316313.0 = docCount\n            0.5 = fieldNorm(doc=305456)\n      0.26367697 = weight(title_de:samsun in 305456) [ClassicSimilarity], result of:\n        0.26367697 = score(doc=305456,freq=1.0), product of:\n          0.073149204 = queryWeight, product of:\n            0.6666666 = boost\n            7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              635.0 = docFreq\n              316313.0 = docCount\n            0.015219777 = queryNorm\n          3.604646 = fieldWeight in 305456, product of:\n            1.0 = tf(freq=1.0), with freq of:\n              1.0 = termFreq=1.0\n            7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              635.0 = docFreq\n              316313.0 = docCount\n            0.5 = fieldNorm(doc=305456)\n      0.33901328 = weight(title_de:samsung in 305456) [ClassicSimilarity], result of:\n        0.33901328 = score(doc=305456,freq=1.0), product of:\n          0.094048984 = queryWeight, product of:\n            0.85714287 = boost\n            7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              635.0 = docFreq\n              316313.0 = docCount\n            0.015219777 = queryNorm\n          3.604646 = fieldWeight in 305456, product of:\n            1.0 = tf(freq=1.0), with freq of:\n              1.0 = termFreq=1.0\n            7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              635.0 = docFreq\n              316313.0 = docCount\n            0.5 = fieldNorm(doc=305456)\n    0.23877257 = sum of:\n      0.23877257 = weight(title_de:magic in 305456) [ClassicSimilarity], result of:\n        0.23877257 = score(doc=305456,freq=1.0), product of:\n          0.0762529 = queryWeight, product of:\n            0.8 = boost\n            6.262649 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              1638.0 = docFreq\n              316313.0 = docCount\n            0.015219777 = queryNorm\n          3.1313245 = fieldWeight in 305456, product of:\n            1.0 = 
tf(freq=1.0), with freq of:\n              1.0 = termFreq=1.0\n            6.262649 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n              1638.0 = docFreq\n              316313.0 = docCount\n            0.5 = fieldNorm(doc=305456)\n",
      },
      "QParser":"ExtendedDismaxQParser"
}
This behavior would suit me, if only I could be sure there is no real product named 'Somsung majic'. That is a theoretical case, but in practice the fuzzy operator produces a lot of other wrong search results.

So, to handle this, my idea is, as I described at the start, to add the exact terms (without the fuzzy modifier) with a boost factor. The question now is how best to implement that. I found that this query works acceptably if I reduce the mm parameter:

"q":"somsung~2 majic~2 somsung^3 majic^3"
That is because I added more words to the query, so 'minimum match' has to come down as well; with four top-level clauses, for example, mm=50% requires 4 × 0.5 = 2 of them to match, which is the ~2 visible on the boolean query in the parsedquery below. The problem is that the smaller 'mm' gets, the worse my results become for accurately typed long titles (some wrong items can rank higher because of other factors). Here is the debug output for this query:

"debug":{
      "rawquerystring":"somsung~2 majic~2 somsung^3 majic^3",
      "querystring":"somsung~2 majic~2 somsung^3 majic^3",
      "parsedquery":"(+(DisjunctionMaxQuery((title_de:somsung~2)) DisjunctionMaxQuery((title_de:majic~2)) DisjunctionMaxQuery((title_de:somsung))^3.0 DisjunctionMaxQuery((title_de:majic))^3.0)~2 DisjunctionMaxQuery((title_de:\"somsung 2 majic 2 somsung 3 majic 3\")))/no_coord",
      "parsedquery_toString":"+(((title_de:somsung~2) (title_de:majic~2) ((title_de:somsung))^3.0 ((title_de:majic))^3.0)~2) (title_de:\"somsung 2 majic 2 somsung 3 majic 3\")",
      "explain":{
            "69019":"\n0.3418829 = sum of:\n  0.3418829 = product of:\n    0.6837658 = sum of:\n      0.5621489 = sum of:\n        0.13430178 = weight(title_de:amsung in 305456) [ClassicSimilarity], result of:\n          0.13430178 = score(doc=305456,freq=1.0), product of:\n            0.037257966 = queryWeight, product of:\n              0.6666666 = boost\n              7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                635.0 = docFreq\n                316313.0 = docCount\n              0.0077520725 = queryNorm\n            3.604646 = fieldWeight in 305456, product of:\n              1.0 = tf(freq=1.0), with freq of:\n                1.0 = termFreq=1.0\n              7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                635.0 = docFreq\n                316313.0 = docCount\n              0.5 = fieldNorm(doc=305456)\n        0.12087161 = weight(title_de:msung in 305456) [ClassicSimilarity], result of:\n          0.12087161 = score(doc=305456,freq=1.0), product of:\n            0.033532172 = queryWeight, product of:\n              0.6 = boost\n              7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                635.0 = docFreq\n                316313.0 = docCount\n              0.0077520725 = queryNorm\n            3.604646 = fieldWeight in 305456, product of:\n              1.0 = tf(freq=1.0), with freq of:\n                1.0 = termFreq=1.0\n              7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                635.0 = docFreq\n                316313.0 = docCount\n              0.5 = fieldNorm(doc=305456)\n        0.13430178 = weight(title_de:samsun in 305456) [ClassicSimilarity], result of:\n          0.13430178 = score(doc=305456,freq=1.0), product of:\n            0.037257966 = queryWeight, product of:\n              0.6666666 = boost\n              7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                635.0 = docFreq\n                316313.0 = docCount\n              0.0077520725 = queryNorm\n            3.604646 = fieldWeight in 305456, product of:\n              1.0 = tf(freq=1.0), with freq of:\n                1.0 = termFreq=1.0\n              7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                635.0 = docFreq\n                316313.0 = docCount\n              0.5 = fieldNorm(doc=305456)\n        0.17267373 = weight(title_de:samsung in 305456) [ClassicSimilarity], result of:\n          0.17267373 = score(doc=305456,freq=1.0), product of:\n            0.047903106 = queryWeight, product of:\n              0.85714287 = boost\n              7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                635.0 = docFreq\n                316313.0 = docCount\n              0.0077520725 = queryNorm\n            3.604646 = fieldWeight in 305456, product of:\n              1.0 = tf(freq=1.0), with freq of:\n                1.0 = termFreq=1.0\n              7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                635.0 = docFreq\n                316313.0 = docCount\n              0.5 = fieldNorm(doc=305456)\n      0.12161691 = sum of:\n        0.12161691 = weight(title_de:magic in 305456) [ClassicSimilarity], result of:\n          0.12161691 = score(doc=305456,freq=1.0), product of:\n            0.038838807 = queryWeight, product of:\n              0.8 = boost\n              6.262649 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                
1638.0 = docFreq\n                316313.0 = docCount\n              0.0077520725 = queryNorm\n            3.1313245 = fieldWeight in 305456, product of:\n              1.0 = tf(freq=1.0), with freq of:\n                1.0 = termFreq=1.0\n              6.262649 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                1638.0 = docFreq\n                316313.0 = docCount\n              0.5 = fieldNorm(doc=305456)\n    0.5 = coord(2/4)\n"
      },
      "QParser":"ExtendedDismaxQParser"
}
The following query works even with a large 'mm' parameter (e.g., 90%):

"q":"(somsung~2 majic~2) OR (somsung^3 majic^3)"

But the problem here is that I now get 430 results (instead of the expected 6). Here is the debug output for one of the wrong products:

"debug":{
      "rawquerystring":"(somsung~2 majic~2) OR (somsung^3 majic^3)",
      "querystring":"(somsung~2 majic~2) OR (somsung^3 majic^3)",
      "parsedquery":"(+((DisjunctionMaxQuery((title_de:somsung~2)) DisjunctionMaxQuery((title_de:majic~2))) (DisjunctionMaxQuery((title_de:somsung))^3.0 DisjunctionMaxQuery((title_de:majic))^3.0))~1 DisjunctionMaxQuery((title_de:\"somsung 2 majic 2 somsung 3 majic 3\")))/no_coord",
      "parsedquery_toString":"+((((title_de:somsung~2) (title_de:majic~2)) (((title_de:somsung))^3.0 ((title_de:majic))^3.0))~1) (title_de:\"somsung 2 majic 2 somsung 3 majic 3\")",
      "explain":{
            "113746":"\n0.1275867 = sum of:\n  0.1275867 = product of:\n    0.2551734 = sum of:\n      0.2551734 = product of:\n        0.5103468 = sum of:\n          0.5103468 = sum of:\n            0.26860356 = weight(title_de:losung in 296822) [ClassicSimilarity], result of:\n              0.26860356 = score(doc=296822,freq=1.0), product of:\n                0.037257966 = queryWeight, product of:\n                  0.6666666 = boost\n                  7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                    635.0 = docFreq\n                    316313.0 = docCount\n                  0.0077520725 = queryNorm\n                7.209292 = fieldWeight in 296822, product of:\n                  1.0 = tf(freq=1.0), with freq of:\n                    1.0 = termFreq=1.0\n                  7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                    635.0 = docFreq\n                    316313.0 = docCount\n                  1.0 = fieldNorm(doc=296822)\n            0.24174322 = weight(title_de:osung in 296822) [ClassicSimilarity], result of:\n              0.24174322 = score(doc=296822,freq=1.0), product of:\n                0.033532172 = queryWeight, product of:\n                  0.6 = boost\n                  7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                    635.0 = docFreq\n                    316313.0 = docCount\n                  0.0077520725 = queryNorm\n                7.209292 = fieldWeight in 296822, product of:\n                  1.0 = tf(freq=1.0), with freq of:\n                    1.0 = termFreq=1.0\n                  7.209292 = idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:\n                    635.0 = docFreq\n                    316313.0 = docCount\n                  1.0 = fieldNorm(doc=296822)\n        0.5 = coord(1/2)\n    0.5 = coord(1/2)\n"
      },
      "QParser":"ExtendedDismaxQParser"
}

So although the results are better now, I still need to improve the search, and I still don't know which approach to choose or why I'm getting these results.

I don't think edismax supports the fuzzy operator ~. Developers have used a long-standing patch for this in production, but it never made it into the Solr codebase.


edismax does work with fuzzy queries. But when you include mm=90, you are basically saying that Solr should match 90% of the phrase exactly. That is a tall order.

Removing that value, or using a lower percentage like 50%, will allow some of the fuzzy matches to work.
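
A minimal sketch of the suggested change, reusing the test query from the question (the double-quoted phrase and the 50% figure are illustrative assumptions, not tuned values):

&q=(somsung~2 majic~2) OR "somsung majic"^3
&mm=50%

With a lower mm, the fuzzy clauses alone can satisfy the minimum-match requirement, while the boosted exact clause still ranks true exact matches first.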

OK, if edismax doesn't support fuzziness, why does "q=aple~2 iphane~2" find the correct results in my query?

Could it be that it is parsed as a standard query? Check with debugQuery=true and let us know.

I've updated the description with new experiments and debug results. It looks like edismax really can handle fuzzy queries out of the box. Now I'm even more confused than before :)