
Solr search with and without hyphens


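I'm having trouble getting relevant search results for words written with and without a hyphen. I created two documents, one containing "wifi" and the other containing "wi-fi" in the "text" field.

Searching for "wifi" returns both documents, which is good. Searching for "wi-fi" returns only the document containing "wi-fi".

Here is my configuration: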

<field name="text" type="text" indexed="true" stored="true" omitNorms="true" />

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Here is the debug output when searching for "wi-fi". I don't understand why it doesn't find both documents:

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="indent">true</str>
    <str name="q">wi-fi</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <int name="id">1869</int>
    <str name="route">@sujet_simple?sujet_id=1869&amp;slug=wi-fi</str>
    <str name="name">Wi-fi</str>
    <str name="text">&lt;p&gt;
    Wi-fi&lt;/p&gt;
</str>
    <long name="_version_">1493472450933948416</long></doc>
</result>
<lst name="debug">
  <str name="rawquerystring">wi-fi</str>
  <str name="querystring">wi-fi</str>
  <str name="parsedquery">MultiPhraseQuery(text:"(wi-fi wi) (fi wifi)")</str>
  <str name="parsedquery_toString">text:"(wi-fi wi) (fi wifi)"</str>
  <lst name="explain">
    <str name="1869">
30.33298 = (MATCH) weight(text:"(wi-fi wi) (fi wifi)" in 0) [DefaultSimilarity], result of:
  30.33298 = score(doc=0,freq=1.0 = phraseFreq=1.0
), product of:
    0.99999994 = queryWeight, product of:
      30.332981 = idf(), sum of:
        7.684612 = idf(docFreq=1, maxDocs=1600)
        7.684612 = idf(docFreq=1, maxDocs=1600)
        7.684612 = idf(docFreq=1, maxDocs=1600)
        7.2791467 = idf(docFreq=2, maxDocs=1600)
      0.032967415 = queryNorm
    30.332981 = fieldWeight in 0, product of:
      1.0 = tf(freq=1.0), with freq of:
        1.0 = phraseFreq=1.0
      30.332981 = idf(), sum of:
        7.684612 = idf(docFreq=1, maxDocs=1600)
        7.684612 = idf(docFreq=1, maxDocs=1600)
        7.684612 = idf(docFreq=1, maxDocs=1600)
        7.2791467 = idf(docFreq=2, maxDocs=1600)
      1.0 = fieldNorm(doc=0)
</str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">1.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      <lst name="query">
        <double name="time">0.0</double>
      </lst>
      <lst name="facet">
        <double name="time">0.0</double>
      </lst>
      <lst name="mlt">
        <double name="time">0.0</double>
      </lst>
      <lst name="highlight">
        <double name="time">0.0</double>
      </lst>
      <lst name="stats">
        <double name="time">0.0</double>
      </lst>
      <lst name="debug">
        <double name="time">0.0</double>
      </lst>
    </lst>
    <lst name="process">
      <double name="time">1.0</double>
      <lst name="query">
        <double name="time">0.0</double>
      </lst>
      <lst name="facet">
        <double name="time">0.0</double>
      </lst>
      <lst name="mlt">
        <double name="time">0.0</double>
      </lst>
      <lst name="highlight">
        <double name="time">0.0</double>
      </lst>
      <lst name="stats">
        <double name="time">0.0</double>
      </lst>
      <lst name="debug">
        <double name="time">1.0</double>
      </lst>
    </lst>
  </lst>
</lst>
</response>

Thanks for your help.

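You need to adjust the query-side analysis in your schema. debugQuery=true and the Solr Analysis tool are your friends for tracking down this kind of problem.

With your configuration, searching for wifi produces the following query:

wifi
"parsedquery_toString": "text:wifi",

And for wi-fi:

wi-fi
"parsedquery_toString": "text:\"(wi-fi wi) (fi wifi)\"",

This is a phrase query over two consecutive positions: (wi-fi or wi) followed by (fi or wifi). The document that contains only "wifi" is indexed as a single term, so the phrase generated for wi-fi cannot match it.

If we change the WordDelimiterFilterFactory in the query analyzer so that it does not generate word parts: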

  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
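the query for wi-fi becomes:

"parsedquery_toString": "text:wi-fi text:wifi"

This matches the index terms that the Analysis tool shows for wi-fi and for wifi, respectively:

wi-fi, wi, fi, wifi
wifi

Note: text is the default field in this example.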
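For reference, here is a sketch of what the query analyzer of the "text" fieldType from the question might look like with that one change applied. It assumes the index analyzer is left exactly as posted, and that generateWordParts="0" is the only attribute changed. Everything else is copied from the question's configuration.

```xml
<!-- Sketch only: query analyzer for the "text" fieldType from the question.
     The single change is generateWordParts="0" on WordDelimiterFilterFactory.
     The index analyzer is assumed to stay as posted. -->
<analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- With word parts disabled, "wi-fi" should yield the original token (wi-fi)
         and the catenated token (wifi) at the same position, not a two-position phrase. -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

Because only the query-time analysis changes, existing documents should not need to be reindexed; reloading the core and re-running the query with debugQuery=true is a quick way to confirm that the parsed query now looks like the one shown above.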

No, I haven't found a solution yet. You may be interested in a similar question I posted recently.