Optimizing Solr highlighter performance

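I am trying to optimize highlighting in my Solr instance, because it seems to slow queries down by two orders of magnitude. I have a tokenized field that is indexed and stored, with the following definition: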

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="%2B"/>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="%2B"/>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Term vectors, positions, and offsets are generated as well:

<field name="Events" type="text_general" multiValued="true" stored="true" indexed="true" termVectors="true" termPositions="true"  termOffsets="true"/>

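For the highlighting component I use the default Solr configuration. The query I am testing uses FastVectorHighlighter, but it still takes about 1500 ms, which is very long for roughly 1000 documents with 10-20 values stored in the field per document. Here is the query: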

q=Events:http\://mydomain.com/resource/term/906&fq=(Document_Code:[*+TO+*])&hl.requireFieldMatch=true&facet=true&hl.simple.pre=<b>&hl.fl=*&hl=true&rows=10&version=2&fl=uri,Document_Type,Document_Title,Modification_Date,Study&hl.snippets=1&hl.useFastVectorHighlighter=true

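What strikes me as odd is that, according to the Solr admin stats, a single query generates 9146 requests to HtmlFormatter and GapFragmenter. Any idea why this happens, and how to improve the highlighter's performance?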

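The problem seems to be caused by "hl.fl=*", which makes DefaultSolrHighlighter iterate over a relatively large number of fields (up to 10) for every document found (in my index). This adds O(n^2) time, since the work grows with the number of returned documents multiplied by the number of fields. Here is the relevant code snippet: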

for (int i = 0; i < docs.size(); i++) {
  int docId = iterator.nextDoc();
  Document doc = searcher.doc(docId, fset);
  NamedList docSummaries = new SimpleOrderedMap();
  for (String fieldName : fieldNames) {
    fieldName = fieldName.trim();
    if( useFastVectorHighlighter( params, schema, fieldName ) )
      doHighlightingByFastVectorHighlighter( fvh, fieldQuery, req, docSummaries, docId, doc, fieldName );
    else
      doHighlightingByHighlighter( query, req, docSummaries, docId, doc, fieldName );
  }
  String printId = schema.printableUniqueKey(doc);
  fragments.add(printId == null ? null : printId, docSummaries);
}

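Reducing the number of fields should improve the behaviour considerably. In my case, however, I cannot reduce it below 20 fields, so I will check whether enabling FastVectorHighlighter for all of them improves overall performance.
I also wonder whether this list could be reduced further by using some of the information from the matched documents (which is already available at this point).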
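One way to avoid "hl.fl=*" is to name the highlighted fields explicitly, either on the request (for example hl.fl=Events,Document_Title) or as handler defaults. The fragment below is a minimal sketch of the second option for solrconfig.xml. The handler name and the choice of Events and Document_Title as the fields worth highlighting are assumptions for illustration, not part of the original setup; the handler should be whichever one the queries are actually sent to.

```xml
<!-- Sketch only: the handler name and the field list are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <!-- explicit, comma-separated list instead of hl.fl=* -->
    <str name="hl.fl">Events,Document_Title</str>
    <str name="hl.requireFieldMatch">true</str>
    <str name="hl.snippets">1</str>
  </lst>
</requestHandler>
```

Parameters passed on the request override these defaults, so a client that still sends hl.fl=* would bring the slow behaviour back.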
Update
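Using FastVectorHighlighter for all fields (setting termVectors, termPositions and termOffsets to true on every tokenized field) did speed up highlighting by an order of magnitude, so all queries now run in under 1 s. The index grew by about three times its original size (from 500M to 2G). There is still an open issue with how snippets are generated for multi-valued fields, but the performance gain is large enough.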
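The schema change described in this update could look like the following sketch. Events is the field from the question; Document_Title and Study appear in the question's fl list, but treating them as tokenized text_general fields is an assumption. Term vectors are written at index time, so existing documents need to be reindexed before FastVectorHighlighter can use them.

```xml
<!-- Sketch: term vectors with positions and offsets on every highlighted field -->
<field name="Events" type="text_general" multiValued="true" stored="true" indexed="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<!-- Assumed fields; adjust names and types to the actual schema -->
<field name="Document_Title" type="text_general" stored="true" indexed="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<field name="Study" type="text_general" stored="true" indexed="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```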
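I am using Solr 3.6 running on Tomcat 7.0.28, and running the same query against a similar field without termVectors enabled shows no significant difference in response time. Note that highlighting in 3.6 seems to have a number of issues. Besides the problem with multi-valued fields, numeric fields appear to be silently ignored by the highlighter, and date range queries cannot be evaluated at all.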