How to make ShingleFilterFactory work with Hibernate Search?
Below is my analyzer definition. At index time I use PatternTokenizerFactory so that multi-word phrases are indexed as single tokens. The other analyzer, used at query time, combines StandardTokenizerFactory with ShingleFilterFactory, but I cannot get token combinations out of my search query.

What I expect: when the search query is "my search query", the tokens should include "my search" and "search query", but what I get is "my", "search" and "query". My search function is shown further below.
@AnalyzerDefs({
    @AnalyzerDef(
        name = "tags",
        tokenizer = @TokenizerDef(
            factory = PatternTokenizerFactory.class,
            params = {
                @Parameter(name = "pattern", value = ",")
            }
        ),
        filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(
                factory = StopFilterFactory.class,
                params = {
                    @Parameter(name = "words", value = "data/ignorewords.txt"),
                    @Parameter(name = "ignoreCase", value = "true")
                }
            ),
            @TokenFilterDef(
                factory = SynonymFilterFactory.class,
                params = {
                    @Parameter(name = "ignoreCase", value = "true"),
                    @Parameter(name = "expand", value = "false"),
                    @Parameter(name = "synonyms", value = "data/synonyms.txt")
                }
            ),
            @TokenFilterDef(
                factory = SnowballPorterFilterFactory.class,
                params = {
                    @Parameter(name = "language", value = "English")
                }
            ),
            @TokenFilterDef(
                factory = ShingleFilterFactory.class,
                params = {
                    @Parameter(name = "minShingleSize", value = "2"),
                    @Parameter(name = "maxShingleSize", value = "3"),
                    @Parameter(name = "outputUnigrams", value = "true"),
                    @Parameter(name = "outputUnigramsIfNoShingles", value = "false")
                }
            ),
            @TokenFilterDef(
                factory = PositionFilterFactory.class,
                params = {
                    @Parameter(name = "positionIncrement", value = "100")
                }
            ),
            @TokenFilterDef(
                factory = PhoneticFilterFactory.class,
                params = {
                    @Parameter(name = "encoder", value = "RefinedSoundex"),
                    @Parameter(name = "inject", value = "true")
                }
            )
        }
    ),
    @AnalyzerDef(
        name = "querytime",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(
                factory = StopFilterFactory.class,
                params = {
                    @Parameter(name = "words", value = "data/ignorewords.txt"),
                    @Parameter(name = "ignoreCase", value = "true")
                }
            ),
            @TokenFilterDef(
                factory = SnowballPorterFilterFactory.class,
                params = {
                    @Parameter(name = "language", value = "English")
                }
            ),
            @TokenFilterDef(
                factory = ShingleFilterFactory.class,
                params = {
                    @Parameter(name = "minShingleSize", value = "2"),
                    @Parameter(name = "maxShingleSize", value = "3"),
                    @Parameter(name = "outputUnigrams", value = "true"),
                    @Parameter(name = "outputUnigramsIfNoShingles", value = "false")
                }
            ),
            @TokenFilterDef(
                factory = PositionFilterFactory.class,
                params = {
                    @Parameter(name = "positionIncrement", value = "100")
                }
            ),
            @TokenFilterDef(
                factory = PhoneticFilterFactory.class,
                params = {
                    @Parameter(name = "encoder", value = "RefinedSoundex"),
                    @Parameter(name = "inject", value = "true")
                }
            )
        }
    )
})
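The problem is that my indexed tags are, for example, "word A, word B", and when I search for "word A" I expect the records tagged "word A" to show up, but I get no results. That is because I do not want to show any result unless the search query contains an indexed phrase.

Since nobody answered, I dug into the problem myself and found the solution. I am writing it here because it may help others. The solution is very simple: just wrap searchQuery in quotes. I used: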
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();

// create native Lucene query
org.apache.lucene.search.Query luceneQuery = null;
String[] fields = new String[] {"tags"};
MultiFieldQueryParser parser = new MultiFieldQueryParser(
        Version.LUCENE_31, fields, fullTextSession.getSearchFactory().getAnalyzer("querytime"));
try {
    luceneQuery = parser.parse(searchQuery);
} catch (ParseException e) {
    e.printStackTrace();
}

// wrap Lucene query in a org.hibernate.Query
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(luceneQuery, CityArea.class);

// execute search
List result = hibQuery.list();
tx.commit();
session.close();
return result;
In the tags analyzer, the code works very well.
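The quoting fix mentioned in the answer can be sketched as a small helper. As far as I understand it, the classic Lucene query parser splits unquoted input on whitespace before it calls the analyzer, so the ShingleFilter only ever sees one word at a time. A quoted string is passed to the analyzer as a whole. The class and method names below (`PhraseQuoting`, `asPhrase`) are hypothetical and not part of Lucene or Hibernate Search. Escaping embedded quotes is my own addition, not something the answer describes.

```java
final class PhraseQuoting {

    // Hypothetical helper (an assumption, not a library API): wraps the raw
    // user input in double quotes so the query parser treats it as one phrase.
    static String asPhrase(String searchQuery) {
        // Escape backslashes first, then embedded double quotes, so the user
        // input cannot close the phrase early.
        String escaped = searchQuery.trim()
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Prints the query wrapped in double quotes.
        System.out.println(asPhrase("my search query"));
    }
}
```

In the search function this would presumably be used as `parser.parse(asPhrase(searchQuery))` in place of `parser.parse(searchQuery)`.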
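Now I wonder why you use different tokenizers, one for querying and another for storing?

I use PatternTokenizerFactory when storing the index because my tags are comma separated. For example, if the tags are "word A, word B, word C" and so on, I want to index "word A", "word B", "word C" rather than "word", "A", "B", "C"; StandardTokenizerFactory would tokenize on whitespace or special characters. The user's search query, however, is a single string, so I want to use ShingleFilterFactory to get multi-word tokens, so that I can match exactly against the indexed tags.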
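The idea described above can be illustrated without Lucene. This is only a simplified sketch in plain Java that mimics comma splitting at index time and shingle generation at query time. It ignores the stop word, synonym, stemming and phonetic filters from the analyzer definitions, and the trimming of tags is my own assumption. The names `ShingleSketch`, `splitTags` and `shingles` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class ShingleSketch {

    // Index-time idea: split on commas only, so a multi-word tag stays one token.
    // Trimming and lowercasing are simplifications of the real filter chain.
    static List<String> splitTags(String tags) {
        List<String> out = new ArrayList<>();
        for (String tag : tags.split(",")) {
            String t = tag.trim().toLowerCase(Locale.ROOT);
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Query-time idea: split on whitespace, then emit every run of
    // min..max adjacent words (plus single words if outputUnigrams is set).
    static List<String> shingles(String query, int min, int max, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        String normalized = query.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return out;
        }
        String[] words = normalized.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            if (outputUnigrams) {
                out.add(words[start]);
            }
            for (int size = min; size <= max && start + size <= words.length; size++) {
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> indexed = splitTags("my search,search query,other tag");
        List<String> queryTokens = shingles("my search query", 2, 3, true);

        // Keep only the query tokens that are exactly equal to an indexed tag.
        List<String> matches = new ArrayList<>(queryTokens);
        matches.retainAll(indexed);

        System.out.println("indexed tags: " + indexed);
        System.out.println("query tokens: " + queryTokens);
        System.out.println("matches:      " + matches);
    }
}
```

With sizes 2 to 3 and unigrams enabled, the query "my search query" yields the two-word combinations "my search" and "search query" next to the single words, which is what allows an exact match against a comma-separated tag.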
@TokenizerDef(factory = StandardTokenizerFactory.class),