Apache Lucene fuzzy search for multi-word phrases

I have the following Apache Lucene 7 application:

StandardAnalyzer standardAnalyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(standardAnalyzer);
IndexWriter writer = new IndexWriter(directory, config);
Document document = new Document();

document.add(new TextField("content", new FileReader("document.txt"))); 
writer.addDocument(document);
writer.close();

IndexReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);

Query fuzzyQuery = new FuzzyQuery(new Term("content", "Company"), 2);

TopDocs results = searcher.search(fuzzyQuery, 5);
System.out.println("Hits: " + results.totalHits);
System.out.println("Max score:" + results.getMaxScore());

When I use:

new FuzzyQuery(new Term("content", "Company"), 2);

the application works fine and returns the following result:

Hits: 1
Max score:0.35161147

But when I try to search with a multi-term query, for example:

new FuzzyQuery(new Term("content", "Company name"), 2);

it returns the following result:

Hits: 0
Max score:NaN
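
However, the phrase Company name does exist in the source document.txt file.

How do I use FuzzyQuery correctly in this case, so that I can run a fuzzy search on a multi-word phrase?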

UPDATED

Based on the solution provided, I tested it on the following text:

Company name: BlueCross BlueShield              Customer Service 
   1-800-521-2227           
                        of Texas                          Preauth-Medical              1-800-441-9188           
                                                          Preauth-MH/CD                1-800-528-7264           
                                                          Blue Card Access             1-800-810-2583     

with the following query:

SpanQuery[] clauses = new SpanQuery[2];
clauses[0] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueCross"), 2));
clauses[1] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueShield"), 2));
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);
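
But when I deliberately corrupt the search query slightly (for example, changing BlueCross to BlueCros), the search no longer finds the document.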


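The problem here is that you are using TextField, which is a tokenized field. For example, your text "Company name is working on something" is effectively split on spaces (and other delimiters). So even though your text contains Company name, during indexing it becomes the separate tokens Company, name, and so on.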
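
To make the effect of tokenization concrete, here is a minimal plain-Java sketch. It does not use Lucene: the tokenize helper is a hypothetical stand-in that only splits on non-alphanumeric characters, whereas a real analyzer such as StandardAnalyzer also normalizes the tokens. It illustrates that the index holds the individual words, so no single indexed term equals the whole phrase.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Hypothetical stand-in for an analyzer: split the text on runs of
    // characters that are neither letters nor digits. Real Lucene analyzers
    // apply further processing that this sketch deliberately leaves out.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Company name is working on something");
        // prints [Company, name, is, working, on, something]
        System.out.println(tokens);
        // prints false: the two-word phrase is not a single token
        System.out.println(tokens.contains("Company name"));
    }
}
```

A single-term query can only be compared against one of these tokens at a time, which is why a term containing a space has nothing to match.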

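In this case, a single-term query like this one cannot find what you are looking for, because no single indexed term contains both words. A trick that can help looks like this: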

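// One fuzzy clause per word of the phrase (max edit distance 2 each).
SpanQuery[] clauses = new SpanQuery[2];
clauses[0] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "some"), 2));
clauses[1] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "text"), 2));
// slop = 0 and inOrder = true: the words must be adjacent and in this order.
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);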
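
However, I would not really recommend this approach, especially if your load is high and you plan to search for company names that are 10 terms long. Be aware that these queries can be quite expensive to execute.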

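The next problem, with BlueCros, is the following. By default, Lucene uses StandardAnalyzer for the TextField. That means the terms are effectively lowercased, so BlueCross in the content field becomes bluecross.

The fuzzy (edit) distance between BlueCros and bluecross is 3: two case substitutions plus one missing letter. That exceeds the maximum of 2 edits in your query, which is why you get no match.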

The simple suggestion is to convert the query term to lowercase before building the query, with something like .toLowerCase().
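
The distance of 3 can be checked with a small plain-Java sketch. It uses a textbook Levenshtein implementation, which is an assumption for illustration and not Lucene's own code (FuzzyQuery by default also counts a transposition as one edit, but no transposition occurs in this pair, so the result is the same). The class and method names are hypothetical.

```java
import java.util.Locale;

public class EditDistanceSketch {
    // Textbook Levenshtein distance: insertions, deletions and
    // substitutions each cost 1. Uses two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String indexedTerm = "bluecross"; // what the analyzer stored
        String rawQuery = "BlueCros";     // what the user typed

        // prints 3: 'B'->'b', 'C'->'c', plus one inserted 's'
        System.out.println(levenshtein(rawQuery, indexedTerm));

        // prints 1: after lowercasing only the missing 's' remains
        System.out.println(levenshtein(rawQuery.toLowerCase(Locale.ROOT), indexedTerm));
    }
}
```

With the query term lowercased first, the distance drops to 1, which is within the limit of 2 edits used in the queries above.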


In general, you should prefer to use the same analyzer at query time (that is, while building the query) as you used at index time.

For Lucene.Net, it could look like this:

private string _IndexPath = @"Your Index Path";
private Directory _Directory;
private Searcher _IndexSearcher;
private MultiPhraseQuery _MultiPhraseQuery;

_Directory = FSDirectory.Open(_IndexPath);
IndexReader indexReader = IndexReader.Open(_Directory, true);

string field = "Name"; // your field name
string keyword = "big red fox"; // your search term
float fuzzy = 0.7f; // minimum similarity, between 0 and 1
using (_IndexSearcher = new IndexSearcher(indexReader))
{
    // "big red fox" to [big,red,fox]
    var keywordSplit = keyword.Split();

    _MultiPhraseQuery = new MultiPhraseQuery();
    FuzzyTermEnum[] _FuzzyTermEnum = new FuzzyTermEnum[keywordSplit.Length];
    Term[] _Term = new Term[keywordSplit.Length];

    for (int i = 0; i < keywordSplit.Length; i++)
    {
        _FuzzyTermEnum[i] = new FuzzyTermEnum(indexReader, new Term(field, keywordSplit[i]),fuzzy);
        _Term[i] = _FuzzyTermEnum[i].Term;
        if (_Term[i] == null)
        {
            _MultiPhraseQuery.Add(new Term(field, keywordSplit[i]));
        }
        else
        {
            _MultiPhraseQuery.Add(_FuzzyTermEnum[i].Term);
        }
    }

    var results = _IndexSearcher.Search(_MultiPhraseQuery, indexReader.MaxDoc);

    foreach (var loopDoc in results.ScoreDocs.OrderByDescending(s => s.Score))
    {
        //YourCode Here
    }
}

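Thank you for your answer. I have updated my question with the current results. Could you check what I am doing wrong?

@alexanoid Updated. However, I would suggest no further edits, since the question is becoming too broad.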