C# Lucene.net indexing speed degrading

c# .net lucene

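I'm using Lucene.net to search about 50K entities. These entities are stored in a database. I wrote an application that indexes 100 entities at a time.
The code is very simple: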

    var entityList = GetEntityList(100);
    foreach (var item in entityList)
        Indexer.IndexEntity(item);

This is the Indexer class:

    public class Indexer
    {
        public void IndexEntity(Entity item)
        {
            IndexWriter writer;
            string path = ConfigurationManager.AppSettings["SearchIndexPath"];
            FSDirectory directory = FSDirectory.Open(new DirectoryInfo(path));
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);

            if (Directory.GetFiles(path).Length > 0)
                writer = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
            else
                writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

            Document document = new Document();
            document.Add(new Field("id", item.Id.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
            document.Add(new Field("category", item.Category.Id.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
            document.Add(new Field("location", item.Location.Id.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
            document.Add(new Field("point", item.Point.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
            document.Add(new Field("picture", item.PictureUrl, Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
            document.Add(new Field("creationdate", item.CreationDate.ToString(), Field.Store.YES, Field.Index.NO, Field.TermVector.NO));
            document.Add(new Field("title", item.Title, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
            document.Add(new Field("body", item.Body, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));

            string str2 = string.Empty;
            foreach (Tag tag in item.Tags)
            {
                if (!string.IsNullOrEmpty(str2))
                {
                    str2 = str2 + "-";
                }
                str2 = str2 + tag.DisplayName;
            }
            document.Add(new Field("tags", str2, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));

            writer.AddDocument(document);
            writer.Optimize();
            writer.Close();
        }
    }

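Everything works, and my searches are now fast enough. The problem is that indexing has slowed down. So far the application has indexed about 15K entities, and the index files take up about 600MB. Now indexing 100 new entities takes about 24 minutes.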

What is the problem? Thanks in advance.
Two things stand out in your code:

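You are optimizing the index after adding every document. With recent versions of Lucene there are good reasons not to optimize your index at all (per-segment caching), but even setting those aside, optimizing after every single document is overkill.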

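You keep opening, closing and committing the index. Given your loop, why not open the IndexWriter once outside the loop, add the entities, and then close/commit? If you need new documents to become visible to searches sooner, you can add a periodic commit inside the loop (based on some modulus arithmetic; that seems fine to me).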

With these two changes I think you will see a significant speed-up in indexing.
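The second suggestion (one writer for the whole loop, with a modulus-based periodic commit) can be sketched roughly as below. To keep the sketch self-contained it takes delegates instead of Lucene.Net types; BatchIndexer and IndexAll are hypothetical names. In real code, addDocument would wrap something like writer.AddDocument(BuildDocument(item)) on a single IndexWriter opened before the loop (BuildDocument being a hypothetical helper holding the field-building code from the question), and commit would wrap writer.Commit(). No Optimize() call is made per document.

```csharp
using System;
using System.Collections.Generic;

public static class BatchIndexer
{
    // Hypothetical helper: adds every entity through ONE writer that the caller
    // opened once, committing every `commitEvery` documents instead of after each one.
    // addDocument / commit are placeholders for the caller's real calls, e.g.
    // writer.AddDocument(BuildDocument(item)) and writer.Commit() (assumption).
    // Returns the number of commits issued.
    public static int IndexAll<T>(
        IEnumerable<T> entities,
        Action<T> addDocument,
        Action commit,
        int commitEvery)
    {
        if (commitEvery <= 0)
            throw new ArgumentOutOfRangeException(nameof(commitEvery));

        int added = 0;
        int commits = 0;
        foreach (var entity in entities)
        {
            addDocument(entity);
            added++;

            // Modulus-based periodic commit: new documents become visible to
            // searchers without closing and reopening the writer.
            if (added % commitEvery == 0)
            {
                commit();
                commits++;
            }
        }

        // Commit the tail of the batch if some documents are still uncommitted.
        if (added % commitEvery != 0)
        {
            commit();
            commits++;
        }
        return commits;
    }
}
```

For example, with commitEvery = 100 and 250 entities, the helper commits after documents 100 and 200 and once more for the remaining 50. Opening and closing the writer stays with the caller, outside the loop, so the per-document open/close cost from the question disappears.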
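So you mean I shouldn't optimize the index at all?

Right. It may be worth tweaking your IndexWriter's merge factor a little (which has a similar effect to optimizing), but beyond that I wouldn't bother.

It went down from 45 minutes to 50 seconds. I also optimize the index after every 100 items added. +1 (accepted)

Have you tried calling optimize() at the end of your loop? I'm wondering whether that is better.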
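A rough back-of-the-envelope model can help explain why dropping the per-document Optimize() made such a difference. It rests on a simplifying assumption: each Optimize() call merges everything into one segment and therefore rewrites the whole index at its current size. It also reuses the question's figures, about 600MB for about 15K entities, which works out to roughly 40KB per entity. OptimizeCostModel and BytesRewritten are hypothetical names, and the results are estimates under that model, not measurements.

```csharp
using System;

public static class OptimizeCostModel
{
    // Estimates the total bytes rewritten by Optimize() while adding `newDocs`
    // documents, calling Optimize() after every `optimizeEvery`-th document.
    // Assumption (simplified model): each Optimize() rewrites the entire index
    // as it exists at that moment.
    public static long BytesRewritten(long indexBytes, long docBytes, int newDocs, int optimizeEvery)
    {
        if (optimizeEvery <= 0)
            throw new ArgumentOutOfRangeException(nameof(optimizeEvery));

        long size = indexBytes;
        long rewritten = 0;
        for (int i = 1; i <= newDocs; i++)
        {
            size += docBytes;          // index grows by one document
            if (i % optimizeEvery == 0)
                rewritten += size;     // full rewrite of the current index
        }
        return rewritten;
    }
}
```

With indexBytes = 600_000_000, docBytes = 40_000 and newDocs = 100, optimizing after every document (optimizeEvery = 1) gives 60,202,000,000 bytes, about 60 GB rewritten per batch of 100. Optimizing once per batch (optimizeEvery = 100) gives 604,000,000 bytes, about 604 MB. Under this model that is roughly a 100x difference, which is consistent in direction with the speed-up reported above.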