
C# Lucene IndexWriter slow when adding documents


I wrote a small loop that adds 10,000 documents to an IndexWriter, and it takes a very long time.

Is there a faster way to index a large number of documents?

I ask because when this goes live it will have to load 15,000 records.

The other question is: how do I prevent all the records from being loaded again when the web application restarts?

Edit

Here is the code I use:

for (int t = 0; t < 10000; t++)
{
    doc = new Document();
    text = "Value" + t.ToString();
    doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
    iwriter.AddDocument(doc);
}
After the documents have been added, I then call

        iwriter.Close();

Just checking, but you don't have a debugger attached while running this, do you?

That severely hurts performance when adding documents.
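One way to rule that out at runtime (my addition, not part of the original answer) is to check System.Diagnostics.Debugger.IsAttached before timing anything:

using System.Diagnostics;

// A managed debugger attached to the process can slow AddDocument by an
// order of magnitude, so warn before running a benchmark.
if (Debugger.IsAttached)
{
    Console.WriteLine("Warning: debugger attached; timings will be inflated.");
}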

On my machine (Lucene 2.0.0.4):

Built with platform target x86:

  • No debugger: 5.2 seconds

  • Debugger attached: 113.8 seconds

Built with platform target x64:

  • No debugger: 6.0 seconds

  • Debugger attached: 171.4 seconds
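The numbers above are simple wall-clock timings. A minimal harness in that spirit (a sketch of mine, not the answer's code) would be:

using System.Diagnostics;

// Time the indexing loop with a Stopwatch; use a release build with no
// debugger attached to get numbers comparable to those above.
Stopwatch stopwatch = Stopwatch.StartNew();

for (int t = 0; t < 10000; t++)
{
    Document doc = new Document();
    doc.Add(new Field("Value", "Value" + t, Field.Store.YES, Field.Index.TOKENIZED));
    iwriter.AddDocument(doc);
}
iwriter.Close();

stopwatch.Stop();
Console.WriteLine("Indexed 10,000 documents in {0:F1} seconds.",
                  stopwatch.Elapsed.TotalSeconds);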

A rough example of saving and loading the index using a RAMDirectory:

const int DocumentCount = 10 * 1000;
const string IndexFilePath = @"X:\Temp\tmp.idx";

Analyzer analyzer = new StandardAnalyzer();
Directory ramDirectory = new RAMDirectory();

IndexWriter indexWriter = new IndexWriter(ramDirectory, analyzer, true);

for (int i = 0; i < DocumentCount; i++)
{
    Document doc = new Document();
    string text = "Value" + i;
    doc.Add(new Field("Value", text, Field.Store.YES, Field.Index.TOKENIZED));
    indexWriter.AddDocument(doc);
}

indexWriter.Close();

//Save index
FSDirectory fileDirectory = FSDirectory.GetDirectory(IndexFilePath, true);
IndexWriter fileIndexWriter = new IndexWriter(fileDirectory, analyzer, true);
fileIndexWriter.AddIndexes(new[] { ramDirectory });
fileIndexWriter.Close();

//Load index
FSDirectory newFileDirectory = FSDirectory.GetDirectory(IndexFilePath, false);
Directory newRamDirectory = new RAMDirectory();
IndexWriter newIndexWriter = new IndexWriter(newRamDirectory, analyzer, true);
newIndexWriter.AddIndexes(new[] { newFileDirectory });

Console.WriteLine("New index writer document count:{0}.", newIndexWriter.DocCount());
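To confirm that the on-disk index survives an application restart, you could reopen it with an IndexReader (my addition; the answer stops at the document-count check above):

// Reopen the persisted index from disk and verify the document count;
// at web-application startup this replaces re-adding all the records.
IndexReader reader = IndexReader.Open(IndexFilePath);
Console.WriteLine("Documents in persisted index: {0}.", reader.NumDocs());
reader.Close();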
You should do the following to get the best performance. On my machine, I can index 1,000 documents in 1 second.

1) You should reuse the Document and Field instances instead of creating new ones every time you add a document, like this:

private static void IndexingThread(object contextObj)
{
     Range<int> range = (Range<int>)contextObj;
     Document newDoc = new Document();
     newDoc.Add(new Field("title", "", Field.Store.NO, Field.Index.ANALYZED));
     newDoc.Add(new Field("body", "", Field.Store.NO, Field.Index.ANALYZED));
     newDoc.Add(new Field("newsdate", "", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
     newDoc.Add(new Field("id", "", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));

     for (int counter = range.Start; counter <= range.End; counter++)
     {
         newDoc.GetField("title").SetValue(Entities[counter].Title);
         newDoc.GetField("body").SetValue(Entities[counter].Body);
         newDoc.GetField("newsdate").SetValue(Entities[counter].NewsDate);
         newDoc.GetField("id").SetValue(Entities[counter].ID.ToString());

         writer.AddDocument(newDoc);
     }
}
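The snippet assumes a few members the answer does not show (Range&lt;int&gt;, Entities, writer). A hypothetical sketch of those declarations, just to make the example self-contained:

// Hypothetical supporting declarations; names and shapes are assumed,
// not taken from the original answer.
struct Range<T>
{
    public T Start;
    public T End;
    public Range(T start, T end) { Start = start; End = end; }
}

class NewsEntity
{
    public string Title;
    public string Body;
    public string NewsDate;
    public int ID;
}

static NewsEntity[] Entities;   // records loaded ahead of time
static IndexWriter writer;      // shared writer; IndexWriter.AddDocument is thread-safe

// Each worker thread indexes its own slice of Entities, e.g.:
// ThreadPool.QueueUserWorkItem(IndexingThread, new Range<int>(0, Entities.Length - 1));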
Comments:

It takes about 2.5 to 3 minutes here; is that expected? I should add that each document contains a single field, and that field has "Value" + t.ToString() as its value, so it is very small.

Can we see the code you use to set up the index writer?

+1, thanks. I will look into this tonight. I suspect it may be the debugger, as you mentioned. Thanks for the help.

I can't quite reproduce your good times. Mine is 16 seconds, which is acceptable. I guess it comes down to my hard drive.