C#: Indexing a large amount of data in Lucene.Net using NHibernate

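We use NHibernate as our data access layer. We have a table with 1.7 million records that we need to index one by one through Lucene for search. When we run the console application we wrote to build the index, it starts out fast, but it gets progressively slower as it works through the items.

Our first iteration simply indexed everything in one pass. The second iteration indexed by category. Now we select subsets by category and break them into "pages" of 100. Performance still degrades.

I turned on SQL Profiler, and as the indexer iterates over the items it calls SQL Server once per item to fetch that item's images, one by one, even though lazy loading for the images is set to Not.

This is a commerce site, and we are indexing catalog items (products). Each catalog item has zero to many images beneath it (stored in a separate table).

Here is our mapping: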

using FluentNHibernate.Mapping;
using NHibernate.Type;

// Fluent NHibernate mapping for catalog items stored in the Products table.
public class ItemMap : ClassMap<Item>
{
    public ItemMap()
    {
        Table("Products");

        Id(x => x.Id, "ProductId").GeneratedBy.GuidComb();

        Map(x => x.Model);
        Map(x => x.Description);

        Map(x => x.Created);
        Map(x => x.Modified);
        Map(x => x.IsActive);
        Map(x => x.PurchaseUrl).CustomType<UriType>();

        // Product identifiers are mapped as a component (columns on the same row).
        Component(x => x.Identifier, m =>
        {
            m.Map(x => x.Upc);
            m.Map(x => x.Asin);
            m.Map(x => x.Isbn);
            m.Map(x => x.Tid);
        });

        // Price component; Amount is stored in the "Price" column.
        Component(x => x.Price, m =>
        {
            m.Map(x => x.Currency);
            m.Map(x => x.Amount, "Price");
            m.Map(x => x.Shipping);
        });

        References(x => x.Brand, "BrandId");
        References(x => x.Category, "CategoryId");
        References(x => x.Supplier, "SupplierId");
        References(x => x.Provider, "ProviderId");

        // Images live in the ProductImages table; lazy loading is turned off.
        HasMany(x => x.Images)
            .Table("ProductImages")
            .KeyColumn("ProductId")
            .Not.LazyLoad();

        // TODO: Add variants
    }
}
Then, for each product, it executes:

exec sp_executesql N'SELECT images0_.ProductId as ProductId1_, images0_.ImageId as ImageId1_, images0_.ImageId as ImageId98_0_, images0_.Description as Descript2_98_0_, images0_.Url as Url98_0_, images0_.Created as Created98_0_, images0_.Modified as Modified98_0_, images0_.ProductId as ProductId98_0_ FROM ProductImages images0_ WHERE images0_.ProductId=@p0',N'@p0 uniqueidentifier',@p0='487EA053-4DD5-4EBA-AA36-95B30C42F0CD'
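That part is fine. The problem is that the first 2,000 or so products go very fast, but the longer the indexer runs within a category, the slower it gets and the more memory it consumes, even though it is indexing the same number of products per page. The GC is working, because memory usage does drop, but overall it keeps climbing as the process runs.

What can we do to speed up the indexer? Why does its performance degrade steadily? I don't think it is NHibernate or the queries, because it starts out so fast. We are really at a loss here.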


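Thanks.

Are you using the same session for all of the calls? If so, it caches every entity it loads and, whenever Flush is called (which depends on your FlushMode), loops over them to check whether they need to be flushed. Use a new session for each page of items, or change the FlushMode.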
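
The "new session per page" suggestion could look roughly like the sketch below. It is not the asker's code: `PagedIndexer`, `IndexAll` and the `index` callback are hypothetical names, and `Item` is assumed to be the entity mapped by `ItemMap` in the question. The criteria filter on `IsActive` mirrors the logged SQL.

```csharp
using System;
using System.Collections.Generic;
using NHibernate;
using NHibernate.Criterion;

public static class PagedIndexer
{
    // Hypothetical helper: loads active items page by page and passes each one to `index`.
    public static void IndexAll(ISessionFactory factory, Action<Item> index, int pageSize = 100)
    {
        for (int first = 0; ; first += pageSize)
        {
            // A fresh session per page means entities from earlier pages are no longer
            // tracked, so the first-level cache cannot keep growing across the run.
            using (ISession session = factory.OpenSession())
            {
                IList<Item> page = session.CreateCriteria<Item>()
                    .Add(Restrictions.Eq("IsActive", true))
                    .AddOrder(Order.Asc("Id"))   // stable ordering so pages do not overlap
                    .SetFirstResult(first)
                    .SetMaxResults(pageSize)
                    .List<Item>();

                if (page.Count == 0)
                {
                    break;                        // no more rows
                }

                foreach (Item item in page)
                {
                    index(item);
                }
            }
        }
    }
}
```

If one long-lived session has to be kept, calling `session.Clear()` after each page should have a similar effect on the cache, since it evicts everything the session is tracking.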


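When you use Criteria, you can specify that particular properties should be prefetched with a SQL join, which may speed up reading the data. I generally trust the Criteria API more than LINQ to NHibernate, because I get to decide exactly what each call does.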
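
A minimal sketch of such a join prefetch, assuming the `Item` entity from the question and a `Category` entity whose identifier property is called `Id` (an assumption; only the `CategoryId` column appears in the logged SQL). `ItemQueries` and `LoadCategoryWithImages` are hypothetical names. `SetFetchMode` is the classic Criteria call; newer NHibernate versions mark it obsolete in favour of other fetch APIs.

```csharp
using System;
using System.Collections.Generic;
using NHibernate;
using NHibernate.Criterion;
using NHibernate.Transform;

public static class ItemQueries
{
    // Loads the active items of one category and their images in a single joined SELECT,
    // instead of one extra SELECT per product for its images.
    public static IList<Item> LoadCategoryWithImages(ISession session, Guid categoryId)
    {
        return session.CreateCriteria<Item>()
            .Add(Restrictions.Eq("IsActive", true))
            .Add(Restrictions.Eq("Category.Id", categoryId))        // assumes Category has an Id property
            .SetFetchMode("Images", FetchMode.Join)                 // outer-join the Images collection
            .SetResultTransformer(Transformers.DistinctRootEntity)  // one Item per product, not per joined row
            .List<Item>();
    }
}
```

The sketch deliberately leaves out `SetFirstResult`/`SetMaxResults`: with a joined collection, paging applies to the joined rows rather than to whole products, so pages of 100 would need a different approach (for example batch fetching of the collection).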

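Ayende published a post just a few weeks ago on how to do this (using a stateless session and a custom IList implementation).

That sounds like exactly what you need, at least for speeding up record retrieval and minimizing memory usage.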
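
The linked post is not reproduced here, so the following is only a sketch of the shape the answer describes: a stateless session combined with a custom `IList` that hands each element to a callback instead of storing it. `ActionableList`, `StreamingIndexer` and the `index` callback are illustrative names, and `Item` is assumed to be the entity from the question.

```csharp
using System;
using System.Collections;
using NHibernate;

// Illustrative IList that forwards every added element to a callback and stores nothing.
public class ActionableList<T> : IList
{
    private readonly Action<T> action;
    private int count;

    public ActionableList(Action<T> action) { this.action = action; }

    // The only member the sketch relies on: process the element, keep no reference to it.
    public int Add(object value)
    {
        action((T)value);
        return count++;
    }

    public int Count { get { return count; } }
    public bool IsFixedSize { get { return false; } }
    public bool IsReadOnly { get { return false; } }
    public bool IsSynchronized { get { return false; } }
    public object SyncRoot { get { return this; } }

    public object this[int index]
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public void Clear() { throw new NotSupportedException(); }
    public bool Contains(object value) { throw new NotSupportedException(); }
    public void CopyTo(Array array, int index) { throw new NotSupportedException(); }
    public IEnumerator GetEnumerator() { throw new NotSupportedException(); }
    public int IndexOf(object value) { throw new NotSupportedException(); }
    public void Insert(int index, object value) { throw new NotSupportedException(); }
    public void Remove(object value) { throw new NotSupportedException(); }
    public void RemoveAt(int index) { throw new NotSupportedException(); }
}

public static class StreamingIndexer
{
    public static void IndexAll(ISessionFactory factory, Action<Item> index)
    {
        // A stateless session has no first-level cache and does no dirty checking.
        using (IStatelessSession session = factory.OpenStatelessSession())
        {
            session.CreateQuery("from Item i where i.IsActive = :active")
                   .SetBoolean("active", true)
                   .List(new ActionableList<Item>(index));
        }
    }
}
```

Two caveats. According to the NHibernate documentation, stateless sessions ignore collections, so `Images` would have to be loaded with a separate query. And whether results reach the list row by row depends on NHibernate's internals; the benefit this sketch can count on is the absence of the session cache.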

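We ended up moving to Solr for indexing. We could not get Lucene to index efficiently, which may well have been down to our implementation.

For reference: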


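After digging in, running the SQL Server 2008 Database Engine Tuning Advisor, and doing more research, retrieving the data did not appear to be the problem. The problem (we think) was that Lucene did not handle building a large index very efficiently. The index files kept getting bigger and the indexer ran slower and slower.

You can optimize the search index occasionally.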
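
As an illustration of optimizing only occasionally, the sketch below keeps one `IndexWriter` open for the whole run and calls `Optimize()` once at the end. It assumes the Lucene.Net 3.0.3 API (older releases use `Close()` instead of `Dispose()` and a different `Version` constant). The field names, the `LuceneIndexBuilder` class and the assumption that `Model` and `Description` are strings are all illustrative, not taken from the thread.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class LuceneIndexBuilder
{
    // Writes all items through a single IndexWriter and optimizes once at the end.
    public static void Build(string indexPath, IEnumerable<Item> items)
    {
        using (var directory = FSDirectory.Open(new DirectoryInfo(indexPath)))
        using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
        using (var writer = new IndexWriter(directory, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (Item item in items)
            {
                var doc = new Document();
                // Store the id untokenized so a hit can be mapped back to the product.
                doc.Add(new Field("id", item.Id.ToString(),
                                  Field.Store.YES, Field.Index.NOT_ANALYZED));
                // Assumed to be strings; null values are replaced because Field rejects null.
                doc.Add(new Field("model", item.Model ?? string.Empty,
                                  Field.Store.YES, Field.Index.ANALYZED));
                doc.Add(new Field("description", item.Description ?? string.Empty,
                                  Field.Store.NO, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }

            // Optimize merges the index segments, which is expensive on a large index,
            // so it is done once here rather than after every page.
            writer.Optimize();
            writer.Commit();
        }
    }
}
```

Passing a lazily evaluated `IEnumerable<Item>` (for example one that yields page by page) keeps the writer from forcing all 1.7 million items into memory at once.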

The paged query that NHibernate issued for a page of 100 products, as captured in SQL Profiler:

exec sp_executesql N'SELECT TOP 100 ProductId100_1_, Upc100_1_, Asin100_1_, Isbn100_1_, Tid100_1_, Currency100_1_, Price100_1_, Shipping100_1_, Model100_1_, Descrip10_100_1_, Created100_1_, Modified100_1_, IsActive100_1_, Purchas14_100_1_, BrandId100_1_, CategoryId100_1_, SupplierId100_1_, ProviderId100_1_, CategoryId103_0_, Name103_0_, ShortName103_0_, Created103_0_, Modified103_0_, ShortId103_0_, DisplayO7_103_0_, IsActive103_0_, ParentCa9_103_0_ FROM (SELECT this_.ProductId as ProductId100_1_, this_.Upc as Upc100_1_, this_.Asin as Asin100_1_, this_.Isbn as Isbn100_1_, this_.Tid as Tid100_1_, this_.Currency as Currency100_1_, this_.Price as Price100_1_, this_.Shipping as Shipping100_1_, this_.Model as Model100_1_, this_.Description as Descrip10_100_1_, this_.Created as Created100_1_, this_.Modified as Modified100_1_, this_.IsActive as IsActive100_1_, this_.PurchaseUrl as Purchas14_100_1_, this_.BrandId as BrandId100_1_, this_.CategoryId as CategoryId100_1_, this_.SupplierId as SupplierId100_1_, this_.ProviderId as ProviderId100_1_, category1_.CategoryId as CategoryId103_0_, category1_.Name as Name103_0_, category1_.ShortName as ShortName103_0_, category1_.Created as Created103_0_, category1_.Modified as Modified103_0_, category1_.ShortId as ShortId103_0_, category1_.DisplayOrder as DisplayO7_103_0_, category1_.IsActive as IsActive103_0_, category1_.ParentCategoryId as ParentCa9_103_0_, ROW_NUMBER() OVER(ORDER BY CURRENT_TIMESTAMP) as __hibernate_sort_row FROM Products this_ left outer join Categories category1_ on this_.CategoryId=category1_.CategoryId WHERE (this_.IsActive = @p0 and (1=0 or (this_.CategoryId is not null and category1_.CategoryId = @p1)))) as query WHERE query.__hibernate_sort_row > 500 ORDER BY query.__hibernate_sort_row',N'@p0 bit,@p1 uniqueidentifier',@p0=1,@p1='A988FD8C-DD93-4119-8F84-0AF3656DAEDD'