Most efficient way to compare two generic lists based on id elements contained in a nested list (C#)

c# linq


I have two generic lists of items, each of which contains a list of suppliers and their ids:

List<ExistingItems>
    List<Suppliers>

List<PotentialMatches>
    List<Suppliers>

Suppliers
    SupplierId
    Name

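However, when dealing with a large number of records (more than 500k), my current comparison is inefficient and runs very slowly.

How can I perform the same kind of comparison more efficiently?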

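Your current algorithm appears to be

O(n*m*s*s)

where n = the number of existing items, m = the number of potential matches, and s = the average number of suppliers per existing item or potential match. By using a hash set to match suppliers, you can reduce the running time to

O(n*m*s)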
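To illustrate where the extra factor of s goes, here is a minimal sketch comparing the two ways of checking whether two lists of supplier ids share an element. The names `OverlapDemo`, `OverlapsNested` and `OverlapsHashed` are made up for this illustration and do not come from the question.

```csharp
using System.Collections.Generic;

public static class OverlapDemo
{
    // Nested scan: every id in a is compared with every id in b, roughly O(s*s).
    public static bool OverlapsNested(IReadOnlyList<int> a, IReadOnlyList<int> b)
    {
        foreach (var x in a)
            foreach (var y in b)
                if (x == y) return true;
        return false;
    }

    // Hash-based check: build a set from a once, then each lookup is O(1)
    // on average, so the whole check is roughly O(s).
    public static bool OverlapsHashed(IEnumerable<int> a, IEnumerable<int> b)
    {
        var set = new HashSet<int>(a);
        foreach (var y in b)
            if (set.Contains(y)) return true;
        return false;
    }
}
```

Both methods return the same answer; only the cost per pair of items differs.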

A generic version looks like this:

// Requires: using System; using System.Collections.Generic; using System.Linq;
public static IEnumerable<(T1, T2)> SetJoin<T1, T2, TKey>(
    IEnumerable<T1> t1s,
    IEnumerable<T2> t2s,
    Func<T1, IEnumerable<TKey>> t1Key,
    Func<T2, IEnumerable<TKey>> t2Key) where TKey : IEquatable<TKey>
{
    foreach (var t1 in t1s)
    {
        // Build the key set once per t1 so that each lookup below is O(1) on average.
        var t1Keys = new HashSet<TKey>(t1Key(t1));
        foreach (var t2 in t2s)
        {
            // t2Key(t2) would be called many times,
            // might be worth pre-computing it for each t2.
            if (t2Key(t2).Any(t1Keys.Contains))
            {
                yield return (t1, t2);
            }
        }
    }
}
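The comment in the code above suggests pre-computing the keys of each t2. One possible way to do that is sketched below; the class name `SetJoinPrecomputed` and method name `Join` are hypothetical. Note that this does not change the O(n*m*s) bound, it only avoids re-evaluating the `t2Key` selector for every t1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetJoinPrecomputed
{
    public static IEnumerable<(T1, T2)> Join<T1, T2, TKey>(
        IEnumerable<T1> t1s,
        IEnumerable<T2> t2s,
        Func<T1, IEnumerable<TKey>> t1Key,
        Func<T2, IEnumerable<TKey>> t2Key) where TKey : IEquatable<TKey>
    {
        // Evaluate t2Key exactly once per t2 and keep the keys as an array.
        var t2WithKeys = t2s
            .Select(t2 => (Item: t2, Keys: t2Key(t2).ToArray()))
            .ToList();

        foreach (var t1 in t1s)
        {
            var t1Keys = new HashSet<TKey>(t1Key(t1));
            foreach (var (item, keys) in t2WithKeys)
            {
                // Same test as before, but against the cached key array.
                if (keys.Any(t1Keys.Contains))
                {
                    yield return (t1, item);
                }
            }
        }
    }
}
```

The trade-off is extra memory for the cached key arrays, which may or may not be acceptable at 500k+ records.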

Call it like this:

SetJoin<ExistingItems, PotentialMatches, int>(
    existingItems,
    potentialMatches,
    e => e.Suppliers.Select(s => s.SupplierId),
    p => p.Suppliers.Select(s => s.SupplierId))
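For a complete picture, the call above can be exercised end to end roughly as follows. This is only a sketch under assumptions: the question does not show its classes, so `Supplier` (singular), the `Label` property, the sample data and the `SetJoinDemo.Run` helper are all hypothetical, and the id property is assumed to be named `SupplierId` as in the question's outline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model classes; the question only outlines their shape.
public class Supplier
{
    public int SupplierId { get; set; }
    public string Name { get; set; } = "";
}

public class ExistingItems
{
    public string Label { get; set; } = "";   // hypothetical, used to identify results
    public List<Supplier> Suppliers { get; set; } = new List<Supplier>();
}

public class PotentialMatches
{
    public string Label { get; set; } = "";   // hypothetical, used to identify results
    public List<Supplier> Suppliers { get; set; } = new List<Supplier>();
}

public static class SetJoinDemo
{
    // Same logic as the generic SetJoin from the answer.
    public static IEnumerable<(T1, T2)> SetJoin<T1, T2, TKey>(
        IEnumerable<T1> t1s,
        IEnumerable<T2> t2s,
        Func<T1, IEnumerable<TKey>> t1Key,
        Func<T2, IEnumerable<TKey>> t2Key) where TKey : IEquatable<TKey>
    {
        foreach (var t1 in t1s)
        {
            var t1Keys = new HashSet<TKey>(t1Key(t1));
            foreach (var t2 in t2s)
            {
                if (t2Key(t2).Any(t1Keys.Contains))
                    yield return (t1, t2);
            }
        }
    }

    // Builds a tiny sample data set and returns the matched label pairs.
    public static List<string> Run()
    {
        var existingItems = new List<ExistingItems>
        {
            new ExistingItems { Label = "E1", Suppliers = { new Supplier { SupplierId = 1 }, new Supplier { SupplierId = 2 } } },
            new ExistingItems { Label = "E2", Suppliers = { new Supplier { SupplierId = 3 } } },
        };
        var potentialMatches = new List<PotentialMatches>
        {
            new PotentialMatches { Label = "P1", Suppliers = { new Supplier { SupplierId = 2 } } },
            new PotentialMatches { Label = "P2", Suppliers = { new Supplier { SupplierId = 4 } } },
        };

        return SetJoin<ExistingItems, PotentialMatches, int>(
                existingItems,
                potentialMatches,
                e => e.Suppliers.Select(s => s.SupplierId),
                p => p.Suppliers.Select(s => s.SupplierId))
            .Select(pair => pair.Item1.Label + "-" + pair.Item2.Label)
            .ToList();
    }
}
```

With this sample data only E1 and P1 share a supplier id (2), so `Run()` returns a single pair.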

Also, while LINQ can produce compact and pretty code, writing the equivalent logic with regular loops is often faster if performance matters.
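As a rough sketch of what "regular loops" could look like here, the version below works on supplier ids that have already been extracted into arrays and returns index pairs instead of items. The `LoopJoin` class, the `MatchIndexes` method and the array-based input are assumptions made for this illustration; whether it is actually faster on your data should be measured.

```csharp
using System.Collections.Generic;

public static class LoopJoin
{
    // Returns (i, j) for every existing[i] and potential[j]
    // that share at least one supplier id.
    public static List<(int, int)> MatchIndexes(
        IReadOnlyList<int[]> existing,
        IReadOnlyList<int[]> potential)
    {
        var result = new List<(int, int)>();
        for (int i = 0; i < existing.Count; i++)
        {
            var keys = new HashSet<int>(existing[i]);
            for (int j = 0; j < potential.Count; j++)
            {
                var ids = potential[j];
                for (int k = 0; k < ids.Length; k++)
                {
                    if (keys.Contains(ids[k]))
                    {
                        result.Add((i, j));
                        break; // one shared id is enough, mirroring Any()
                    }
                }
            }
        }
        return result;
    }
}
```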
