C#: Improving performance when comparing two lists

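What options do I have when comparing the items in two lists? I have some performance problems, and I would like to know whether there is a faster alternative: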

int[] foo = { 1, 2, 3, 4, 5 };
int[] bar = { 6, 7, 8, 9, 1 };

var result = foo.Any(x => bar.Contains(x));
Whether I use the lambda approach or write the foreach loop myself, I assume the cost is still O(N^2). Is there anything I can do to improve it?

You can use a HashSet:

int[] foo = { 1, 2, 3, 4, 5 };
int[] bar = { 6, 7, 8, 9, 1 };
// Build the set once: O(M) to create, O(1) average per lookup.
var hashSet = new HashSet<int>(bar);
var result = foo.Any(x => hashSet.Contains(x));
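
As a variation on the same idea, HashSet<T> also has an Overlaps method that reports whether the set shares at least one element with another sequence. The sketch below wraps it in a helper; the names ListComparison and HaveCommonItem are hypothetical and chosen only for illustration, they do not come from the answer above.

```csharp
using System.Collections.Generic;

public static class ListComparison
{
    // Hypothetical helper: true if the two sequences share at least one item.
    public static bool HaveCommonItem<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // Building the set costs O(M); each lookup is O(1) on average.
        var set = new HashSet<T>(second);

        // Overlaps enumerates 'first' and returns as soon as one element is in the set.
        return set.Overlaps(first);
    }
}
```

With the arrays from the question, ListComparison.HaveCommonItem(foo, bar) should return true, because both arrays contain 1.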
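You can use Intersect:

var result = foo.Intersect(bar).Any();

This creates a Set from the bar items and then enumerates foo until the first match is found. Internally, it looks like: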

Set<int> set = new Set<int>();

foreach (int local in bar) // M times
    set.Add(local); // O(1)

foreach (int value in foo) // N times max
{
    if (!set.Remove(value)) // O(1)
        continue;

    yield return value;
}
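
The fragment above is not compilable on its own: it uses yield return outside a method, and the Set type it refers to is, as far as I know, not public. A minimal runnable sketch of the same logic, assuming the public HashSet<T> as a stand-in and using the hypothetical names IntersectSketch and IntersectLike, could look like this:

```csharp
using System.Collections.Generic;

public static class IntersectSketch
{
    // Hypothetical stand-in for the logic sketched above,
    // with the public HashSet<T> used in place of Set<T>.
    public static IEnumerable<T> IntersectLike<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var set = new HashSet<T>(second); // M insertions, O(1) average each

        foreach (T value in first)        // at most N iterations
        {
            // Remove returns true only the first time a value is seen,
            // so each common element is yielded exactly once.
            if (set.Remove(value))
                yield return value;
        }
    }
}
```

Because the iterator is lazy, calling Any() on its result stops enumerating the first sequence at the first match.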

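As Patryk Ćwiek correctly pointed out, this gives you O(N+M) instead of O(N*M).

For completeness, here is a benchmark program to test the various answers in this thread.

It seems to show that the HashSet approach is slightly faster: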

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace Demo
{
    internal class Program
    {
        private void run()
        {
            var foo = Enumerable.Range(     0, 100000).ToArray();
            var bar = Enumerable.Range(100000, 100000).ToArray();

            int trials = 4;

            Stopwatch sw = new Stopwatch();

            for (int i = 0; i < trials; ++i)
            {
                sw.Restart();
                method1(foo, bar);
                Console.WriteLine("method1()     took " +sw.Elapsed);

                sw.Restart();

                for (int j = 0; j < 100; ++j)
                    method2(foo, bar);

                Console.WriteLine("method2()*100 took " +sw.Elapsed);

                sw.Restart();

                for (int j = 0; j < 100; ++j)
                    method3(foo, bar);

                Console.WriteLine("method3()*100 took " +sw.Elapsed);

                Console.WriteLine();
            }
        }

        private static bool method1(int[] foo, int[] bar)
        {
            return foo.Any(bar.Contains);
        }

        private static bool method2(int[] foo, int[] bar)
        {
            var hashSet = new HashSet<int>(bar);
            return foo.Any(hashSet.Contains);
        }

        private static bool method3(int[] foo, int[] bar)
        {
            return foo.Intersect(bar).Any();
        }

        private static void Main()
        {
            new Program().run();
        }
    }
}

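Won't this still create a nested loop internally?

Intersect is a set operation, which means that set creation plus iteration over one of the collections (assuming Contains on a set is ~O(1)) yields O(N+M) asymptotic complexity rather than O(N*M), similar to the other answer, which uses HashSet explicitly.

I see. Besides Intersect, are there other lambda methods that create a set when comparing collections?

@Johan While ideally the comparison on the HashSet should now be O(1), doesn't this just shift the "performance loss" to creating the HashSet?

@ThorstenDittmar It will probably still be faster, and I guess that is the point of the question.

Sure, Sergey's is more elegant. You mean var hashSet = new HashSet<int>(bar) (otherwise it is a compile error).

@ThorstenDittmar Note that creating the set once makes this solution O(N+M). If you recreated the set inside Any for every element of the second collection (set creation is O(N)), the solution would revert to O(N*M).

+1 Manually creating the hash set and looking for any contained item is a bit faster.

@SergeyBerezovskiy But the difference is so marginal that I personally would stick with your slightly more readable answer.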
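
The remark about creating the set only once can be illustrated with a short sketch. The class and method names below are hypothetical; both methods return the same result and differ only in how often the set is built.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SetCreationPitfall
{
    // Set built once: O(M) to build, then up to N lookups at O(1) each, so O(N+M).
    public static bool BuildOnce(int[] foo, int[] bar)
    {
        var set = new HashSet<int>(bar);
        return foo.Any(x => set.Contains(x));
    }

    // Set rebuilt inside the lambda: O(M) work for each of up to N elements, so O(N*M).
    public static bool RebuildEveryTime(int[] foo, int[] bar)
    {
        return foo.Any(x => new HashSet<int>(bar).Contains(x));
    }
}
```

The second form looks almost identical in source, which is why it is worth keeping the set in a local variable outside the lambda.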
Output of the benchmark program:

method1()     took 00:00:12.2781951
method2()*100 took 00:00:00.4920760
method3()*100 took 00:00:00.7045298

method1()     took 00:00:11.9267980
method2()*100 took 00:00:00.4688330
method3()*100 took 00:00:00.6886865

method1()     took 00:00:11.8959856
method2()*100 took 00:00:00.4736563
method3()*100 took 00:00:00.6875508

method1()     took 00:00:11.9083229
method2()*100 took 00:00:00.4572404
method3()*100 took 00:00:00.6838919