C#: an optimized algorithm for generating an array

c# algorithm optimization

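I'm looking for an optimized algorithm that, given an array (or list) of a struct I wrote, removes the duplicate elements and returns the result.
I know I can do this with a naive algorithm of O(n^2) complexity, but I'd like a better one.

Any help would be appreciated.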

You can sort the array in O(N log N) time and then compare adjacent elements to remove the duplicates.
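The sort-then-compare idea can be sketched roughly as follows. This is only an illustration: the class and method names are made up, and it assumes the element type has a default ordering (`IComparable<T>`) that agrees with its notion of equality.

```csharp
using System;
using System.Collections.Generic;

static class SortDedup
{
    // Returns a sorted copy of the input with duplicates removed.
    // Assumption: T's default ordering is consistent with equality.
    public static List<T> RemoveDuplicatesBySorting<T>(IEnumerable<T> input)
        where T : IComparable<T>
    {
        var sorted = new List<T>(input);
        sorted.Sort();                              // O(N log N)

        var comparer = Comparer<T>.Default;
        var result = new List<T>();
        foreach (T item in sorted)                  // single O(N) pass
        {
            // Equal elements are adjacent after sorting, so comparing
            // with the last element kept so far is sufficient.
            if (result.Count == 0 || comparer.Compare(item, result[result.Count - 1]) != 0)
                result.Add(item);
        }
        return result;
    }
}
```

For example, `SortDedup.RemoveDuplicatesBySorting(new[] { 3, 1, 3, 2, 1 })` yields 1, 2, 3. Note that the original order of the elements is lost, so this approach is not stable.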

You can use a HashSet, which has O(N) complexity:

List<T> RemoveDuplicates<T>(List<T> input)
{
    var result = new HashSet<T>(input);
    return result.ToList();
}

However, it will increase memory usage.

This will run in close to O(N) time:
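Assuming this answer refers to LINQ's `Distinct()`, as its timing code suggests, a minimal sketch might look like this. The wrapper class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DistinctDemo
{
    // Thin wrapper around LINQ's Distinct(); the wrapper itself is
    // illustrative and not part of any library.
    public static List<T> RemoveDuplicates<T>(IEnumerable<T> input)
    {
        return input.Distinct().ToList();
    }
}
```

Calling `DistinctDemo.RemoveDuplicates(new[] { 3, 1, 3, 2, 1 })` returns a list with three elements. Current implementations keep the first occurrence of each element in its original order, but the documentation describes the result as unordered, so code that needs a stable result should not rely on that without checking.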

[Edit]

Since there is no written evidence from Microsoft that this runs in O(N) time, I did some timings with the following code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace Demo
{
    class Program
    {
        private void run()
        {
            test(1000);
            test(10000);
            test(100000);
        }

        private void test(int n)
        {
            var items = Enumerable.Range(0, n);
            new Action(() => items.Distinct().Count())
                .TimeThis("Distinct() with n == " + n + ": ", 10000);
        }

        static void Main()
        {
            new Program().run();
        }
    }

    static class DemoUtil
    {
        public static void TimeThis(this Action action, string title, int count = 1)
        {
            var sw = Stopwatch.StartNew();

            for (int i = 0; i < count; ++i)
                action();

            Console.WriteLine("Calling {0} {1} times took {2}",  title, count, sw.Elapsed);
        }
    }
}

The time increases approximately linearly with n, at least for this particular test, which suggests that an O(n) algorithm is being used.

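For practical use, LINQ's Distinct is the simplest solution. It uses a hash-table-based approach, probably very similar to the algorithm below.

If you're interested in what such an algorithm looks like: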

IEnumerable<T> Distinct<T>(IEnumerable<T> sequence)
{
    var alreadySeen = new HashSet<T>();
    foreach (T item in sequence)
    {
        // Add returns false if the item was already in the set
        if (alreadySeen.Add(item))
            yield return item;
    }
}

If there are d distinct elements and n total elements, this algorithm takes O(d) memory and O(n) time.

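Since the algorithm uses a hash set, it needs a well-distributed hash to achieve an O(n) running time. If the hash is bad, the running time can degrade to O(n*d).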
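Because the question is about a custom struct, the hash quality depends on how that struct implements equality and hashing. A minimal sketch, assuming a hypothetical two-field struct named `Point` standing in for the asker's type:

```csharp
using System;

// Hypothetical struct standing in for the asker's own type.
struct Point : IEquatable<Point>
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y) { X = x; Y = y; }

    // Typed equality avoids boxing when HashSet<Point> compares elements.
    public bool Equals(Point other) { return X == other.X && Y == other.Y; }

    public override bool Equals(object obj) { return obj is Point && Equals((Point)obj); }

    // Mixes both fields so that different points rarely share a hash code.
    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}
```

By contrast, a `GetHashCode` that returned the same constant for every value would put all elements into one bucket, which is the kind of bad hash that leads to the O(n*d) behaviour described above.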

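There's no reason to reinvent the wheel. The default implementation of Distinct() is already optimized. Use it and be happy.

Does the algorithm need to be stable (i.e. preserve the original order of the surviving elements)?

@Nik: You're right. I edited my question!

That may be true, but do you have a reference? MSDN doesn't specify anything.

@HenkHolterman: It would be really silly not to do it in O(N) ;p

@HenkHolterman Only this StackOverflow answer:

@HenkHolterman I'm pretty sure Distinct uses the obvious HashSet approach.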

Timing output from the test program above:

Calling Distinct() with n == 1000:   10000 times took 00:00:00.5008792
Calling Distinct() with n == 10000:  10000 times took 00:00:06.1388296
Calling Distinct() with n == 100000: 10000 times took 00:00:58.5542259
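One way to read these numbers is to convert each total into an average cost per element. The helper below is only an illustrative sketch; its class and method names are made up.

```csharp
using System;

static class TimingCheck
{
    // Average time per element in nanoseconds:
    // total time divided by (number of calls * elements per call).
    public static double NanosecondsPerElement(TimeSpan total, int calls, int n)
    {
        return total.TotalMilliseconds * 1e6 / ((double)calls * n);
    }
}
```

Applied to the three measurements, this gives roughly 50 ns, 61 ns and 59 ns per element for n = 1000, 10000 and 100000. The per-element cost stays in the same range while n grows 100-fold, which is consistent with linear growth.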
