C#: Efficiently find the arrays with the most occurrences of the same numbers

c# linq .net-core

Suppose I have the following nested array:

[
    [1, 2, 3],
    [4, 7, 9, 13],
    [1, 2],
    [2, 3],
    [12, 15, 16]
]

I only need the arrays with the most occurrences of the same numbers. In the example above, that would be:

[
    [1, 2, 3],
    [4, 7, 9, 13],
    [12, 15, 16]
]

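How can I do this efficiently in C#?

Edit: my question is indeed confusing. What I meant to ask is: how do I eliminate a sub-array when some larger sub-array already contains all of its elements?

My current solution to the problem is as follows: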

var allItems = new List<List<int>>{
            new List<int>{1, 2, 3},
            new List<int>{4, 7, 9, 13},
            new List<int>{1, 2},
            new List<int>{2, 3},
            new List<int>{12, 15, 16}
        };

var itemsToEliminate = new List<List<int>>();

for(var i = 0; i < allItems.ToList().Count; i++){
    var current = allItems[i];
    var itemsToVerify = allItems.Where(item => item != current).ToList();
    foreach(var item in itemsToVerify){
        bool containsSameNumbers = item.Intersect(current).Any();
        if(containsSameNumbers && item.Count > current.Count){
            itemsToEliminate.Add(current);          
        }
    }
}
allItems.RemoveAll(item => itemsToEliminate.Contains(item));
foreach(var item in allItems){
    Console.WriteLine(string.Join(", ", item));
}

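This does work, but the for(var i = 0; i < allItems.ToList().Count; i++) loop and the nested foreach(var item in itemsToVerify) loop make it perform poorly, especially since the allItems array can contain about 10,000,000 rows.

I would remember the items that are already in the list.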

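First sort the list by decreasing length, then check for each item whether its numbers are already known.
With your algorithm, an array is not added as soon as even a single one of its integers is already in the list of known integers.
So I would use the following algorithm: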
List<List<int>> allItems = new List<List<int>>{
    new List<int>{1, 2, 3},
    new List<int>{4, 7, 9, 13},
    new List<int>{1, 2},
    new List<int>{2, 3},
    new List<int>{12, 15, 16}
};

allItems = allItems.OrderByDescending(x => x.Count()).ToList(); // order by length, decreasing order

List<List<int>> result = new List<List<int>>();
SortedSet<int> knownItems = new SortedSet<int>(); // keep track of numbers, so you don't have to loop arrays
// https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.sortedset-1?view=netframework-4.7.2

foreach (List<int> l in allItems)
{
    // Keep the list only if none of its numbers was seen in an earlier (longer) list.
    bool allUnique = true;
    foreach (int elem in l)
    {
        if (knownItems.Contains(elem))
        {
            allUnique = false;
            break;
        }
    }
    // Register the numbers only after the whole list has passed the check.
    // With the constrained data described below, knownItems.Add(elem) could instead be
    // merged into the loop above as an else branch, but that WILL cause problems if a
    // list starts with numbers not yet seen and is then discarded at a later match.
    // (I still have my doubts about how the data is allowed to look and what special
    // cases may pop up that ruin this, so use with care.)
    if (allUnique)
    {
        result.Add(l);
        foreach (int elem in l)
        {
            knownItems.Add(elem);
        }
    }
}

// output
foreach(List<int> item in result){
    Console.WriteLine(string.Join(", ", item));
}
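
The code above drops a list as soon as one of its numbers has been seen, which matches the asker's data but not necessarily inputs where lists overlap only partially. As a hedged sketch for that more general reading of the question, the variant below keeps a list unless it is completely contained in an already kept list. The names SubsetFilter and RemoveContained are invented for illustration, and the worst case is still quadratic in the number of lists, so treat it as a correctness baseline rather than something tuned for 10,000,000 rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubsetFilter
{
    // Hypothetical helper: keeps every list that is not fully contained in an already kept list.
    public static List<List<int>> RemoveContained(IEnumerable<List<int>> lists)
    {
        var kept = new List<HashSet<int>>();
        var result = new List<List<int>>();

        // Longest first, so a containing list is always visited before the lists it contains.
        foreach (var list in lists.OrderByDescending(l => l.Count))
        {
            var candidate = new HashSet<int>(list);
            if (kept.Any(k => candidate.IsSubsetOf(k)))
            {
                continue; // every number already lives in one kept list
            }
            kept.Add(candidate);
            result.Add(list);
        }
        return result;
    }
}
```

Called on the five lists from the question, this should keep the same three lists as the expected result, though in length order rather than input order. A list such as {1, 2, 6} would be kept here, because it is not a subset of {1, 2, 3}.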

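What do you mean by "most occurrences of the same numbers"? Every item in the example result contains each number only once, and the items have no numbers in common... What is the most frequent occurrence of the same number supposed to be? Not only does every array contain each of its numbers only once, the result arrays also share no numbers with each other. If you wanted all arrays that contain the number 2 (2 occurs 3 times in total, more than any other number), I would get it, but this is just confusing.

@FalcoGer, I think TS means that he needs to eliminate a sub-array if some larger sub-array already contains all elements of the smaller sub-array.

@JuryGolubev I think if that's what he means, he should say so. I don't want to guess at an answer. Even then, what is the desired behavior for an array that contains both numbers that occurred before and numbers that did not? Should that array be included? Should an array be included if it appears later and combines the same numbers with fewer previously seen ones? Also, what is the ultimate goal here? Is this homework? It certainly looks like it.

@FalcoGer I agree, which is why I edited the question.

Is there any benefit to using OrderBy(x => -x.Count()) instead of OrderByDescending(x => x.Count())? I haven't checked, but I would expect them to return the same result.

@Klicker I don't think so. Honestly, I just didn't think of OrderByDescending().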
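
For what it's worth, the equivalence discussed here is easy to check. Both LINQ methods are documented as stable sorts, and a list's Count is never negative, so negating it cannot overflow; I would therefore expect both calls to give the same order, ties included. A minimal sketch, with OrderingCheck and SameOrder as invented names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingCheck
{
    // Hypothetical helper: true when both orderings yield the same sequence of lists.
    public static bool SameOrder(List<List<int>> lists)
    {
        var negated = lists.OrderBy(x => -x.Count).ToList();
        var descending = lists.OrderByDescending(x => x.Count).ToList();

        // List<int> has no value equality, so this compares references position by position.
        return negated.SequenceEqual(descending);
    }
}
```

OrderByDescending states the intent directly, which is probably the only practical difference.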

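I think you can eliminate the knownItems.Add(elem) loop if you add it as the else branch of the if (knownItems.Contains(elem)) check.

@user2810895 Given the format of the data, I think that is correct. I would definitely test it, though; I'm not 100% sure which special cases might pop up and break this solution. Please test it thoroughly.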
{1, 2, 3, 4, 5} - contains all elements that future arrays will have subsets of
{1, 4, 5} - must contain no element that {1, 2, 3, 4, 5} does not contain
{1, 2, 6} - illegal in this case
{7, 8, 9} - OK
{8, 9} - OK (will be ignored)
{7, 9} - OK (will be ignored, is only a subset of {7, 8, 9})
{1, 7} - illegal, but would be legal if {1, 2, 3, 4, 5, 7, 8, 9} were in this list: because it is longer, it would have come earlier, making this one valid to ignore.
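
Since the answer only works when the data obeys these rules, it may be worth checking the input before trusting the result. The sketch below is one possible reading of the rules in the list above: visiting lists longest first, each list must either consist entirely of unseen numbers or be a subset of exactly one kept list. ConstraintCheck and FindIllegal are invented names, and the interpretation of "illegal" is my assumption, not something the answer spells out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConstraintCheck
{
    // Hypothetical helper: returns the lists that overlap earlier (longer) lists
    // without being a subset of exactly one kept list.
    public static List<List<int>> FindIllegal(IEnumerable<List<int>> lists)
    {
        var owner = new Dictionary<int, int>(); // number -> index of the kept list that introduced it
        var illegal = new List<List<int>>();
        int keptCount = 0;

        foreach (var list in lists.OrderByDescending(l => l.Count))
        {
            // Distinct owners of this list's numbers; -1 stands for "not seen yet".
            var owners = new HashSet<int>();
            foreach (int n in list)
            {
                owners.Add(owner.TryGetValue(n, out int idx) ? idx : -1);
            }

            if (owners.Count == 1 && owners.Contains(-1))
            {
                // Only new numbers: the list is kept and now owns them.
                foreach (int n in list) owner[n] = keptCount;
                keptCount++;
            }
            else if (owners.Count > 1)
            {
                // Seen and unseen numbers mixed, or numbers from two kept lists.
                illegal.Add(list);
            }
            // Otherwise all numbers belong to one kept list: a plain subset, safe to ignore.
        }
        return illegal;
    }
}
```

Each number is looked up once, so the check is linear in the total number of elements after the initial sort.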