C#: Can this be implemented as a single, efficient LINQ query?

c# algorithm linq time-complexity

I have a class

public class Foo
{
   public string X;
   public string Y;
   public int Z;
}

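The query I want to implement is: given an IEnumerable<Foo> named foos, "group by X, then by Y, then pick the largest subgroup from each supergroup; if there is a tie, pick the one with the largest Z."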

In other words, a not-so-compact solution is

var outer = foos.GroupBy(f => f.X);
foreach (var g1 in outer)
{
   var inner = g1.GroupBy(g2 => g2.Y);
   int maxCount = inner.Max(g3 => g3.Count());
   var winners = inner.Where(g4 => g4.Count() == maxCount);
   if (winners.Count() > 1)
   {
      // Tie on size: pick the subgroup that contains the largest Z.
      yield return winners.MaxBy(w => w.Max(f => f.Z));
   }
   else
   {
      yield return winners.Single();
   }
}

A not-so-efficient solution is

from foo in foos
group foo by new { foo.X, foo.Y } into g
orderby g.Key.X, g.Count(), g.Max(f => f.Z)
. . . // can't figure the rest out

But ideally I'd like something that is both compact and efficient.

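You are reusing the enumerable too much. Each reuse causes the whole enumerable to be executed again, which in some cases can degrade performance significantly.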

Your not-so-compact code can be simplified to this

foreach (var byX in foos.GroupBy(f => f.X))
{
    yield return byX.GroupBy(f => f.Y, f => f, (_, byY) => byY.ToList())
                    .MaxBy(l => l.Count)
                    .MaxBy(f => f.Z);
}

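Here is what happens:

The items are grouped by X, hence the variable name byX: the whole byX enumerable contains items with the same X.

Now you group these grouped items by Y. The variable name byY means that the whole byY enumerable contains items with the same Y, which also share the same X.

Finally, you select the largest list, i.e. the winners (MaxBy(l => l.Count)), and from the winners you pick the item with the highest Z (MaxBy(f => f.Z)).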

The reason I use byY.ToList() is to prevent the repeated enumeration that would otherwise be caused by Count() and MaxBy().
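The general effect of repeated enumeration can be illustrated with a minimal sketch. It uses a hypothetical local iterator, Source(), with a pass counter, not the groupings from the answer, and the class and method names are made up for illustration. Calling Count() and then Max() on the deferred sequence runs the iterator twice, while materializing it with ToList() first runs it once.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerationDemo
{
    // Returns (passes over the deferred sequence, passes after ToList()).
    public static (int Deferred, int Materialized) CountPasses()
    {
        int passes = 0;

        // Hypothetical iterator that records each time it is run from the start.
        IEnumerable<int> Source()
        {
            passes++;
            yield return 3;
            yield return 1;
            yield return 2;
        }

        // Deferred: Count() and Max() each enumerate the sequence again.
        IEnumerable<int> deferred = Source();
        _ = deferred.Count();
        _ = deferred.Max();
        int deferredPasses = passes;

        // Materialized: ToList() enumerates once; later reads use the list.
        passes = 0;
        List<int> list = Source().ToList();
        _ = list.Count;
        _ = list.Max();
        int materializedPasses = passes;

        return (deferredPasses, materializedPasses);
    }
}
```

EnumerationDemo.CountPasses() returns (2, 1). How much this matters in practice depends on how expensive the underlying sequence is to produce.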

Alternatively, you can turn the whole iterator into a single return statement

return foos.GroupBy(f => f.X, f => f, (_, byX) => 
        byX.GroupBy(f => f.Y, f => f,(__, byY) => byY.ToList())
            .MaxBy(l => l.Count)
            .MaxBy(f => f.Z));

Thinking about this further, I realized that your orderby can simplify everything considerably, though I'm still not sure it is understandable:

var ans = foos.GroupBy(f => f.X, (_, gXfs) => gXfs.GroupBy(gXf => gXf.Y).Select(gXgYfs => gXgYfs.ToList())
                                                  .OrderByDescending(gXgYfs => gXgYfs.Count).ThenByDescending(gXgYfs => gXgYfs.Max(gXgYf => gXgYf.Z)).First());

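While this can be done in LINQ, I don't find it any more compact or easier to understand when it is written as one statement in query comprehension syntax: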

var ans = from foo in foos
          group foo by foo.X into foogX
          let foogYs = (from foo in foogX
                        group foo by foo.Y into rfoogY
                        select rfoogY)
          let maxYCount = foogYs.Max(y => y.Count())
          let foogYsmZ = from fooY in foogYs
                      where fooY.Count() == maxYCount
                      select new { maxZ = fooY.Max(f => f.Z), fooY = from f in fooY select f }
          let maxMaxZ = foogYsmZ.Max(y => y.maxZ)
          select (from foogY in foogYsmZ where foogY.maxZ == maxMaxZ select foogY.fooY).First();

If you are willing to use lambda syntax, some things become simpler and shorter, but not necessarily easier to understand:

var ans = from foogX in foos.GroupBy(f => f.X)
          let foogYs = foogX.GroupBy(f => f.Y)
          let maxYCount = foogYs.Max(foogY => foogY.Count())
          let foogYmCmZs = foogYs.Where(fooY => fooY.Count() == maxYCount).Select(fooY => new { maxZ = fooY.Max(f => f.Z), fooY })
          let maxMaxZ = foogYmCmZs.Max(foogYmZ => foogYmZ.maxZ)
          select foogYmCmZs.Where(foogYmZ => foogYmZ.maxZ == maxMaxZ).First().fooY.Select(y => y);

With heavy use of lambda syntax, you can become completely incomprehensible:

var ans = foos.GroupBy(f => f.X, (_, gXfs) => gXfs.GroupBy(gXf => gXf.Y).Select(gXgYf => new { fCount = gXgYf.Count(), maxZ = gXgYf.Max(f => f.Z), gXgYfs = gXgYf.Select(f => f) }))
              .Select(fC_mZ_gXgYfs_s => {
                  var maxfCount = fC_mZ_gXgYfs_s.Max(fC_mZ_gXgYfs => fC_mZ_gXgYfs.fCount);
                  var fC_mZ_gXgYfs_mCs = fC_mZ_gXgYfs_s.Where(fC_mZ_gXgYfs => fC_mZ_gXgYfs.fCount == maxfCount).ToList();
                  var maxMaxZ = fC_mZ_gXgYfs_mCs.Max(fC_mZ_gXgYfs => fC_mZ_gXgYfs.maxZ);
                  return fC_mZ_gXgYfs_mCs.Where(fC_mZ_gXgYfs => fC_mZ_gXgYfs.maxZ == maxMaxZ).First().gXgYfs;
              });

(I revised the third possibility to reduce repeated computation and make it more DRY, but that did make it more verbose.)

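Based on the wording of your question, I assume you want the result to be an IEnumerable<IEnumerable<Foo>>. Elements are grouped by both X and Y, so all elements in a particular inner sequence have the same X and Y values. In addition, each inner sequence has a different (unique) value of X.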

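Given the following data

X Y Z
-----
A p 1
A p 2
A q 1
A r 3
B p 1
B q 2

the result should contain two inner sequences: (A, p, 1), (A, p, 2) for X = A, where p is the largest subgroup, and (B, q, 2) for X = B, where p and q tie on size and q has the larger Z. This result can be obtained with the following LINQ expression: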

var result = foos
    .GroupBy(
        outerFoo => outerFoo.X,
        (x, xFoos) => xFoos
            .GroupBy(
                innerFoo => innerFoo.Y,
                (y, yFoos) => yFoos)
            .OrderByDescending(yFoos => yFoos.Count())
            .ThenByDescending(yFoos => yFoos.Select(foo => foo.Z).Max())
            .First());
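To check the query against the sample rows, here is a small self-contained sketch. The names SampleRun, SampleData, LargestSubgroups, and Describe are made up for illustration, Foo is repeated so the block compiles on its own, and the query is the one above with ToList() added so the result can be inspected.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public string X;
    public string Y;
    public int Z;
}

public static class SampleRun
{
    // The six sample rows (X, Y, Z).
    public static List<Foo> SampleData() => new List<Foo>
    {
        new Foo { X = "A", Y = "p", Z = 1 },
        new Foo { X = "A", Y = "p", Z = 2 },
        new Foo { X = "A", Y = "q", Z = 1 },
        new Foo { X = "A", Y = "r", Z = 3 },
        new Foo { X = "B", Y = "p", Z = 1 },
        new Foo { X = "B", Y = "q", Z = 2 },
    };

    // Per X: order the Y subgroups by size, then by largest Z, and take the first.
    public static List<List<Foo>> LargestSubgroups(IEnumerable<Foo> foos) =>
        foos.GroupBy(
                outerFoo => outerFoo.X,
                (x, xFoos) => xFoos
                    .GroupBy(innerFoo => innerFoo.Y, (y, yFoos) => yFoos)
                    .OrderByDescending(yFoos => yFoos.Count())
                    .ThenByDescending(yFoos => yFoos.Max(foo => foo.Z))
                    .First()
                    .ToList())
            .ToList();

    // Formats the winners as text, one " | "-separated section per X.
    public static string Describe(IEnumerable<IEnumerable<Foo>> groups) =>
        string.Join(" | ", groups.Select(
            g => string.Join(" ", g.Select(f => $"({f.X}, {f.Y}, {f.Z})"))));
}
```

SampleRun.Describe(SampleRun.LargestSubgroups(SampleRun.SampleData())) gives "(A, p, 1) (A, p, 2) | (B, q, 2)": for A the subgroup p wins on size, and for B the subgroups p and q tie on size, so q wins on Z.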

If you really care about performance, you can most likely improve it at the cost of some added complexity:

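When picking the group with the most elements or the highest Z value, two passes are made over the elements of each group. First the elements are counted with yFoos.Count(), then the maximum Z value is computed with yFoos.Select(foo => foo.Z).Max(). However, you can do the same in a single pass by using Aggregate.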

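Also, there is no need to sort all the groups to find the "largest" one. Instead, you can again use Aggregate to make a single pass over all the groups and find the "largest" group: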
result = foos
    .GroupBy(
        outerFoo => outerFoo.X,
        (x, xFoos) => xFoos
            .GroupBy(
                innerFoo => innerFoo.Y,
                (y, yFoos) => new
                {
                    Foos = yFoos,
                    Aggregate = yFoos.Aggregate(
                        (Count: 0, MaxZ: int.MinValue),
                        (accumulator, foo) =>
                            (Count: accumulator.Count + 1,
                             MaxZ: Math.Max(accumulator.MaxZ, foo.Z)))
                })
            .Aggregate(
                new
                {
                    Foos = Enumerable.Empty<Foo>(),
                    Aggregate = (Count: 0, MaxZ: int.MinValue)
                },
                (accumulator, grouping) =>
                    grouping.Aggregate.Count > accumulator.Aggregate.Count
                        || grouping.Aggregate.Count == accumulator.Aggregate.Count
                            && grouping.Aggregate.MaxZ > accumulator.Aggregate.MaxZ
                        ? grouping : accumulator)
            .Foos);

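I use ValueTuple as the accumulator in Aggregate because I expect it to perform well. However, if you really want to know, you should measure.

You can ignore the outer grouping; what remains is just a slightly advanced MaxBy, similar to ordering by two keys. If you implement that, you end up with something like this: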
public IEnumerable<IGrouping<string, Foo>> GetFoo2(IEnumerable<Foo> foos)
{
    return foos.GroupBy(f => f.X)
               .Select(f => f.GroupBy(g => g.Y)
                             .MaxBy2(g => g.Count(), g => g.Max(m => m.Z)));
}
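MaxBy2 is not a built-in LINQ method, and the answer does not show its body. The following is one possible sketch of such a helper, written under the assumption that it should return the element with the largest first key and break ties with the second key. The parameter names and the choice to keep the earliest element on a full tie are illustrative, not taken from the answer.

```csharp
using System;
using System.Collections.Generic;

public static class MaxBy2Extensions
{
    // Hypothetical helper: single pass over the source, returning the element
    // with the largest primary key; ties are broken by the largest secondary
    // key, and on a full tie the earliest element is kept.
    public static TSource MaxBy2<TSource, TKey1, TKey2>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey1> primary,
        Func<TSource, TKey2> secondary)
    {
        var cmp1 = Comparer<TKey1>.Default;
        var cmp2 = Comparer<TKey2>.Default;

        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                throw new InvalidOperationException("Sequence contains no elements");

            TSource best = e.Current;
            TKey1 bestKey1 = primary(best);
            TKey2 bestKey2 = secondary(best);

            while (e.MoveNext())
            {
                TSource candidate = e.Current;
                TKey1 key1 = primary(candidate);
                int c = cmp1.Compare(key1, bestKey1);
                if (c < 0) continue; // smaller primary key: secondary is not computed

                TKey2 key2 = secondary(candidate);
                if (c > 0 || cmp2.Compare(key2, bestKey2) > 0)
                {
                    best = candidate;
                    bestKey1 = key1;
                    bestKey2 = key2;
                }
            }
            return best;
        }
    }
}
```

With this in scope, the call in GetFoo2 computes g.Count() once per subgroup and g.Max(m => m.Z) only for subgroups that are not already smaller than the current best.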

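Once you move all the functionality into a very generic function, it is questionable how much of this can still be called a LINQ approach. You can also implement the same functionality with Aggregate. There are two options: with a seed and without one. I prefer the latter: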
public IEnumerable<IGrouping<string, Foo>> GetFoo3(IEnumerable<Foo> foos)
{
    return foos.GroupBy(f => f.X)
               .Select(f => f.GroupBy(g => g.Y)
                             .Aggregate((a, b) =>
                                    a.Count() > b.Count() ? a :
                                    a.Count() < b.Count() ? b :
                                    a.Max(m => m.Z) >= b.Max(m => m.Z) ? a : b
                             ));
}

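Performance will suffer if Count() is not constant time, which is not guaranteed, but in my tests it worked well. The variant with a seed would be more complex, but could be faster if done right.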
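The answer does not show the seeded variant, so the following is only a sketch of what it might look like. The name GetFoo4 and the tuple layout are assumptions, and Foo is repeated so the block compiles on its own. The seed carries the best group so far together with its count and maximum Z, so each subgroup's Count() and Max() are evaluated once instead of on every comparison. Whether that is actually faster would have to be measured.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public string X;
    public string Y;
    public int Z;
}

public static class SeededAggregate
{
    // Hypothetical seeded counterpart of GetFoo3.
    public static IEnumerable<IGrouping<string, Foo>> GetFoo4(IEnumerable<Foo> foos)
    {
        return foos.GroupBy(f => f.X)
                   .Select(byX => byX.GroupBy(f => f.Y)
                       .Aggregate(
                           // Seed: no winner yet; Count = -1 loses to any real group.
                           (Group: (IGrouping<string, Foo>)null, Count: -1, MaxZ: int.MinValue),
                           (best, g) =>
                           {
                               // Evaluated once per subgroup, then cached in the accumulator.
                               int count = g.Count();
                               int maxZ = g.Max(f => f.Z);
                               if (count > best.Count || (count == best.Count && maxZ > best.MaxZ))
                                   return (g, count, maxZ);
                               return best; // on a full tie the earlier subgroup is kept
                           },
                           best => best.Group));
    }
}
```

Every X group contains at least one element, so the null seed group is always replaced before the result selector runs.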