C#: Batching with multiple GroupBy


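I have a CSV file whose records need to be sorted and then grouped into batches of an arbitrary size (for example, at most 300 records per batch). A batch may contain fewer than 300 records, because the contents of each batch must be homogeneous (based on the contents of two different columns).

My LINQ statement was inspired by the following answer: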

var query = (from line in EbrRecords
            let EbrData = line.Split('\t')
            let Location = EbrData[7]
            let RepName = EbrData[4]
            let AccountID = EbrData[0]
            orderby Location, RepName, AccountID).
            Select((data, index) => new {
                Record = new EbrRecord(
                AccountID = EbrData[0],
                AccountName = EbrData[1],
                MBSegment = EbrData[2],
                RepName = EbrData[4],
                Location = EbrData[7],
                TsrLocation = EbrData[8]
                )
                ,
                Index = index}
                ).GroupBy(x => new {x.Record.Location, x.Record.RepName, batch = x.Index / 100});    
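The "/ 100" gives the arbitrary bucket size. The other elements of the GroupBy are meant to keep each batch homogeneous. I suspect this is almost what I want, but it gives me the following compiler error:
A query body must end with a select clause or a group clause. I understand why I get this error, but overall I'm not sure how to fix this query. How can it be done?

Update: I have nearly achieved what I'm after, with the following: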

List<EbrRecord> input = new List<EbrRecord> {
    new EbrRecord {Name = "Brent",Age = 20,ID = "A"},
    new EbrRecord {Name = "Amy",Age = 20,ID = "B"},
    new EbrRecord {Name = "Gabe",Age = 23,ID = "B"},
    new EbrRecord {Name = "Noah",Age = 27,ID = "B"},
    new EbrRecord {Name = "Alex",Age = 27,ID = "B"},
    new EbrRecord {Name = "Stormi",Age = 27,ID = "B"},
    new EbrRecord {Name = "Roger",Age = 27,ID = "B"},
    new EbrRecord {Name = "Jen",Age = 27,ID = "B"},
    new EbrRecord {Name = "Adrian",Age = 28,ID = "B"},
    new EbrRecord {Name = "Cory",Age = 29,ID = "C"},
    new EbrRecord {Name = "Bob",Age = 29,ID = "C"},
    new EbrRecord {Name = "George",Age = 29,ID = "C"},
    };

//look how tiny this query is, and it is very nearly the result I want!!!
int i = 0;
var result = from q in input
                orderby q.Age, q.ID
                group q by new { q.ID, batch = i++ / 3 };

foreach (var agroup in result)
{
    Debug.WriteLine("ID:" + agroup.Key);
    foreach (var record in agroup)
    {
        Debug.WriteLine(" Name:" + record.Name);
    }
}
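While this result is acceptable, it falls just short of the ideal. The first occurrence of a "B" batch should have three entities (Amy, Gabe, Noah), not two (Amy, Gabe). This is because the index position is not reset when each group is identified. Does anyone know how to reset my custom index position for each group?

Update 2: I think I may have found the answer. First, create an additional function like this: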

    public static bool BatchGroup(string ID, ref string priorID )
    {
        if (priorID != ID)
        {
            priorID = ID;
            return true;
        }
        return false;
    }
Second, update the LINQ query as follows:

int i = 0;
string priorID = null;
var result = from q in input
                orderby q.Age, q.ID
             group q by new { q.ID, batch = (BatchGroup(q.ID, ref priorID) ? i=0 : ++i) / 3 };
Now it does what I want. I just wish I didn't need that separate function.
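One possible way to get the same per-ID reset without a helper that mutates shared state is to group by the key first and then cut each group into fixed-size pieces. The sketch below assumes .NET 6 or later, where `Enumerable.Chunk` is available; the class and method names (`ChunkBatching`, `BatchPerKey`) are hypothetical and not from the question. Unlike the `BatchGroup` approach, which resets whenever two adjacent records differ in ID, this groups all records with the same key together, so it only matches when each key's records are meant to be batched as a whole.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original question.
public static class ChunkBatching
{
    // Groups the (already sorted) items by key, then splits each group
    // into arrays of at most batchSize items.
    // Enumerable.Chunk requires .NET 6 or later.
    public static List<T[]> BatchPerKey<T, TKey>(
        IEnumerable<T> sorted, Func<T, TKey> keySelector, int batchSize) =>
        sorted.GroupBy(keySelector)
              .SelectMany(g => g.Chunk(batchSize))
              .ToList();
}
```

With the sample list from the update, a call such as `ChunkBatching.BatchPerKey(input.OrderBy(q => q.Age).ThenBy(q => q.ID), q => q.ID, 3)` should yield batches of sizes 1, 3, 3, 2 and 3.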

Does this work?

var query = (from line in EbrRecords
        let EbrData = line.Split('\t')
        let Location = EbrData[7]
        let RepName = EbrData[4]
        let AccountID = EbrData[0]
        orderby Location, RepName, AccountID
        // Object initializer (braces); assignments inside constructor parentheses do not compile here
        select new EbrRecord
        {
            AccountID = EbrData[0],
            AccountName = EbrData[1],
            MBSegment = EbrData[2],
            RepName = EbrData[4],
            Location = EbrData[7],
            TsrLocation = EbrData[8]
        }
        ).Select((data, index) => new
        {
            Record = data,
            Index = index
        })
        // Second argument selects the element, so each group holds EbrRecord values
        .GroupBy(x => new {x.Record.Location, x.Record.RepName, batch = x.Index / 100},
            x => x.Record);
As StriplingWarrior's answer shows, a select clause is needed after the orderby above. A LINQ query expression must end with select or group by.


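Unfortunately, there is a logic flaw. Suppose there are 50 accounts in the first group and 100 accounts in the second, with a batch size of 100. The original code would produce 3 batches of size 50, rather than 2 batches of sizes 50 and 100. The index runs across the whole sorted sequence, so the second group covers indices 50 to 149 and is split at index 100.

Here is a fix: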
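The 50/100 scenario can be reproduced with a small sketch. It is only an illustration: the group names "North" and "South" and the helper names are hypothetical, and plain strings stand in for the records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original answer.
public static class BatchFlawDemo
{
    // Batch sizes when the index runs across the whole sorted sequence.
    public static List<int> GlobalIndexBatchSizes(IEnumerable<string> keys, int batchSize) =>
        keys.Select((key, index) => new { key, index })
            .GroupBy(x => new { x.key, batch = x.index / batchSize })
            .Select(g => g.Count())
            .ToList();

    // Batch sizes when the index restarts at zero inside each key group.
    public static List<int> PerGroupIndexBatchSizes(IEnumerable<string> keys, int batchSize) =>
        keys.GroupBy(key => key)
            .SelectMany(g => g.Select((key, index) => new { key, index })
                              .GroupBy(x => x.index / batchSize))
            .Select(g => g.Count())
            .ToList();

    // 50 records for the first group followed by 100 for the second.
    public static IEnumerable<string> SampleKeys() =>
        Enumerable.Repeat("North", 50).Concat(Enumerable.Repeat("South", 100));
}
```

With a batch size of 100, `GlobalIndexBatchSizes` gives 50, 50, 50 for the sample keys, while `PerGroupIndexBatchSizes` gives 50, 100.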

IEnumerable<IGrouping<int, EbrRecord>> query = ...

  orderby Location, RepName, AccountID
  select new EbrRecord
  {
    AccountID = EbrData[0],
    AccountName = EbrData[1],
    MBSegment = EbrData[2],
    RepName = EbrData[4],
    Location = EbrData[7],
    TsrLocation = EbrData[8]
  } into x
  group x by new {Location = x.Location, RepName = x.RepName} into g
  // The index restarts at 0 inside each (Location, RepName) group
  from g2 in g.Select((data, index) => new { Record = data, Index = index })
              .GroupBy(y => y.Index/100, y => y.Record)
  select g2;


List<List<EbrRecord>> result = query.Select(g => g.ToList()).ToList();

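Also note that batching with GroupBy is very slow because of the repeated iteration. You could write a for loop that makes a single pass over the ordered set, and that loop would run faster than LINQ to Objects.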
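A minimal sketch of such a single-pass loop follows. It is not code from the answer: the names `SinglePassBatcher` and `Batch` are hypothetical, it uses `foreach` rather than an indexed `for`, and it assumes the input is already sorted so that records with the same key are adjacent.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public static class SinglePassBatcher
{
    // Splits an already-sorted sequence into batches in one pass.
    // A new batch starts when the key changes or the current batch is full.
    public static List<List<T>> Batch<T, TKey>(
        IEnumerable<T> sorted, Func<T, TKey> keySelector, int maxSize)
    {
        if (maxSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxSize));

        var comparer = EqualityComparer<TKey>.Default;
        var batches = new List<List<T>>();
        List<T> current = null;      // batch currently being filled
        TKey currentKey = default;   // key of the batch currently being filled

        foreach (var item in sorted)
        {
            var key = keySelector(item);
            if (current == null || current.Count == maxSize || !comparer.Equals(key, currentKey))
            {
                current = new List<T>();
                batches.Add(current);
                currentKey = key;
            }
            current.Add(item);
        }
        return batches;
    }
}
```

For the CSV case, a call might look like `SinglePassBatcher.Batch(records, r => (r.Location, r.RepName), 300)`, where `records` is the sorted list of `EbrRecord` objects and the tuple serves as a composite key.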


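What I expected was a list of lists of EbrRecord. But the above gives a list of an anonymous type containing only Location, RepName and batch. I wonder whether the post I linked to really achieves what I think or hope it does.

@Brent: GroupBy will create an IEnumerable of IGroupings, each of which has a Key with Location, RepName and batch, but is itself also an IEnumerable containing the selected values. If you use the overload in my updated answer, you should effectively have an IEnumerable of EbrRecord sequences. However, I think it most likely does not do what you are hoping for. Be sure to read through David B's answer. He makes some good points.

My IntelliSense and compiler refused to let me put "group by" after "select new" unless I switched to dot notation.

Fixed many embarrassing typos. Now I'm done (whether or not it works).

Added the list-of-lists conversion.

OK, after further analysis I see that your answer gets exactly the right result. I have checked this as the answer. But oh man, that is a busy one you came up with!!! Please see the update I made to my question, where I found a nearly perfect (and simpler) solution. If you could get my update to work without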