C# batching with multiple GroupBy
I have a CSV file whose records need to be sorted and then grouped into batches of arbitrary size (e.g. at most 300 records per batch). A batch may contain fewer than 300 records, because the contents of each batch must be homogeneous (based on the contents of two different columns).
My LINQ statement, inspired by another answer, is:
var query = (from line in EbrRecords
             let EbrData = line.Split('\t')
             let Location = EbrData[7]
             let RepName = EbrData[4]
             let AccountID = EbrData[0]
             orderby Location, RepName, AccountID).
    Select((data, index) => new {
        Record = new EbrRecord(
            AccountID = EbrData[0],
            AccountName = EbrData[1],
            MBSegment = EbrData[2],
            RepName = EbrData[4],
            Location = EbrData[7],
            TsrLocation = EbrData[8]
        ),
        Index = index
    }).GroupBy(x => new { x.Record.Location, x.Record.RepName, batch = x.Index / 100 });
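The "/100" gives the arbitrary batch size. The other elements of the GroupBy key are meant to keep each batch homogeneous. I suspect this is very nearly what I want, but it gives me the following compiler error: "A query body must end with a select clause or a group clause." I understand why I'm getting this error, but I'm not sure how to fix the query as a whole. How do I do it?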
Update: I have nearly achieved what I'm after with:
List<EbrRecord> input = new List<EbrRecord> {
    new EbrRecord {Name = "Brent", Age = 20, ID = "A"},
    new EbrRecord {Name = "Amy", Age = 20, ID = "B"},
    new EbrRecord {Name = "Gabe", Age = 23, ID = "B"},
    new EbrRecord {Name = "Noah", Age = 27, ID = "B"},
    new EbrRecord {Name = "Alex", Age = 27, ID = "B"},
    new EbrRecord {Name = "Stormi", Age = 27, ID = "B"},
    new EbrRecord {Name = "Roger", Age = 27, ID = "B"},
    new EbrRecord {Name = "Jen", Age = 27, ID = "B"},
    new EbrRecord {Name = "Adrian", Age = 28, ID = "B"},
    new EbrRecord {Name = "Cory", Age = 29, ID = "C"},
    new EbrRecord {Name = "Bob", Age = 29, ID = "C"},
    new EbrRecord {Name = "George", Age = 29, ID = "C"},
};
// Look how tiny this query is, and it is very nearly the result I want!!!
int i = 0;
var result = from q in input
             orderby q.Age, q.ID
             group q by new { q.ID, batch = i++ / 3 };
foreach (var agroup in result)
{
    Debug.WriteLine("ID:" + agroup.Key);
    foreach (var record in agroup)
    {
        Debug.WriteLine(" Name:" + record.Name);
    }
}
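Although this is acceptable, it falls just short of the ideal result. The first "B" batch should contain three entities (Amy, Gabe, Noah), not two (Amy, Gabe). This happens because the index position is not reset as each new group is identified. Does anyone know how to reset my custom index position for each group?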
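One side-effect-free way to get a counter that restarts for each ID is to group by ID first and then number the records inside each group with the indexed overload of Select. The sketch below assumes the sample's EbrRecord shape (Name, Age, ID); PerIdBatching is a hypothetical name. One difference from a counter that resets whenever the ID changes: GroupBy(q => q.ID) merges all records sharing an ID even if, after ordering by Age, they are not adjacent.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed to match the sample EbrRecord used in the update (Name, Age, ID).
public class EbrRecord
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public string ID { get; set; } = "";
}

public static class PerIdBatching
{
    // Hypothetical helper: the batch number restarts at 0 for every ID.
    public static List<List<EbrRecord>> Batch(IEnumerable<EbrRecord> input, int batchSize)
    {
        return input
            .OrderBy(q => q.Age).ThenBy(q => q.ID)
            // GroupBy yields groups in order of first appearance and keeps
            // the ordered sequence inside each group.
            .GroupBy(q => q.ID)
            .SelectMany(g => g
                .Select((record, index) => new { record, index }) // index restarts per ID group
                .GroupBy(x => x.index / batchSize, x => x.record)
                .Select(batch => batch.ToList()))
            .ToList();
    }
}
```

With the sample input and a batch size of 3 this yields [Brent], [Amy, Gabe, Noah], [Alex, Stormi, Roger], [Jen, Adrian], [Cory, Bob, George].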
Update 2
I think I may have found the answer. First, create an additional function like this:
public static bool BatchGroup(string ID, ref string priorID)
{
    if (priorID != ID)
    {
        priorID = ID;
        return true;
    }
    return false;
}
Second, update the LINQ query as follows:
int i = 0;
string priorID = null;
var result = from q in input
             orderby q.Age, q.ID
             group q by new { q.ID, batch = (BatchGroup(q.ID, ref priorID) ? i = 0 : ++i) / 3 };
Now it does what I want. I just wish I didn't need that separate function.
Does this help?
var query = (from line in EbrRecords
             let EbrData = line.Split('\t')
             let Location = EbrData[7]
             let RepName = EbrData[4]
             let AccountID = EbrData[0]
             orderby Location, RepName, AccountID
             select new EbrRecord {
                 AccountID = EbrData[0],
                 AccountName = EbrData[1],
                 MBSegment = EbrData[2],
                 RepName = EbrData[4],
                 Location = EbrData[7],
                 TsrLocation = EbrData[8] }
            ).Select((data, index) => new
            {
                Record = data,
                Index = index
            })
            .GroupBy(x => new { x.Record.Location, x.Record.RepName, batch = x.Index / 100 },
                     x => x.Record); // element selector: each group contains EbrRecords
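A self-contained sketch of the same shape, assuming a trimmed-down, hypothetical EbrRecord with only AccountID, RepName and Location, and tab-separated lines using the column positions from the question (0, 4, 7). GlobalIndexBatching is a made-up name, and a value tuple replaces the anonymous key only so the groups can be returned from a method. Each IGrouping exposes the key through Key and is itself a sequence of EbrRecord; the batch number still comes from one index running across all records.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down EbrRecord: only the columns used for ordering and grouping.
public class EbrRecord
{
    public string AccountID { get; set; } = "";
    public string RepName { get; set; } = "";
    public string Location { get; set; } = "";
}

public static class GlobalIndexBatching
{
    public static List<IGrouping<(string Location, string RepName, int Batch), EbrRecord>> Batch(
        IEnumerable<string> lines, int batchSize)
    {
        // The query expression ends in a select clause, so it compiles.
        var ordered =
            from line in lines
            let f = line.Split('\t')
            orderby f[7], f[4], f[0] // Location, RepName, AccountID
            select new EbrRecord { AccountID = f[0], RepName = f[4], Location = f[7] };

        return ordered
            .Select((record, index) => new { record, index }) // index runs across ALL records
            .GroupBy(
                x => (x.record.Location, x.record.RepName, Batch: x.index / batchSize),
                x => x.record) // element selector: groups hold EbrRecords, not the wrapper
            .ToList();
    }
}
```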
As StriplingWarrior's answer shows, the query above needs a select clause after the orderby: LINQ comprehension queries must end with select or group by.
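Unfortunately, there is a logic flaw. Suppose the first group has 50 accounts, the second group has 100 accounts, and the batch size is 100. The original code will produce 3 batches of size 50, instead of 2 batches of sizes 50 and 100.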
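The arithmetic can be checked with a small sketch that only counts batch sizes. Each group is modeled as a run of identical labels; BatchFlawDemo and its methods are hypothetical names. The first method numbers records across the whole sequence, as the original code does; the second restarts the index inside each group.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BatchFlawDemo
{
    // Expands e.g. {50, 100} into 50 copies of label 0 followed by 100 copies of label 1.
    static IEnumerable<int> Expand(int[] groupSizes) =>
        groupSizes.SelectMany((size, group) => Enumerable.Repeat(group, size));

    // Original approach: one running index over the whole ordered sequence.
    public static List<int> GlobalIndexBatchSizes(int[] groupSizes, int batchSize) =>
        Expand(groupSizes)
            .Select((group, index) => new { group, index })
            .GroupBy(x => new { x.group, Batch = x.index / batchSize })
            .Select(g => g.Count())
            .ToList();

    // Group first, then number the records inside each group.
    public static List<int> PerGroupBatchSizes(int[] groupSizes, int batchSize) =>
        Expand(groupSizes)
            .GroupBy(group => group)
            .SelectMany(g => g.Select((_, index) => index / batchSize)
                              .GroupBy(batch => batch)
                              .Select(b => b.Count()))
            .ToList();
}
```

For group sizes {50, 100} and a batch size of 100, the global index gives batches of 50, 50 and 50 (the second group is split at overall index 100), while the per-group index gives 50 and 100.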
Here is a workaround:
IEnumerable<IGrouping<int, EbrRecord>> query = ...
    orderby Location, RepName, AccountID
    select new EbrRecord {
        AccountID = EbrData[0],
        AccountName = EbrData[1],
        MBSegment = EbrData[2],
        RepName = EbrData[4],
        Location = EbrData[7],
        TsrLocation = EbrData[8] } into x
    group x by new { Location = x.Location, RepName = x.RepName } into g
    // number records inside each (Location, RepName) group, then split into batches of 100
    from g2 in g.Select((data, index) => new { Record = data, Index = index })
                .GroupBy(y => y.Index / 100, y => y.Record)
    select g2;

List<List<EbrRecord>> result = query.Select(g => g.ToList()).ToList();
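Also note that batching with GroupBy is very slow because of the repeated iteration. You could write a for loop that makes a single pass over the ordered set, and that loop will run faster than LINQ to Objects.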
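A minimal sketch of such a single-pass loop, assuming the input is already ordered so that records with the same key are adjacent. SinglePassBatching.Batch and its parameters are hypothetical; a new batch starts whenever the key changes or the current batch is full, which is the same per-key reset the BatchGroup helper in the question achieves.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class SinglePassBatching
{
    // Hypothetical helper: one pass over an already-ordered sequence.
    public static List<List<T>> Batch<T, TKey>(
        IEnumerable<T> ordered, Func<T, TKey> keySelector, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<T>>();
        var comparer = EqualityComparer<TKey>.Default;
        TKey previousKey = default!;

        foreach (T item in ordered)
        {
            TKey key = keySelector(item);
            bool startNewBatch =
                batches.Count == 0                                // first record
                || batches[batches.Count - 1].Count == batchSize  // current batch is full
                || !comparer.Equals(key, previousKey);            // key changed: reset
            if (startNewBatch)
            {
                batches.Add(new List<T>());
                previousKey = key;
            }
            batches[batches.Count - 1].Add(item);
        }
        return batches;
    }
}
```

For the question's data, the key could be a tuple such as r => (r.Location, r.RepName) with a batch size of 300; value tuples compare by value, so the default equality comparer is enough.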
What I'm expecting is a list of lists of EbrRecord (List<List<EbrRecord>>). But the above gives a list of an anonymous type containing only Location, RepName and batch. I wonder whether the post I linked to really does what I thought, or hoped, it did.
@Brent: GroupBy will create an IEnumerable of IGroupings, each of which has a Key with Location, RepName and batch, but is itself also an IEnumerable<EbrRecord> containing the selected values. If you use the overload in my updated answer, you should effectively have an IEnumerable<IEnumerable<EbrRecord>>. However, I suspect it may well not do what you're hoping. Be sure to read David B's answer; he makes some good points.
My IntelliSense and compiler refuse to let me put "group by" after "select new" unless I switch to dot notation.
Fixed many embarrassing typos. Now I'm done (whether or not it works). Added the conversion to a list of lists.
OK, after further analysis I see that your answer gives exactly the right result. I've accepted it as the answer. But oh man, that's a busy query you came up with!!! Please see the update I made to my question, where I found a nearly perfect (and simpler) solution. If you can make my update work without