Python: sorting and grouping a CSV with pandas
I import a CSV file with the following content:
Id; PartNrInt; Some; other; stuff;
R1; 1234-5678; x1; y1; z1;
R2; 1234-6789; x2; y2; z2;
R3; 1234-5678; x3; y3; z3;
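A file in this shape could be loaded roughly as follows. This is a minimal sketch, assuming the separator is a semicolon followed by a space and that every line ends with a trailing semicolon. The inline `raw` string and the name `parts` are placeholders, since the question does not show the real file path or variable names.

```python
import io

import pandas as pd

# Stand-in for the real CSV file (assumption: the actual path is not shown).
raw = (
    "Id; PartNrInt; Some; other; stuff;\n"
    "R1; 1234-5678; x1; y1; z1;\n"
    "R2; 1234-6789; x2; y2; z2;\n"
    "R3; 1234-5678; x3; y3; z3;\n"
)

# sep=";" splits on semicolons; skipinitialspace drops the blank after each one.
parts = pd.read_csv(io.StringIO(raw), sep=";", skipinitialspace=True)

# The trailing semicolon produces one extra, completely empty column; drop it.
parts = parts.dropna(axis=1, how="all")

print(parts)
```

With a real file, the `io.StringIO(raw)` argument would be replaced by the file name.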
I also have a database that holds additional data for each PartNrInt. I merged the two DataFrames, so I now have something like this:
Id; PartNrInt; OrderNr; Manufacturer; Some; other; stuff;
R1; 1234-5678; OrderNr1; Manuf1; x1; y1; z1;
R2; 1234-6789; OrderNr2; Manuf2; x2; y2; z2;
R3; 1234-5678; OrderNr1; Manuf1; x3; y3; z3;
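A merge of this kind might look like the sketch below. The two frames are rebuilt by hand from the example rows, the names `parts`, `db`, and `merged` are placeholders rather than the asker's actual variables, and `PartNrInt` is assumed to be the shared key.

```python
import pandas as pd

# Rows from the imported CSV (values copied from the example data).
parts = pd.DataFrame({
    "Id": ["R1", "R2", "R3"],
    "PartNrInt": ["1234-5678", "1234-6789", "1234-5678"],
    "Some": ["x1", "x2", "x3"],
    "other": ["y1", "y2", "y3"],
    "stuff": ["z1", "z2", "z3"],
})

# Hypothetical stand-in for the database: one row per PartNrInt.
db = pd.DataFrame({
    "PartNrInt": ["1234-5678", "1234-6789"],
    "OrderNr": ["OrderNr1", "OrderNr2"],
    "Manufacturer": ["Manuf1", "Manuf2"],
})

# A left merge keeps every CSV row and attaches the matching database columns.
merged = parts.merge(db, on="PartNrInt", how="left")

# Reorder the columns to match the layout of the merged table.
merged = merged[["Id", "PartNrInt", "OrderNr", "Manufacturer", "Some", "other", "stuff"]]

print(merged)
```

`how="left"` is a design choice here: a part that is missing from the database would still appear in the result, with empty OrderNr and Manufacturer values, instead of being dropped silently.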
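This part works fine, and I can print the DataFrame without trouble. To import the file into our ERP system, I have to group the table by PartNrInt.
So I want a table like this: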
Count; Names; PartNrInt; OrderNr; Manufacturer
2; R1, R3; 1234-5678; OrderNr1; Manuf1
1; R2; 1234-6789; OrderNr2; Manuf2
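My problem is that I can group the data with df.groupby('PartNrInt')['Id'].apply(list) and count the objects, but I can't get the new data into a new frame for export.
I'm completely new to pandas and Python, so there may be a very simple solution.

You can use groupby with agg, passing join and len, and finally reorder the columns: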
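# Join the Ids and count them for each group, using named aggregation
# (available in pandas 0.25 and later).
df = (df.groupby(['PartNrInt', 'OrderNr', 'Manufacturer'])['Id']
        .agg(Names=','.join, Count=len)
        .reset_index()[['Count', 'Names', 'PartNrInt', 'OrderNr', 'Manufacturer']])
print(df)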
Count Names PartNrInt OrderNr Manufacturer
0 2 R1,R3 1234-5678 OrderNr1 Manuf1
1 1 R2 1234-6789 OrderNr2 Manuf2
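Since the goal is a file for the ERP import, the grouped frame can then be written back out. The following is a self-contained sketch, assuming the ERP system expects semicolon-separated values without the index column; the question does not specify the target format, and the names `merged`, `grouped`, and `csv_text` are illustrative.

```python
import pandas as pd

# Merged data, reduced to the columns needed for grouping.
merged = pd.DataFrame({
    "Id": ["R1", "R2", "R3"],
    "PartNrInt": ["1234-5678", "1234-6789", "1234-5678"],
    "OrderNr": ["OrderNr1", "OrderNr2", "OrderNr1"],
    "Manufacturer": ["Manuf1", "Manuf2", "Manuf1"],
})

# One row per part: join the Ids into a single string and count them.
grouped = (
    merged.groupby(["PartNrInt", "OrderNr", "Manufacturer"])["Id"]
    .agg(Names=", ".join, Count=len)
    .reset_index()[["Count", "Names", "PartNrInt", "OrderNr", "Manufacturer"]]
)

# Without a path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = grouped.to_csv(sep=";", index=False)
print(csv_text)
```

Calling `reset_index()` is what turns the group keys back into ordinary columns, so the result is a regular DataFrame that `to_csv` can export directly.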
Wow, great! Thank you very much! If I could, I would also upvote the other answer :) Maybe later; I don't have the required 15 reputation yet :D