Python: keeping all columns after groupby more efficiently when using .agg (many columns)

python pandas

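I have found several threads related to this question ("how to keep all columns after groupby"), but my situation is different: I know how to do it, I just don't know how to do it more efficiently.

For example: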

import numpy as np
import pandas as pd

df=pd.DataFrame({'A':[1,1,2,3], 'B':[2,2,4,3],'d':[2,np.nan,1,4],'e':['this is','my life','not use 1','not use 2'],'f':[1,2,3,4]
                 })

print(df)
   A  B    d          e  f
0  1  2  2.0    this is  1
1  1  2  NaN    my life  2
2  2  4  1.0  not use 1  3
3  3  3  4.0  not use 2  4

If columns A and B are equal, I need to concatenate the strings in column e. To do that, I use the following code:

df=df.groupby(['A','B'],as_index=False).agg({'e':' '.join,'d':'first','f':'first'})
print(df)
   A  B    d  f                e
0  1  2  2.0  1  this is my life
1  2  4  1.0  3        not use 1
2  3  3  4.0  4        not use 2

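This is the correct output for me. But as you can see, to keep columns f and d I have to put them into the agg dict one by one. In my real data I have 20 columns, and I don't want to type all of those column names into the code by hand.

Is there a better solution for keeping all columns after groupby, or is there a way to improve the solution I am using now?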

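You can create the dictionary with the dict.fromkeys method, giving the value 'first' to every column except those in an excluded list, and then add e to the dictionary: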

d = dict.fromkeys(df.columns.difference(['A','B','e']), 'first')
print(d)
{'d': 'first', 'f': 'first'}

d['e'] = ' '.join
print(d)
{'d': 'first', 'f': 'first', 'e': <built-in method join of str object at 0x00000000025E1880>}
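
To make the dict.fromkeys line easier to follow, here is the same construction split into separate steps. It is only an illustrative sketch: the variable name other_cols does not appear in the answer and is introduced here for readability, and the sample frame is the one from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 3],
    'B': [2, 2, 4, 3],
    'd': [2, np.nan, 1, 4],
    'e': ['this is', 'my life', 'not use 1', 'not use 2'],
    'f': [1, 2, 3, 4],
})

# Step 1: take all column labels and drop the group keys (A, B) and the
# string column (e). Index.difference returns the remaining labels,
# sorted by default.
other_cols = df.columns.difference(['A', 'B', 'e'])
print(list(other_cols))  # ['d', 'f']

# Step 2: dict.fromkeys maps every remaining label to the same value,
# here the aggregation name 'first'.
d = dict.fromkeys(other_cols, 'first')
print(d)  # {'d': 'first', 'f': 'first'}

# Step 3: add the one column that needs a different aggregation.
d['e'] = ' '.join
```

With 20 columns, only the excluded list in step 1 and the special entry in step 3 have to be written by hand.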

The same dictionary can also be built by merging two dictionaries:

d1 = dict.fromkeys(df.columns.difference(['A','B','e']), 'first')
d2 = {'e': ' '.join}

d = {**d1, **d2}

df=df.groupby(['A','B'],as_index=False).agg(d)
print(df)
   A  B    d  f                e
0  1  2  2.0  1  this is my life
1  2  4  1.0  3        not use 1
2  3  3  4.0  4        not use 2

Finally, if the column order should be the same as in the original, add reindex:

df=df.groupby(['A','B'],as_index=False).agg(d).reindex(df.columns, axis=1)
print(df)
   A  B    d                e  f
0  1  2  2.0  this is my life  1
1  2  4  1.0        not use 1  3
2  3  3  4.0        not use 2  4

Thanks, can you briefly explain the first line? What is happening there?
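
For reference, the pieces above can be combined into one small helper. This is a sketch under assumptions, not a pandas API: the function name agg_keep_all and its parameters (keys, special, default) are hypothetical and chosen only for this example. Note that the 'first' aggregation skips missing values by default, which is why d is 2.0 for the first group.

```python
import numpy as np
import pandas as pd


def agg_keep_all(frame, keys, special, default='first'):
    """Hypothetical helper: group `frame` by `keys`, apply the aggregations
    in `special` to the columns it names, and `default` to every other
    non-key column. The original column order is restored at the end."""
    # Columns that get the default aggregation.
    rest = frame.columns.difference(list(keys) + list(special))
    # Merge the default entries with the special ones.
    spec = {**dict.fromkeys(rest, default), **special}
    out = frame.groupby(list(keys), as_index=False).agg(spec)
    # Put the columns back in the order of the input frame.
    return out.reindex(frame.columns, axis=1)


df = pd.DataFrame({
    'A': [1, 1, 2, 3],
    'B': [2, 2, 4, 3],
    'd': [2, np.nan, 1, 4],
    'e': ['this is', 'my life', 'not use 1', 'not use 2'],
    'f': [1, 2, 3, 4],
})

result = agg_keep_all(df, ['A', 'B'], {'e': ' '.join})
print(result)
```

Because the helper returns a new frame instead of reassigning df, the original data stays available and the call can be repeated safely.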