Python: groupby sum over all columns except one, and take the first value of that column

python pandas dataframe

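I am working with a pandas DataFrame and want to do a groupby sum, except for one ID column, for which I only want to keep the first value. Here is the starting DataFrame: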

ID      color   height  weight
id_1    blue    60      10
id_2    red     50      30
id_3    blue    100     30
id_4    orange  60      35
id_5    red     100     30

The desired output DataFrame looks like this:

ID      color   height  weight
id_1    blue    160     40
id_4    orange  60      35
id_2    red     150     60

I realize I can do a groupby like this:

df.groupby(['color']).sum().reset_index()

But this does not include the first value of ID that I am looking for:

    color   height  weight
    blue    160     40
    orange  60      35
    red     150     60
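
For reference, a minimal sketch that rebuilds the example frame and reproduces the sum-only result above. It assumes a recent pandas version, where passing numeric_only=True keeps the string ID column out of the sum (without it, newer versions may concatenate the ID strings instead of dropping them).

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "ID": ["id_1", "id_2", "id_3", "id_4", "id_5"],
    "color": ["blue", "red", "blue", "orange", "red"],
    "height": [60, 50, 100, 60, 100],
    "weight": [10, 30, 30, 35, 30],
})

# Sum only the numeric columns per color; ID is left out entirely.
summed = df.groupby("color").sum(numeric_only=True).reset_index()
print(summed)
```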

Group by color and, inside agg, use first for ID and sum for the remaining columns, I think:

  df.groupby(['color']).agg(ID=('ID', 'first'), height=('height', 'sum'), weight=('weight', 'sum')).reset_index()

    color    ID  height  weight
0    blue  id_1     160      40
1  orange  id_4      60      35
2     red  id_2     150      60
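
A self-contained sketch of this named-aggregation approach, using the data from the question. The final column reordering is an addition, included only so the result matches the column order of the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["id_1", "id_2", "id_3", "id_4", "id_5"],
    "color": ["blue", "red", "blue", "orange", "red"],
    "height": [60, 50, 100, 60, 100],
    "weight": [10, 30, 30, 35, 30],
})

# Named aggregation: output_column=(input_column, aggregation).
# 'first' takes the first non-null ID in each group.
result = (
    df.groupby("color")
      .agg(ID=("ID", "first"), height=("height", "sum"), weight=("weight", "sum"))
      .reset_index()
)

# Put ID first, as in the desired output.
result = result[["ID", "color", "height", "weight"]]
print(result)
```

Groups come back sorted by color, so the rows appear as blue, orange, red.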

This sums every column except ID (skipping the grouping key itself) and generalizes to more columns:

df.groupby('color').agg({k: 'first' if k == 'ID' else 'sum' for k in df.columns.drop('color')})
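
A sketch of how the dictionary-comprehension approach could be wrapped for reuse. The helper name sum_except_first and the extra width column are made up for illustration. The sketch leaves the grouping key out of the mapping and passes the string 'sum' rather than the builtin sum.

```python
import pandas as pd


def sum_except_first(df, key, keep_first):
    """Group by `key`, keep the first `keep_first` value, sum everything else.

    Hypothetical helper, not part of pandas.
    """
    value_cols = df.columns.drop(key)  # every column except the grouping key
    spec = {col: "first" if col == keep_first else "sum" for col in value_cols}
    return df.groupby(key).agg(spec)


df = pd.DataFrame({
    "ID": ["id_1", "id_2", "id_3", "id_4", "id_5"],
    "color": ["blue", "red", "blue", "orange", "red"],
    "height": [60, 50, 100, 60, 100],
    "weight": [10, 30, 30, 35, 30],
    "width": [1, 2, 3, 4, 5],  # extra column, added only to show generalization
})

out = sum_except_first(df, key="color", keep_first="ID")
print(out)
```

The non-ID columns are assumed to be numeric here; a second string column would need its own entry in the mapping.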

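You can use .groupby together with .agg. Instead of writing df.groupby('color').sum(), call df.groupby('color').agg(...) so that one column can be handled differently from the rest.

The output is: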

        ID      height  weight
color           
blue    id_1    160     40
orange  id_4    60      35
red     id_2    150     60
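
One .agg call that yields a table of this shape is sketched below. The exact mapping is an assumption, written out from the column names in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["id_1", "id_2", "id_3", "id_4", "id_5"],
    "color": ["blue", "red", "blue", "orange", "red"],
    "height": [60, 50, 100, 60, 100],
    "weight": [10, 30, 30, 35, 30],
})

# Assumed mapping: column name -> aggregation to apply within each color group.
out = df.groupby("color").agg({"ID": "first", "height": "sum", "weight": "sum"})
print(out)
```

Here color stays as the index; appending .reset_index() turns it back into a regular column.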
