Python: an efficient way to aggregate on a groupby DataFrame and output to a new DataFrame

python pandas dataframe

I'd like to know whether there is a better, more efficient way to do this.
Sample data:

import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Month': [-4, -3, 1, 2, -3, -2, 1, 2],
                   'Cost': [20, 30, 10, 15, 1, 2, 5, 10]})

Then I group by ID:

df = df.groupby('ID')

Then I aggregate Cost in a for loop, using the conditions Month < 0 and Month > 0, and save the output to a new DataFrame:

rows = []
for group, data in df:
    totalPre = 0
    totalPost = 0
    for row_index, row in data.iterrows():
        if row['Month'] < 0:
            totalPre = totalPre + row['Cost']
        elif row['Month'] > 0:
            totalPost = totalPost + row['Cost']
    # DataFrame.append was removed in pandas 2.0, so collect the rows and build the frame once
    rows.append({'ID': group, 'Total pre': totalPre, 'Total post': totalPost})
output = pd.DataFrame(rows)

Thanks.

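You can do this in several ways.

One way is to filter on the condition before the groupby:

df1 = df[df["Month"] < 0].groupby("ID")["Cost"].sum().reset_index(name="Total_pre")
df2 = df[df["Month"] > 0].groupby("ID")["Cost"].sum()\
    .reset_index(name="Total_post")
out = pd.merge(df1, df2, on="ID", how="outer")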
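To see what the outer merge buys you, here is a small sketch on the question's data plus one extra ID, "C", that only has positive months. The extra row and the variable names `pre` and `post` are my own additions for illustration, not part of the question.

```python
import pandas as pd

# Question's sample data plus a hypothetical ID "C" with post rows only
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B", "B", "B", "C"],
    "Month": [-4, -3, 1, 2, -3, -2, 1, 2, 3],
    "Cost": [20, 30, 10, 15, 1, 2, 5, 10, 7],
})

# Filter first, then group: each side only contains IDs that have matching rows
pre = df[df["Month"] < 0].groupby("ID")["Cost"].sum().reset_index(name="Total_pre")
post = df[df["Month"] > 0].groupby("ID")["Cost"].sum().reset_index(name="Total_post")

# how="outer" keeps "C" even though it is missing from `pre`;
# the missing total arrives as NaN, which is replaced by 0 here
out = pd.merge(pre, post, on="ID", how="outer").fillna(0)
print(out)
```

With how="inner" the row for "C" would be dropped, so the outer join is the safer default when some IDs may lack pre or post months.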

Another is to group by ID and the condition, then use pd.pivot_table:

out = df.groupby(["ID", df["Month"] ...

One way is to filter out rows where Month == 0, then group by ID and the condition Month < 0:

output = (
    df[df["Month"] != 0]
    .groupby(["ID", df["Month"] < 0])["Cost"].sum()
    .unstack().reset_index().rename_axis(None, axis=1)
    .rename(columns={True: "Total pre", False: "Total post"})
)

print(output)
#  ID  Total post  Total pre
#0  A          25         50
#1  B          15          3
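The same table can probably be produced with pd.pivot_table, which is mentioned above. The snippet below is only my sketch of that route, not the original poster's code; the helper column name `Period` is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Month": [-4, -3, 1, 2, -3, -2, 1, 2],
    "Cost": [20, 30, 10, 15, 1, 2, 5, 10],
})

# Drop Month == 0, then label every remaining row as pre or post
nonzero = df[df["Month"] != 0]
labelled = nonzero.assign(
    Period=nonzero["Month"].lt(0).map({True: "Total pre", False: "Total post"})
)

# One row per ID, one column per label, summing Cost within each cell
out = (
    pd.pivot_table(labelled, index="ID", columns="Period",
                   values="Cost", aggfunc="sum", fill_value=0)
    .reset_index()
    .rename_axis(None, axis=1)
)
print(out)
```

fill_value=0 means an ID with no rows on one side gets 0 instead of NaN.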

Use mask and groupby.sum:
grp = df.mask(df['Month']>0).groupby('ID', as_index=False)['Cost'].sum().rename(columns={'Cost':'Total pre'})
grp['Total post'] = df.mask(df['Month']<0).groupby('ID')['Cost'].sum().to_numpy()


Details

mask sets the rows that match the condition (Month > 0) to NaN, so groupby.sum only picks up the right rows:
df.mask(df['Month']>0)

    ID  Month  Cost
0    A   -4.0  20.0
1    A   -3.0  30.0
2  NaN    NaN   NaN
3  NaN    NaN   NaN
4    B   -3.0   1.0
5    B   -2.0   2.0
6  NaN    NaN   NaN
7  NaN    NaN   NaN
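Note that masking the whole frame also turns ID into NaN on the masked rows, and groupby skips NaN keys by default. As far as I can tell, an ID with rows on only one side would then be missing from one of the two sums. A possible variant, assuming that matters for your data, is to mask only the Cost column and group by the untouched ID column. The extra ID "C" below is hypothetical.

```python
import pandas as pd

# Question's sample data plus a hypothetical ID "C" with post rows only
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B", "B", "B", "C"],
    "Month": [-4, -3, 1, 2, -3, -2, 1, 2, 3],
    "Cost": [20, 30, 10, 15, 1, 2, 5, 10, 7],
})

cost = df["Cost"]

# Mask only Cost, so every ID still forms a group; sum() skips the NaN values
out = pd.DataFrame({
    "Total pre": cost.mask(df["Month"] >= 0).groupby(df["ID"]).sum(),
    "Total post": cost.mask(df["Month"] <= 0).groupby(df["ID"]).sum(),
}).reset_index()
print(out)
```

Because both sums share the same index (every ID), no positional assignment through to_numpy is needed. The totals come back as floats, since masking introduces NaN.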

I believe this is a good and simple option:
df_1 = pd.DataFrame([])
df_1 = df_1.assign(totalPre=df[df['Month'] < 0].groupby('ID')['Cost'].sum(), 
                   totalPost= df[df['Month'] > 0].groupby('ID')['Cost'].sum())
print(df_1)
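If some ID had months on only one side, the two filtered sums would carry different indexes. One way to keep every ID in that case, offered as a sketch rather than a correction, is to combine the two Series with pd.concat along the columns, which takes the union of the indexes. The ID "C" is again a made-up example.

```python
import pandas as pd

# Question's sample data plus a hypothetical ID "C" with post rows only
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B", "B", "B", "C"],
    "Month": [-4, -3, 1, 2, -3, -2, 1, 2, 3],
    "Cost": [20, 30, 10, 15, 1, 2, 5, 10, 7],
})

pre = df[df["Month"] < 0].groupby("ID")["Cost"].sum().rename("totalPre")
post = df[df["Month"] > 0].groupby("ID")["Cost"].sum().rename("totalPost")

# axis=1 aligns on the ID index with an outer join; gaps become NaN, then 0
df_1 = pd.concat([pre, post], axis=1).fillna(0).astype(int)
print(df_1)
```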

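For loops are rarely the only, or the best, solution in pandas. I would probably create a new column for the pre/post condition, then group by ID and the new column. groupby creates a group for each unique combination of values in the specified columns, and the values in each group are then aggregated with a function.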
import pandas as pd
import numpy as np

# sample DataFrame
df = pd.DataFrame ({'ID' : ['A','A','A','A','B','B','B','B'], 
'Month' : [-4,-3,1,2,-3,-2,1,2],
'Cost' : [20,30,10,15,1,2,5,10] })

# Create a new column `Timepoint` to group by (note: Month == 0 is counted as 'pre' here)
df['Timepoint'] = np.where(df['Month'] <= 0, 'pre', 'post')
# Create a group for each unique combination of `ID` and `Timepoint`, aggregate `Cost` with `sum`,
# then unstack `Timepoint` into columns
output = df.groupby(['ID', 'Timepoint'])['Cost'].sum().unstack()

The assign option prints:

    totalPre  totalPost
ID
A         50         25
B          3         15

The Timepoint grouping, with Timepoint unstacked into columns, gives:

Timepoint  post  pre
ID                  
A            25   50
B            15    3
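One difference from the loop in the question: the loop ignores rows where Month == 0, while a `<= 0` test files them under pre. If you want the loop's behaviour, a possible sketch is to drop those rows before labelling. The extra Month == 0 row below is invented to show the effect.

```python
import pandas as pd

# Question's sample data plus a hypothetical Month == 0 row for ID "A"
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "Month": [-4, -3, 0, 1, 2, -3, -2, 1, 2],
    "Cost": [20, 30, 100, 10, 15, 1, 2, 5, 10],
})

# Drop Month == 0 so it counts toward neither side, as in the original loop
kept = df[df["Month"] != 0].copy()
kept["Timepoint"] = kept["Month"].lt(0).map({True: "pre", False: "post"})

# One group per (ID, Timepoint) pair, then Timepoint becomes the columns
output = kept.groupby(["ID", "Timepoint"])["Cost"].sum().unstack()
print(output)
```

Here the Cost of 100 at Month 0 is left out of both totals, so "A" still shows 50 for pre and 25 for post.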