Python 3.x: multiple columns of conditional cumulative sums


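I have a dataframe that contains multiple "stacks" and their corresponding "lengths".

I am trying to create a separate column for each material that tracks the cumulative sum of the lengths, regardless of which "stack" they are in. I have tried using groupby, but I can only get the cumulative sum into a single column. Here is what I want: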

  stack-1-material stack-2-material  stack-1-length  stack-2-length  rock_cumsum  paper_cumsum  scissors_cumsum
0             rock             rock               3               3            6             0                0
1            paper            paper               1               1            6             2                0
2            paper             rock               1               3            9             3                0
3         scissors            paper               2               1            9             4                2
4             rock         scissors               3               2           12             4                4 
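
For anyone who wants to reproduce the question, the input can be rebuilt from the first four columns of the table above. This is only a sketch of the setup: the values are copied from the table, and the variable name df is an assumption chosen to match the answers below.

```python
import pandas as pd

# Input columns copied from the table in the question;
# the *_cumsum columns are the desired result and are left out here.
df = pd.DataFrame({
    'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'],
    'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'],
    'stack-1-length': [3, 1, 1, 2, 3],
    'stack-2-length': [3, 1, 3, 1, 2],
})

print(df)
```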

First, reverse the parts of the column names so that we can use wide_to_long to reshape the DataFrame.

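Then take the cumsum within each material and determine the maximum value per material for each row. We can then reshape this (with unstack) and ffill, replace the remaining NaN with 0, and join the result back to the original.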

import pandas as pd

# move the 'stack-N' part to the end: 'stack-1-material' -> 'material-stack-1'
df.columns = ['-'.join(x[::-1]) for x in df.columns.str.rsplit('-', n=1)]

res = (pd.wide_to_long(df.reset_index(), stubnames=['material', 'length'], 
                       i='index', j='whatever', suffix='.*')
         .sort_index(level=0))

#                material  length
#index whatever                  
#0     -stack-1      rock       3
#      -stack-2      rock       3
#1     -stack-1     paper       1
#      -stack-2     paper       1
#2     -stack-1     paper       1
#      -stack-2      rock       3
#3     -stack-1  scissors       2
#      -stack-2     paper       1
#4     -stack-1      rock       3
#      -stack-2  scissors       2

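# running total of length within each material, in row order
res['csum'] = res.groupby('material')['length'].cumsum()
# largest running total per (row, material), spread to one column per material;
# fillna(0).astype(int) replaces the deprecated fillna(0, downcast='infer')
res = (res.groupby(['index', 'material'])['csum'].max()
          .unstack(-1).ffill().fillna(0).astype(int)
          .add_suffix('_cumsum'))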

df = pd.concat([df, res], axis=1)
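
The same reshape-then-accumulate idea can also be sketched without wide_to_long, which may be easier to follow. This is an alternative, not the answer's own method: it assumes exactly two stacks named stack-1 and stack-2, the original (un-reversed) column names, and the sample data from the question. The names stacks, long_df, per_row and out are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'],
    'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'],
    'stack-1-length': [3, 1, 1, 2, 3],
    'stack-2-length': [3, 1, 3, 1, 2],
})

# Assumed stack prefixes; adjust to the real column names.
stacks = ['stack-1', 'stack-2']

# Long format: one (row, material, length) record per stack per original row.
long_df = pd.concat(
    [pd.DataFrame({'row': df.index,
                   'material': df[f'{s}-material'].to_numpy(),
                   'length': df[f'{s}-length'].to_numpy()})
     for s in stacks],
    ignore_index=True,
)

# Total length per row and material, with 0 where a material is absent.
per_row = long_df.groupby(['row', 'material'])['length'].sum().unstack(fill_value=0)

# Accumulate down the rows and attach the result to the original frame.
out = df.join(per_row.cumsum().add_suffix('_cumsum'))
print(out)
```

Summing per row before taking the cumulative sum means no forward fill is needed, because absent materials contribute an explicit 0 instead of NaN.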


You can use the material columns as a mask on the length columns, then sum along the columns and take the cumsum for each material.
import numpy as np

# separate the material and length columns
material = df.filter(like='material').to_numpy()
length = df.filter(like='length')

# get all unique materials
l_mat = np.unique(material)

# iterate over the unique materials
for mat in l_mat:
    # keep only the lengths whose material matches, sum per row, then accumulate
    df[f'{mat}_cumsum'] = length.where(material == mat).sum(axis=1).cumsum()

print(df)
  stack-1-material stack-2-material  stack-1-length  stack-2-length  \
0             rock             rock               3               3   
1            paper            paper               1               1   
2            paper             rock               1               3   
3         scissors            paper               2               1   
4             rock         scissors               3               2   

   rock_cumsum  paper_cumsum  scissors_cumsum  
0          6.0           0.0              0.0  
1          6.0           2.0              0.0  
2          9.0           3.0              0.0  
3          9.0           4.0              2.0  
4         12.0           4.0              4.0  
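
The results above are floats (6.0, 2.0, ...) because where fills the non-matching cells with NaN. If integer columns are preferred, one possible variant is to mask with np.where and 0 instead. This is a sketch built on the sample data from the question, not part of the original answer, and the name per_row is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'],
    'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'],
    'stack-1-length': [3, 1, 1, 2, 3],
    'stack-2-length': [3, 1, 3, 1, 2],
})

# Both blocks as plain NumPy arrays of the same shape (rows x stacks).
material = df.filter(like='material').to_numpy()
length = df.filter(like='length').to_numpy()

for mat in np.unique(material):
    # Replace lengths of other materials with 0 (not NaN), so the dtype stays integer.
    per_row = np.where(material == mat, length, 0).sum(axis=1)
    df[f'{mat}_cumsum'] = per_row.cumsum()

print(df)
```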

Yes, thanks for the catch! This works great, thanks!
  material-stack-1 material-stack-2  length-stack-1  length-stack-2  paper_cumsum  rock_cumsum  scissors_cumsum
0             rock             rock               3               3             0            6                0
1            paper            paper               1               1             2            6                0
2            paper             rock               1               3             3            9                0
3         scissors            paper               2               1             4            9                2
4             rock         scissors               3               2             4           12                4