Python 3.x: cumulative sums for multiple columns based on a condition
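I have a dataframe that contains multiple "stacks" and their corresponding "lengths".
I am trying to create a separate column for each material that tracks the cumulative sum of the lengths, regardless of which "stack" the material is in. I tried using groupby, but I can only get the cumulative sums into a single column. Here is what I want: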
   stack-1-material  stack-2-material  stack-1-length  stack-2-length  rock_cumsum  paper_cumsum  scissors_cumsum
0              rock              rock               3               3            6             0                0
1             paper             paper               1               1            6             2                0
2             paper              rock               1               3            9             3                0
3          scissors             paper               2               1            9             4                2
4              rock          scissors               3               2           12             4                4
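First, reverse the column names so that wide_to_long can be used to reshape the DataFrame. Then take the cumsum of the lengths within each material and find the maximum of each material for each row. After that, reshape the result, ffill it, replace the remaining NaN values with 0, and concatenate it back onto the original.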
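import pandas as pd
# Reverse each column name so the stub comes first: 'stack-1-material' -> 'material-stack-1'
df.columns = ['-'.join(x[::-1]) for x in df.columns.str.rsplit('-', n=1)]
# Reshape to long format: one row per (original row, stack), indexed by (index, whatever)
res = (pd.wide_to_long(df.reset_index(), stubnames=['material', 'length'],
                       i='index', j='whatever', suffix='.*')
         .sort_index(level=0))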
# material length
#index whatever
#0 -stack-1 rock 3
# -stack-2 rock 3
#1 -stack-1 paper 1
# -stack-2 paper 1
#2 -stack-1 paper 1
# -stack-2 rock 3
#3 -stack-1 scissors 2
# -stack-2 paper 1
#4 -stack-1 rock 3
# -stack-2 scissors 2
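# Running total of length within each material, in row order
res['csum'] = res.groupby('material')['length'].cumsum()
# Per original row and material, keep the largest running total (the latest one,
# since lengths are non-negative); unstacking gives one column per material.
# Forward-fill rows where a material is absent, then replace leading NaN with 0.
res = (res.groupby(['index', 'material'])['csum'].max()
          .unstack(-1).ffill().fillna(0).astype(int)
          .add_suffix('_cumsum'))
# Attach the new columns to the original frame, aligned on the row index
df = pd.concat([df, res], axis=1)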
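As an alternative sketch of the same reshape-to-long idea (not the answerer's code), you can sum the lengths per original row and material, fill materials that are absent from a row with 0, and then take a running total down the rows. Because an absent material contributes 0 rather than NaN, no max/ffill step is needed. This assumes the i-th "material" column pairs with the i-th "length" column; the names long_df, per_row, cumsums and the helper column row are only illustrative.

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'],
    'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'],
    'stack-1-length': [3, 1, 1, 2, 3],
    'stack-2-length': [3, 1, 3, 1, 2],
})

# Long format: one (row, material, length) record per stack per original row.
# Assumes the material and length columns are listed in the same stack order.
mat_cols = df.filter(like='material').columns
len_cols = df.filter(like='length').columns
long_df = pd.concat(
    [pd.DataFrame({'row': df.index,
                   'material': df[m].to_numpy(),
                   'length': df[l].to_numpy()})
     for m, l in zip(mat_cols, len_cols)],
    ignore_index=True,
)

# Total length per original row and material; 0 where a material is absent
per_row = long_df.groupby(['row', 'material'])['length'].sum().unstack(fill_value=0)

# Running total down the rows, one column per material
cumsums = per_row.cumsum().add_suffix('_cumsum')
cumsums.columns.name = None
cumsums.index.name = None

out = pd.concat([df, cumsums], axis=1)
print(out)
```

The new columns come out in sorted order (paper, rock, scissors), because groupby sorts the material keys.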
You can use the material columns as a mask for the length columns, then sum along the columns and take the cumsum for each material:
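import pandas as pd
# Separate the material and length columns (assumes both list the stacks in the same order)
material = df.filter(like='material').to_numpy()
length = df.filter(like='length')
# Get all unique materials, in order of first appearance (rock, paper, scissors)
l_mat = pd.unique(material.ravel())
# For each material: keep only its lengths (others become NaN),
# sum across the stacks, then accumulate down the rows
for mat in l_mat:
    df[f'{mat}_cumsum'] = length.where(material == mat).sum(axis=1).cumsum()
print(df)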
stack-1-material stack-2-material stack-1-length stack-2-length \
0 rock rock 3 3
1 paper paper 1 1
2 paper rock 1 3
3 scissors paper 2 1
4 rock scissors 3 2
rock_cumsum paper_cumsum scissors_cumsum
0 6.0 0.0 0.0
1 6.0 2.0 0.0
2 9.0 3.0 0.0
3 9.0 4.0 2.0
4 12.0 4.0 4.0
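The same mask-and-sum idea can also be written without the Python loop. This is a sketch using NumPy broadcasting, offered as an alternative rather than the answer's own code: compare every material cell against every unique material to get a 3-D boolean mask, multiply it by the lengths, sum over the stacks, and take the cumulative sum down the rows. It assumes the material and length columns come in matching order; the names mats, mask, and totals are illustrative. Because the mask multiplies non-matching cells by 0 instead of producing NaN, the result stays integer.

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'],
    'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'],
    'stack-1-length': [3, 1, 1, 2, 3],
    'stack-2-length': [3, 1, 3, 1, 2],
})

material = df.filter(like='material').to_numpy()  # shape (rows, stacks)
length = df.filter(like='length').to_numpy()      # shape (rows, stacks)
mats = pd.unique(material.ravel())                # materials in order of first appearance

# mask[i, j, k] is True when stack j in row i holds material k
mask = material[:, :, None] == mats[None, None, :]
# Sum lengths per row and material over the stacks, then accumulate down the rows
totals = (mask * length[:, :, None]).sum(axis=1).cumsum(axis=0)

out = df.join(pd.DataFrame(totals, index=df.index,
                           columns=[f'{m}_cumsum' for m in mats]))
print(out)
```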
Yes, thanks for the catch! This works great, thanks!
material-stack-1 material-stack-2 length-stack-1 length-stack-2 paper_cumsum rock_cumsum scissors_cumsum
0 rock rock 3 3 0 6 0
1 paper paper 1 1 2 6 0
2 paper rock 1 3 3 9 0
3 scissors paper 2 1 4 9 2
4 rock scissors 3 2 4 12 4