Python: cumulative sum with a window function
I am using the tips dataset from seaborn:
import pandas as pd
import seaborn as sns
tips = sns.load_dataset("tips")
tips['rowid'] = tips.index
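I want to create a column holding a running count of tips of 3 or more given by male customers at dinner. The count should not include the current row (see 1 PRECEDING in the query below).
The SQL equivalent is: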
SELECT *,
SUM(CASE WHEN tip >= 3 AND sex='Male' AND time='Dinner' THEN 1 ELSE NULL END)
OVER (PARTITION BY sex, time ORDER BY rowid ROWS BETWEEN unbounded PRECEDING AND 1 PRECEDING) as cnt
FROM tips
ORDER BY rowid ;
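To make the frame semantics concrete, here is a minimal sketch that runs the same window query with Python's built-in sqlite3 module on the first seven rows of the slice shown in the question. It assumes an SQLite build with window-function support (3.25 or later). The in-memory table and the name sql_cnt are illustrative only. The literal is written as 'Male' to match the data, since SQLite compares strings case-sensitively by default.

```python
import sqlite3

# First seven rows of the slice from the question: (rowid, tip, sex, time).
rows = [
    (0, 1.01, "Female", "Dinner"),
    (1, 1.66, "Male", "Dinner"),
    (2, 3.50, "Male", "Dinner"),
    (3, 3.31, "Male", "Dinner"),
    (4, 3.61, "Female", "Dinner"),
    (5, 4.71, "Male", "Dinner"),
    (6, 2.00, "Male", "Dinner"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tips (rowid INTEGER PRIMARY KEY, tip REAL, sex TEXT, time TEXT)"
)
con.executemany("INSERT INTO tips VALUES (?, ?, ?, ?)", rows)

query = """
SELECT rowid,
       SUM(CASE WHEN tip >= 3 AND sex = 'Male' AND time = 'Dinner'
                THEN 1 ELSE NULL END)
           OVER (PARTITION BY sex, time ORDER BY rowid
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS cnt
FROM tips
ORDER BY rowid
"""

# Map rowid -> cnt; SQL NULL comes back as None.
sql_cnt = dict(con.execute(query).fetchall())
print(sql_cnt)
```

The frame ends one row before the current row, so the first row of each partition sees an empty frame and gets NULL. A row also gets NULL when none of the preceding rows in its partition qualifies, because SUM over only NULL values is NULL rather than 0.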
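How can I achieve the same result in pandas? From what I have read, I could probably use some rolling and transform functions, but I have not succeeded.
The final DataFrame should contain the following:
Edit: a slice of the DataFrame, as requested by ansev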
total_bill tip sex smoker day time size rowid cnt
index
0 16.99 1.01 Female No Sun Dinner 2 0 NaN
1 10.34 1.66 Male No Sun Dinner 3 1 NaN
2 21.01 3.50 Male No Sun Dinner 3 2 NaN
3 23.68 3.31 Male No Sun Dinner 2 3 1.0
4 24.59 3.61 Female No Sun Dinner 4 4 NaN
5 25.29 4.71 Male No Sun Dinner 4 5 2.0
6 8.77 2.00 Male No Sun Dinner 2 6 3.0
7 26.88 3.12 Male No Sun Dinner 4 7 3.0
8 15.04 1.96 Male No Sun Dinner 2 8 4.0
9 14.78 3.23 Male No Sun Dinner 2 9 4.0
10 10.27 1.71 Male No Sun Dinner 2 10 5.0
11 35.26 5.00 Female No Sun Dinner 4 11 NaN
12 15.42 1.57 Male No Sun Dinner 2 12 5.0
13 18.43 3.00 Male No Sun Dinner 4 13 5.0
14 14.83 3.02 Female No Sun Dinner 2 14 NaN
15 21.58 3.92 Male No Sun Dinner 2 15 6.0
16 10.33 1.67 Female No Sun Dinner 3 16 NaN
17 16.29 3.71 Male No Sun Dinner 3 17 7.0
18 16.97 3.50 Female No Sun Dinner 3 18 NaN
19 20.65 3.35 Male No Sat Dinner 3 19 8.0
20 17.92 4.08 Male No Sat Dinner 2 20 9.0
21 20.29 2.75 Female No Sat Dinner 2 21 NaN
22 15.77 2.23 Female No Sat Dinner 2 22 NaN
23 39.42 7.58 Male No Sat Dinner 4 23 10.0
24 19.82 3.18 Male No Sat Dinner 2 24 11.0
25 17.81 2.34 Male No Sat Dinner 4 25 12.0
26 13.37 2.00 Male No Sat Dinner 2 26 12.0
27 12.69 2.00 Male No Sat Dinner 2 27 12.0
28 21.70 4.30 Male No Sat Dinner 2 28 12.0
29 19.65 3.00 Female No Sat Dinner 2 29 NaN
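I think you need:
# df is the tips DataFrame from the question, assumed to be ordered by rowid
df['cnt'] = ( df.loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
                .ge(3)      # True where the tip is at least 3
                .cumsum()   # running count of such tips
                .shift() )  # move down one row so the current row is excluded
# If the rows are not ordered by rowid, sort first:
#df['cnt'] = ( df.sort_values('rowid')
#                .loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
#                .ge(3)
#                .cumsum()
#                .shift() )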
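The chain above can be followed step by step on a small frame. This is a minimal sketch using the first seven rows of the slice from the question. The intermediate names mask, hits, running and previous are introduced here for illustration and are not part of the answer.

```python
import pandas as pd

# First seven rows of the slice from the question (only the columns used here).
df = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male", "Male"],
    "time": ["Dinner"] * 7,
})

# Keep only the Male/Dinner rows; their original index labels are preserved.
mask = df["sex"].eq("Male") & df["time"].eq("Dinner")
hits = df.loc[mask, "tip"].ge(3)   # True where the tip is at least 3
running = hits.cumsum()            # inclusive running count of True values
previous = running.shift()         # count over the rows before the current one

# Assignment aligns on the index, so rows outside the mask receive NaN.
df["cnt"] = previous
print(df)
```

On this sample, the first Male/Dinner row gets NaN from shift(), while the second gets 0.0 because the only row before it does not qualify. The SQL query returns NULL in both cases.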
Update: to get NaN instead of 0 where no earlier row qualifies:
df['cnt']=( df.loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
.ge(3)
.cumsum()
.shift()
.where(lambda x: x.gt(0))
)
# total_bill tip sex smoker day time size rowid cnt
#index
#0 16.99 1.01 Female No Sun Dinner 2 0 NaN
#1 10.34 1.66 Male No Sun Dinner 3 1 NaN
#2 21.01 3.50 Male No Sun Dinner 3 2 NaN
#3 23.68 3.31 Male No Sun Dinner 2 3 1.0
#4 24.59 3.61 Female No Sun Dinner 4 4 NaN
#5 25.29 4.71 Male No Sun Dinner 4 5 2.0
#6 8.77 2.00 Male No Sun Dinner 2 6 3.0
#7 26.88 3.12 Male No Sun Dinner 4 7 3.0
#8 15.04 1.96 Male No Sun Dinner 2 8 4.0
#9 14.78 3.23 Male No Sun Dinner 2 9 4.0
#10 10.27 1.71 Male No Sun Dinner 2 10 5.0
#11 35.26 5.00 Female No Sun Dinner 4 11 NaN
#12 15.42 1.57 Male No Sun Dinner 2 12 5.0
#13 18.43 3.00 Male No Sun Dinner 4 13 5.0
#14 14.83 3.02 Female No Sun Dinner 2 14 NaN
#15 21.58 3.92 Male No Sun Dinner 2 15 6.0
#16 10.33 1.67 Female No Sun Dinner 3 16 NaN
#17 16.29 3.71 Male No Sun Dinner 3 17 7.0
#18 16.97 3.50 Female No Sun Dinner 3 18 NaN
#19 20.65 3.35 Male No Sat Dinner 3 19 8.0
#20 17.92 4.08 Male No Sat Dinner 2 20 9.0
#21 20.29 2.75 Female No Sat Dinner 2 21 NaN
#22 15.77 2.23 Female No Sat Dinner 2 22 NaN
#23 39.42 7.58 Male No Sat Dinner 4 23 10.0
#24 19.82 3.18 Male No Sat Dinner 2 24 11.0
#25 17.81 2.34 Male No Sat Dinner 4 25 12.0
#26 13.37 2.00 Male No Sat Dinner 2 26 12.0
#27 12.69 2.00 Male No Sat Dinner 2 27 12.0
#28 21.70 4.30 Male No Sat Dinner 2 28 12.0
#29 19.65 3.00 Female No Sat Dinner 2 29 NaN
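The SQL frame is defined per (sex, time) partition, whereas the code above filters a single partition. One possible alternative, offered as a sketch rather than as the answer's method, mirrors PARTITION BY with groupby and obtains the "rows before the current one" count by subtracting the current row's flag from the inclusive cumulative sum. The names qualifies and inclusive are illustrative. The sketch assumes the rows are already ordered by rowid and uses the first seven rows of the slice from the question.

```python
import pandas as pd

# First seven rows of the slice from the question (only the columns used here).
df = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male", "Male"],
    "time": ["Dinner"] * 7,
})

# 1 where the CASE expression in the SQL query would yield 1, else 0.
qualifies = (
    df["tip"].ge(3) & df["sex"].eq("Male") & df["time"].eq("Dinner")
).astype(int)

# Inclusive running count within each (sex, time) partition.
inclusive = qualifies.groupby([df["sex"], df["time"]]).cumsum()

# Subtracting the current flag leaves the count over the preceding rows only.
exclusive = inclusive - qualifies

# SUM over only NULL values is NULL in SQL, so map a count of 0 to NaN.
df["cnt"] = exclusive.where(exclusive.gt(0))
print(df)
```

For these rows the result matches the cnt column shown in the question: NaN for rows 0, 1, 2 and 4, then 1.0, 2.0 and 3.0 for rows 3, 5 and 6.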
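Can you copy and paste the DataFrame? Then I can use pd.read_clipboard() to check my answer and help you :)
The dataset contains 244 rows; I included the code that loads it from the seaborn library :). Your answer looks good, I am checking it ;)
@ansev DataFrame added :)
Thanks. Do you want to remove the initial 0?
If possible, yes, please remove it!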