Python: case statement in an agg function
I have the following SQL statement:
select id
     , avg(case when rate = 1 then rate end) as "P_Rate"
     , stddev(case when rate = 1 then rate end) as "std P_Rate"
     , avg(case when f_rate = 1 then f_rate else 0 end) as "A_Rate"
     , stddev(case when f_rate = 1 then f_rate else 0 end) as "std A_Rate"
from (
    select id, connected_date, payment_type, acc_type,
           max(case when s_rate > 1 then 1 else 0 end) / count(open) as rate,
           sum(case when hire_days <= 5 and paid > 1000 then 1 else 0 end) / count(open) as f_rate
    from analysis_table
    where alloc_date <= '2016-01-01'
    group by 1, 2
) a
group by id
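I want to do the same with pandas' agg, but I need to use it to filter each column and then apply max/sum to the filtered values.
I tried something like this: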
`def my_agg_function(hire_days, paid, open):
    r_arr = []
    if hire_days <= 5 and paid > 1000:
        r_arr.append(1)
    else:
        r_arr.append(0)
    return np.max(r_arr) / len(????)

inner_table['f_rate'] = grouped.agg(lambda row: my_agg_function(row['hire_days'], row['paid'], row['open']))`
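Similarly for rate.
You should include a sample DataFrame in your question; that makes it easier to answer.
Depending on your needs, you may want to use the agg method of a grouped DataFrame. Suppose you have the following DataFrame: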
  connected_date       id  number_of_clicks  time_spent
0            Mon     matt                15         124
1            Tue     john                13         986
2            Mon     matt                48         451
3            Thu     jack                68         234
4            Sun     john                52         976
5            Sat  sabrina                13         156
You want, for each user and each day, the total time spent and the maximum number of clicks in a single session. You can then use groupby as follows:
df.groupby(['id', 'connected_date'], as_index=False).agg({'number_of_clicks': 'max', 'time_spent': 'sum'})
Output:
        id connected_date  time_spent  number_of_clicks
0     jack            Thu         234                68
1     john            Sun         976                52
2     john            Tue         986                13
3     matt            Mon         575                48
4  sabrina            Sat         156                13
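For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data is copied from the table above; the use of named aggregation (keyword arguments to agg) is my own choice to pin the output column order, not something the answer relies on.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "connected_date": ["Mon", "Tue", "Mon", "Thu", "Sun", "Sat"],
    "id": ["matt", "john", "matt", "jack", "john", "sabrina"],
    "number_of_clicks": [15, 13, 48, 68, 52, 13],
    "time_spent": [124, 986, 451, 234, 976, 156],
})

# Named aggregation: each keyword is an output column,
# each value is (input column, aggregation function).
result = df.groupby(["id", "connected_date"], as_index=False).agg(
    time_spent=("time_spent", "sum"),
    number_of_clicks=("number_of_clicks", "max"),
)
print(result)
```

The two Monday sessions for matt collapse into one row, with time_spent summed and number_of_clicks taken as the maximum.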
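Note that I passed as_index=False only to make the output clearer.
OK, let's imagine number_of_clicks looks like (.023, 1.2, 0.4, 2.1, 1, 2), and you want a sum, but not of the raw values (.023 + 1.2, etc.); instead, each value counts as 0 if number_of_clicks < 1 and 1 otherwise, and those are summed (1 + 1 + 1 ...).
Then do something like this before your groupby: df['number_of_clicks'] = df['number_of_clicks'] >= 1. You will get a Series of booleans (which Python also treats as 0 and 1), and the sum in the groupby will give you the value you want.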
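Applied to the f_rate part of the question's SQL, the same idea might look like the sketch below. Treat it as an illustration under assumptions, not a tested translation of the full query: the rows are made up (the question gives no data), and flag, flag_sum, open_count, a_val and p_val are hypothetical helper names. The no-ELSE variant is shown on f_rate only to illustrate how a CASE without ELSE (SQL NULL) can be mimicked with NaN; in the question that form is applied to rate.

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "connected_date": ["d1", "d1", "d2", "d1", "d1", "d2"],
    "hire_days": [3, 4, 10, 2, 7, 1],
    "paid": [1500, 2000, 3000, 500, 2000, 1200],
    "open": [1, 1, 1, 1, 1, 1],
})

# CASE WHEN hire_days <= 5 AND paid > 1000 THEN 1 ELSE 0 END
df["flag"] = ((df["hire_days"] <= 5) & (df["paid"] > 1000)).astype(int)

# Inner query: SUM(flag) / COUNT(open) per (id, connected_date).
inner = df.groupby(["id", "connected_date"], as_index=False).agg(
    flag_sum=("flag", "sum"),
    open_count=("open", "count"),
)
inner["f_rate"] = inner["flag_sum"] / inner["open_count"]

# CASE WHEN f_rate = 1 THEN f_rate ELSE 0 END: non-matching rows become 0.
inner["a_val"] = inner["f_rate"].where(inner["f_rate"] == 1, 0)
# CASE WHEN f_rate = 1 THEN f_rate END: non-matching rows become NaN,
# which mean() and std() skip, much like SQL aggregates skip NULL.
inner["p_val"] = inner["f_rate"].where(inner["f_rate"] == 1)

# Outer query: aggregate per id. pandas std() defaults to ddof=1 (sample std).
outer = inner.groupby("id").agg(
    A_Rate=("a_val", "mean"),
    std_A_Rate=("a_val", "std"),
    P_Rate=("p_val", "mean"),
)
print(outer)
```

The design choice here is to compute the condition as a column before grouping, so that the aggregation itself stays a plain sum, count, mean or std instead of a row-wise Python function.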