Python: CASE statement in an agg function

python numpy pandas dataframe

I have the following SQL statement:

select id
        , avg(case when rate = 1 then rate end) as "P_Rate"
        , stddev(case when rate = 1 then rate end) as "std P_Rate"
        , avg(case when f_rate = 1 then f_rate else 0 end) as "A_Rate"
        , stddev(case when f_rate = 1 then f_rate else 0 end) as "std A_Rate"
from (
  select id, connected_date, payment_type, acc_type,
    max(case when s_rate > 1 then 1 else 0 end) / count(open) as rate,
    sum(case when hire_days <= 5 and paid > 1000 then 1 else 0 end) / count(open) as f_rate
  from analysis_table where alloc_date <= '2016-01-01' group by 1,2
) a group by id

I want to reproduce this in pandas, but I have to apply the condition to each column and then take max/sum over it.

I tried something like this:

def my_agg_function(hire_days, paid, open):
    r_arr = []
    if hire_days <= 5 and paid > 1000:
        r_arr.append(1)
    else:
        r_arr.append(0)
    return np.max(r_arr) / len(????)

inner_table['f_rate'] = grouped.agg(lambda row: my_agg_function(row['hire_days'], row['paid'], row['open']))

Similarly for rate.

You should include a sample dataframe in your question so that it is easier to answer.

Depending on what you need, you may want to use the agg method of a grouped dataframe. Suppose you have the following dataframe:

    connected_date  id      number_of_clicks    time_spent
0   Mon             matt    15                  124
1   Tue             john    13                  986
2   Mon             matt    48                  451
3   Thu             jack    68                  234
4   Sun             john    52                  976
5   Sat             sabrina 13                  156

You want, for each user and each day, the total time spent and the maximum number of clicks in a single session. You can then use groupby as follows:

df.groupby(['id', 'connected_date'], as_index=False).agg({'number_of_clicks': 'max', 'time_spent': 'sum'})

Output:

    id      connected_date  time_spent  number_of_clicks
0   jack    Thu             234         68
1   john    Sun             976         52
2   john    Tue             986         13
3   matt    Mon             575         48
4   sabrina Sat             156         13

Note that I pass as_index=False only to make the output easier to read.
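For reference, a self-contained version of the example above might look like the sketch below. It rebuilds the sample dataframe and passes the aggregations as the strings 'max' and 'sum'; the variable name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "connected_date": ["Mon", "Tue", "Mon", "Thu", "Sun", "Sat"],
    "id": ["matt", "john", "matt", "jack", "john", "sabrina"],
    "number_of_clicks": [15, 13, 48, 68, 52, 13],
    "time_spent": [124, 986, 451, 234, 976, 156],
})

# One row per (id, connected_date): max clicks and total time spent
result = (
    df.groupby(["id", "connected_date"], as_index=False)
      .agg({"number_of_clicks": "max", "time_spent": "sum"})
)
print(result)
```

Without as_index=False, id and connected_date would end up in a MultiIndex instead of ordinary columns.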

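OK, now suppose number_of_clicks looks like (.023, 1.2, 0.4, 2.1, 1, 2) and you want a sum, but not the plain sum (.023 + 1.2 + ...). Instead you want 0 if number_of_clicks < 1 and 1 otherwise, and then the sum of those values (1 + 1 + 1 ...). In that case, do something like this before your groupby: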

df['number_of_clicks'] = df['number_of_clicks'] >= 1

You will get a Series of booleans (which Python also treats as 0 and 1), and the sum in the groupby will give you the value you want.
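As a rough sketch of how this idea could map onto the CASE expressions in the question: each "case when ... then 1 else 0 end" becomes a boolean column cast to int, and "sum(...) / count(open)" becomes two aggregations that are divided afterwards. The sample rows are invented for illustration, the grouping columns (id, connected_date) are an assumption based on "group by 1,2", the intermediate names (s_flag, f_flag, s_max, f_sum, n_open, p, a) are hypothetical, and the alloc_date filter is left out for brevity.

```python
import pandas as pd

# Made-up rows, only to have something to aggregate
analysis_table = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "connected_date": ["A", "B", "B", "A"],
    "s_rate": [2.0, 0.5, 0.2, 1.5],
    "hire_days": [3, 7, 2, 10],
    "paid": [1500, 2000, 1200, 3000],
    "open": [1, 1, 1, 1],
})

# CASE WHEN cond THEN 1 ELSE 0 END -> boolean column cast to int
flags = analysis_table.assign(
    s_flag=(analysis_table["s_rate"] > 1).astype(int),
    f_flag=((analysis_table["hire_days"] <= 5)
            & (analysis_table["paid"] > 1000)).astype(int),
)

# Inner query: aggregate the flags, then divide by count(open)
inner = flags.groupby(["id", "connected_date"], as_index=False).agg(
    s_max=("s_flag", "max"),
    f_sum=("f_flag", "sum"),
    n_open=("open", "count"),
)
inner["rate"] = inner["s_max"] / inner["n_open"]
inner["f_rate"] = inner["f_sum"] / inner["n_open"]

# Outer query: a CASE without ELSE yields NULL, which Series.where
# reproduces as NaN; with ELSE 0, pass other=0
outer = inner.assign(
    p=inner["rate"].where(inner["rate"] == 1),
    a=inner["f_rate"].where(inner["f_rate"] == 1, other=0),
).groupby("id").agg(
    P_Rate=("p", "mean"), std_P_Rate=("p", "std"),
    A_Rate=("a", "mean"), std_A_Rate=("a", "std"),
)
print(outer)
```

Note that the division here is floating-point, whereas some databases would perform integer division on the SQL side, so the numbers may differ from what the original query returns.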