Python: case statement in agg function

I have a SQL statement like this:

select id
        , avg(case when rate = 1 then rate end) as "P_Rate"
        , stddev(case when rate = 1 then rate end) as "std P_Rate"
        , avg(case when f_rate = 1 then f_rate else 0 end) as "A_Rate"
        , stddev(case when f_rate = 1 then f_rate else 0 end) as "std A_Rate"
from (
  select id, connected_date, payment_type, acc_type,
    max(case when s_rate > 1 then 1 else 0 end) / count(open) as rate,
    sum(case when hire_days <= 5 and paid > 1000 then 1 else 0 end) / count(open) as f_rate
  from analysis_table
  where alloc_date <= '2016-01-01'
  group by 1, 2
) a
group by id
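I want to do the same thing on a pandas DataFrame, but I have to use it to filter each column and then apply max/sum to the result.

I tried something like this: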

def my_agg_function(hire_days, paid, open):
    r_arr = []
    if hire_days <= 5 and paid > 1000:
        r_arr.append(1)
    else:
        r_arr.append(0)
    return np.max(r_arr) / len(????)

inner_table['f_rate'] = grouped.agg(lambda row: my_agg_function(row['hire_days'], row['paid'], row['open']))

The same goes for rate.

You should include a sample DataFrame in your question so that it is easier to answer.

Depending on what you need, you may want to use the agg method of a grouped DataFrame. Suppose you have the following DataFrame:

    connected_date  id      number_of_clicks    time_spent
0   Mon             matt    15                  124
1   Tue             john    13                  986
2   Mon             matt    48                  451
3   Thu             jack    68                  234
4   Sun             john    52                  976
5   Sat             sabrina 13                  156
You want the total time each user spent per day and the maximum number of clicks in a single session. You can then use groupby like this:

df.groupby(['id', 'connected_date'], as_index=False).agg({'number_of_clicks': 'max', 'time_spent': 'sum'})
Output:

    id      connected_date  time_spent  number_of_clicks
0   jack    Thu             234         68
1   john    Sun             976         52
2   john    Tue             986         13
3   matt    Mon             575         48
4   sabrina Sat             156         13

Note that I only passed as_index=False to make the output clearer.
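For anyone who wants to run the example, here is a minimal self-contained sketch. The data is copied from the table above; the second call uses pandas named aggregation (available since pandas 0.25), and the output column names max_clicks and total_time are made up for illustration.

```python
import pandas as pd

# Sample data copied from the table in the answer.
df = pd.DataFrame({
    "connected_date": ["Mon", "Tue", "Mon", "Thu", "Sun", "Sat"],
    "id": ["matt", "john", "matt", "jack", "john", "sabrina"],
    "number_of_clicks": [15, 13, 48, 68, 52, 13],
    "time_spent": [124, 986, 451, 234, 976, 156],
})

# Dict form: one aggregation per column, keyed by column name.
result = (
    df.groupby(["id", "connected_date"], as_index=False)
      .agg({"number_of_clicks": "max", "time_spent": "sum"})
)
print(result)

# Named aggregation: same computation, but the output columns get
# explicit (here hypothetical) names.
named = (
    df.groupby(["id", "connected_date"], as_index=False)
      .agg(max_clicks=("number_of_clicks", "max"),
           total_time=("time_spent", "sum"))
)
print(named)
```

Passing the aggregation names as strings ('max', 'sum') rather than the Python builtins keeps the call on pandas' own implementations and avoids deprecation warnings in recent pandas versions.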

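OK, suppose number_of_clicks looks like (.023, 1.2, 0.4, 2.1, 1, 2) and you want a sum, not of the raw values (.023 + 1.2 + ...), but of a flag that is 0 if number_of_clicks < 1 and 1 otherwise (1 + 1 + 1 ...). In that case, do something like this before your groupby:
df['number_of_clicks'] = df['number_of_clicks'] >= 1
You will get a Series of booleans (which Python also treats as 0 and 1), and the sum in the groupby will give you the value you want.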
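Applied to the SQL in the question, the same idea might look roughly like the sketch below. This is an assumption-laden illustration, not a verified translation: the column names come from the question's query, but the sample rows, the helper columns (s_flag, f_flag, s_max, f_sum, n_open, p_rate_src, a_rate_src) and the choice to group by id and connected_date are invented, and the alloc_date filter is left out. Note also that pandas `/` is true division, whereas some SQL dialects use integer division for integer operands.

```python
import pandas as pd

# Hypothetical rows; column names follow the question's SQL, values are invented.
analysis = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "connected_date": ["d1", "d1", "d2", "d1", "d1"],
    "s_rate": [2, 0, 2, 3, 1],
    "hire_days": [3, 10, 4, 2, 5],
    "paid": [1500, 2000, 500, 1200, 3000],
    "open": [1, 1, 1, 1, 1],
})

# CASE WHEN <cond> THEN 1 ELSE 0 END becomes a boolean column cast to int.
analysis["s_flag"] = (analysis["s_rate"] > 1).astype(int)
analysis["f_flag"] = (
    (analysis["hire_days"] <= 5) & (analysis["paid"] > 1000)
).astype(int)

# Inner query: aggregate the flags per group, then divide by count(open).
inner = (
    analysis.groupby(["id", "connected_date"], as_index=False)
            .agg(s_max=("s_flag", "max"),
                 f_sum=("f_flag", "sum"),
                 n_open=("open", "count"))
)
inner["rate"] = inner["s_max"] / inner["n_open"]
inner["f_rate"] = inner["f_sum"] / inner["n_open"]

# Outer query: a CASE WHEN without ELSE yields NULL; Series.where leaves NaN
# in those positions, and mean skips NaN. With ELSE 0, pass 0 as the fallback.
inner["p_rate_src"] = inner["rate"].where(inner["rate"] == 1)
inner["a_rate_src"] = inner["f_rate"].where(inner["f_rate"] == 1, 0)

outer = inner.groupby("id").agg(
    P_Rate=("p_rate_src", "mean"),
    A_Rate=("a_rate_src", "mean"),
)
print(outer)
```

The standard deviations could be added the same way with further ("p_rate_src", "std") style entries; pandas' std uses the sample formula (ddof=1) by default, which may or may not match the stddev of a given SQL dialect.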