Python DataFrame click-through rate calculation

I have a DataFrame of the form

                           variantid eventType
date
2016-02-08 14:43:42  variant1    served
2016-02-08 14:43:46  variant1    served
2016-02-08 14:43:47  variant1    served
2016-02-08 14:43:51  variant1    served
2016-02-08 14:43:53  variant1    served
2016-02-08 14:43:54  variant1    served
2016-02-08 14:43:55  variant1    served
2016-02-08 14:43:55  variant2    served
2016-02-08 14:43:56  variant2    served
2016-02-08 14:43:56  variant1    served
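I have indexed it by date. I now want to compute the click-through rate for each unique value in the variantid column. I am new to pandas and don't know how to do this. If I do the following: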

grouped_by_varid=df.groupby(by=[df.variantid,df.index.hour]).count()
I get the following DataFrame:

                eventType
variantid
variant1    0           3
            1           3
            3           1
            4           1
            5           4
            6           3
            7           5
            8           9
            9           9
            10         12
            14       5846
            15      26712
            16      25614
            17      19579
            18      14328
            19       2984
            20         39
            21         32
            22         15
            23         12

variant2    0           3
            1           1
            2           4
            3           3
            4           8
            5          14
            6          24
            7          21
            8          27
            9           9
            10          9
            14       4947
            15      21299
            16      19475
            17      13292
            18       9398
            19       2172
            20         66
            21         64
            22         44
            23         12
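I would like to produce a DataFrame that computes and stores the click-through rate of each variant per hour (and also per minute, though I suspect that would be a minor change).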

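I have also noticed that because the values in the eventType column are strings, summing them simply concatenates the values. How can I use these string eventTypes to compute aggregate statistics for each variant?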

Any help would be much appreciated.
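
On the string-summing point, one possible way around it is to turn the labels into counts before aggregating. The sketch below is only an illustration and not part of the original post: the small frame is made-up sample data, and the 'clicked' label is an assumption, since the question only shows 'served' events.

```python
import pandas as pd

# Made-up sample data; the 'clicked' label is an assumption.
df = pd.DataFrame(
    {
        "variantid": ["variant1", "variant1", "variant2", "variant1", "variant2"],
        "eventType": ["served", "served", "served", "clicked", "served"],
    },
    index=pd.to_datetime(
        [
            "2016-02-08 14:43:42",
            "2016-02-08 14:43:46",
            "2016-02-08 14:43:55",
            "2016-02-08 14:43:56",
            "2016-02-08 14:43:56",
        ]
    ).rename("date"),
)

# Comparing against a label yields booleans, which sum as 0/1
# instead of being concatenated like the raw strings.
is_served = df["eventType"] == "served"
served_per_variant = is_served.groupby(df["variantid"]).sum()

# value_counts tallies every label per variant; unstack turns the
# labels into columns, filling missing combinations with 0.
counts = df.groupby("variantid")["eventType"].value_counts().unstack(fill_value=0)

print(served_per_variant)
print(counts)
```

Either form gives numeric columns, so later aggregations such as sums or ratios behave as expected.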

If I understand correctly, you can use groupby and aggregate with size, then finish with reset_index:

import pandas as pd

print(df)
                    variantid eventType
date                                   
2016-02-08 14:43:42  variant1    served
2016-02-08 14:43:46  variant1    served
2016-02-08 14:43:47  variant1    served
2016-02-08 14:43:51  variant1    served
2016-02-08 14:43:53  variant1    served
2016-02-08 14:43:54  variant1    served
2016-02-08 14:43:55  variant1    served
2016-02-08 14:43:55  variant2    served
2016-02-08 14:43:56  variant2    served
2016-02-08 14:43:56  variant1    served

# Count events per variant and hour of day. Naming the hour level up front
# avoids relying on the default level name, which can differ between pandas versions.
hours = pd.Index(df.index.hour, name='hours')
print(df.groupby([df.variantid, hours])['eventType'].size().reset_index(name='count'))
  variantid  hours  count
0  variant1     14      8
1  variant2     14      2

# Same idea per minute of the hour.
minutes = pd.Index(df.index.minute, name='minutes')
print(df.groupby([df.variantid, minutes])['eventType'].size().reset_index(name='count'))
  variantid  minutes  count
0  variant1       43      8
1  variant2       43      2
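
The counts above stop short of the click-through rate itself. A minimal sketch of one way to get there follows, under stated assumptions: clicks are recorded with an eventType of 'clicked' (the question only shows 'served'), the rate is defined as clicks divided by serves within each time bucket, and the function name click_through_rate and the column name bucket are hypothetical. The sample frame is made-up data.

```python
import pandas as pd


def click_through_rate(df, freq="60min", served="served", clicked="clicked"):
    """Return served/clicked counts and their ratio per variant and time bucket.

    The event labels are assumptions; pass the labels your data actually uses.
    """
    # Floor each timestamp to the start of its bucket (e.g. the hour).
    bucket = pd.Index(df.index.floor(freq), name="bucket")
    counts = (
        df.groupby([df["variantid"], bucket, df["eventType"]])
        .size()
        .unstack(fill_value=0)
    )
    # Guarantee both columns exist even if one label never occurs.
    counts = counts.reindex(columns=[served, clicked], fill_value=0)
    served_n = counts[served]
    # Leave the rate as NaN where nothing was served in the bucket.
    ctr = (counts[clicked] / served_n).where(served_n > 0)
    return counts.assign(ctr=ctr).reset_index()


# Made-up sample data for illustration.
df = pd.DataFrame(
    {
        "variantid": ["variant1", "variant1", "variant1", "variant1",
                      "variant2", "variant2", "variant2"],
        "eventType": ["served", "served", "clicked", "served",
                      "served", "served", "clicked"],
    },
    index=pd.to_datetime(
        [
            "2016-02-08 14:43:42",
            "2016-02-08 14:43:46",
            "2016-02-08 14:43:50",
            "2016-02-08 14:44:05",
            "2016-02-08 14:43:55",
            "2016-02-08 15:01:00",
            "2016-02-08 15:01:30",
        ]
    ).rename("date"),
)

hourly = click_through_rate(df, freq="60min")
per_minute = click_through_rate(df, freq="min")
print(hourly)
print(per_minute)
```

Flooring the timestamps keeps the date and hour in each bucket, whereas grouping on df.index.minute alone would merge the same minute from different hours. Switching between hourly and per-minute rates is then only a change of the freq argument.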