Python: using a function inside groupby


I have a column `dateTime`, and I'm trying to achieve the following (it works fine without groupby):

But the problem is that after I group the data with:

    groups = df.groupby(groupbytime, as_index=True)
    df_grouped = groups.agg({
        'Clients1': [np.mean, np.max],
        'Clients2': [np.mean, np.max],
    })
I lose the `dateTime` column, so I tried to add it back like this:

    groups = df.groupby(groupbytime, as_index=True)
    df_grouped = groups.agg({
        'dateTime': ['first'],
        'Clients1': [np.mean, np.max],
        'Clients2': [np.mean, np.max],
    })
This gives me `dateTime` with the dtype:

    dateTime  first    datetime64[ns]
What I'm trying to get is the rounded time and the date as separate columns within the groupby. Thanks.
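Below is a minimal sketch of the goal. The question does not show how `groupbytime` is defined, so this assumes it is a 1-minute `pd.Grouper` on `dateTime`. The rows are made up, shaped like the sample data. The group key ends up in the index, and `reset_index()` turns it back into an ordinary column. The date and the time can then be split off with `.dt.date` and `.dt.time`.

```python
import pandas as pd

# Hypothetical rows shaped like the question's sample data
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:05.996090",
        "2017-10-23 08:01:00.005942",
        "2017-10-23 08:01:30.500000",
    ]),
    "Clients1": [12991.5, 12990.0, 12986.5, 12988.5],
    "Clients2": [2, 2, 1, 3],
})

# Assumption: groupbytime is a 1-minute bin on the dateTime column
groupbytime = pd.Grouper(key="dateTime", freq="1min")

# Named aggregation gives flat column names; the bin start lands in the index
out = df.groupby(groupbytime).agg(
    clients1_mean=("Clients1", "mean"),
    clients1_max=("Clients1", "max"),
    clients2_mean=("Clients2", "mean"),
    clients2_max=("Clients2", "max"),
)

# Move the bin start back into a column, then split it into date and time
out = out.reset_index()
out["date"] = out["dateTime"].dt.date
out["time"] = out["dateTime"].dt.time
print(out)
```

Named aggregation is used here only to avoid MultiIndex columns. The dict-of-lists form from the question works too, but `reset_index()` then produces a tuple column label.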

Edit, sample data added. Original data:

         dateTime                    Clients1  Clients2
    8    2017-10-23 08:00:04.854309  12991.5   2
    10   2017-10-23 08:00:04.875162  12991.5   1
    11   2017-10-23 08:00:04.875162  12991.5   1
    12   2017-10-23 08:00:04.875162  12991.5   1
    13   2017-10-23 08:00:04.875162  12991.5   1
    23   2017-10-23 08:00:04.876464  12989.5   1
    24   2017-10-23 08:00:04.876464  12989.5   1
    32   2017-10-23 08:00:04.964356  12990     1
    34   2017-10-23 08:00:04.968549  12990.5   1
    38   2017-10-23 08:00:05.008758  12990     1
    43   2017-10-23 08:00:05.996090  12990     2
    45   2017-10-23 08:00:06.018212  12990     1
    51   2017-10-23 08:00:06.344568  12989.5   1
    56   2017-10-23 08:00:06.903661  12990     1
    60   2017-10-23 08:00:07.120324  12990     1
    66   2017-10-23 08:00:07.206179  12990.5   1
    74   2017-10-23 08:00:07.358889  12991.5   3
    77   2017-10-23 08:00:07.491244  12991     1
    80   2017-10-23 08:00:07.671106  12991     1
    83   2017-10-23 08:00:07.897968  12991     1
    87   2017-10-23 08:00:08.028444  12991     1
    95   2017-10-23 08:00:09.787827  12991.5   3
    98   2017-10-23 08:00:10.178936  12991.5   3
    104  2017-10-23 08:00:10.505921  12991.5   2
    110  2017-10-23 08:00:11.438628  12992     1
    112  2017-10-23 08:00:12.145907  12992     1
The resulting output is:

                         dateTime                    Clients1                   Clients2
                         first                       mean              amax     mean  amax
    1min
    2017-10-23 08:00:00  2017-10-23 08:00:04.854309  12988.8902439024  12993.5  227   12987.7398373984
    2017-10-23 08:01:00  2017-10-23 08:01:00.005942  12986.92          12988.5  84    12986.28
    2017-10-23 08:02:00  2017-10-23 08:02:00.901496  12987.6486486486  12988.5  98    12987
    2017-10-23 08:03:00  2017-10-23 08:03:00.521976  12986.8148148148  12987.5  65    12986.1296296296
    2017-10-23 08:04:00  2017-10-23 08:04:02.800922  12986.4705882353  12986.5  47    12985.5294117647
    2017-10-23 08:05:00  2017-10-23 08:05:00.670865  12985.3658536585  12986    88    12984.7804878049
    2017-10-23 08:06:00  2017-10-23 08:06:00.141393  12987.359375      12988    103   12986.734375
    2017-10-23 08:07:00  2017-10-23 08:07:00.922107  12987.5454545455  12988    34    12986.7727272727
    2017-10-23 08:08:00  2017-10-23 08:08:00.165103  12986.8214285714  12988    46    12986.0714285714
    2017-10-23 08:09:00  2017-10-23 08:09:01.910121  12988.96875       12990    145   12988.328125
    2017-10-23 08:10:00  2017-10-23 08:10:00.008064  12988.2678571429  12989.5  102   12987.6785714286
    2017-10-23 08:11:00  2017-10-23 08:11:05.533862  12989.4318181818  12991    71    12988.8636363636
    2017-10-23 08:12:00  2017-10-23 08:12:01.124564  12991.0444444444  12992.5  144   12990.4444444444
    2017-10-23 08:13:00  2017-10-23 08:13:00.347987  12992.84375       12995    185   12992.0390625
    2017-10-23 08:14:00  2017-10-23 08:14:00.627402  12994.2906976744  12996    216   12993.6395348837
    2017-10-23 08:15:00  2017-10-23 08:15:00.032132  12994.8859649123  12996.5  211   12994.298245614

One possible solution is to use `floor` after `agg`:

    df_grouped[('time_of_day_10', 'first')] = df_grouped[('dateTime', 'first')].dt.floor('10min')
    df_grouped[('time_of_day_30', 'first')] = df_grouped[('dateTime', 'first')].dt.floor('30min')
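Here is a self-contained version of the `floor`-after-`agg` idea. It uses a few hypothetical rows and a hypothetical 1-minute group key named `1min`, to mirror the index name in the question. String aggregation names stand in for `np.mean`/`np.max`, so the column label is `max` rather than `amax`. `floor` only touches the already aggregated `('dateTime', 'first')` column, so nothing inside `agg` has to change.

```python
import pandas as pd

# Hypothetical rows shaped like the question's sample data
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:05.996090",
        "2017-10-23 08:01:00.005942",
        "2017-10-23 08:01:30.500000",
    ]),
    "Clients1": [12991.5, 12990.0, 12986.5, 12988.5],
    "Clients2": [2, 2, 1, 3],
})

# Hypothetical 1-minute group key, named like the index in the question
key = df["dateTime"].dt.floor("1min").rename("1min")

# Keeping ('dateTime', 'first') preserves a datetime64 column after agg
df_grouped = df.groupby(key).agg({
    "dateTime": ["first"],
    "Clients1": ["mean", "max"],
    "Clients2": ["mean", "max"],
})

# Round the aggregated timestamp down to 10- and 30-minute slots
df_grouped[("time_of_day_10", "first")] = df_grouped[("dateTime", "first")].dt.floor("10min")
df_grouped[("time_of_day_30", "first")] = df_grouped[("dateTime", "first")].dt.floor("30min")
print(df_grouped)
```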
Edit: if you need the maximum date of each group, use a custom function with `date`:

    groups = df.groupby('dateTime', as_index=True)
    df_grouped = groups.agg({
        'dateTime': [lambda x: x.dt.date.max()],
        'Clients1': [np.mean, np.max],
        'Clients2': [np.mean, np.max],
    })

    print (df_grouped.dtypes)
    Clients1  mean        float64
              amax        float64
    dateTime  <lambda>     object   <- a pure Python date is object dtype
    Clients2  mean          int64
              amax          int64
    dtype: object

    print (df_grouped)
                               Clients1             dateTime Clients2
                                   mean     amax    <lambda>     mean amax
    dateTime
    2017-10-23 08:00:04.854309  12991.5  12991.5  2017-10-23        2    2
    2017-10-23 08:00:04.875162  12991.5  12991.5  2017-10-23        1    1
    2017-10-23 08:00:04.876464  12989.5  12989.5  2017-10-23        1    1
    2017-10-23 08:00:04.964356  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:04.968549  12990.5  12990.5  2017-10-23        1    1
    2017-10-23 08:00:05.008758  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:05.996090  12990.0  12990.0  2017-10-23        2    2
    2017-10-23 08:00:06.018212  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:06.344568  12989.5  12989.5  2017-10-23        1    1
    2017-10-23 08:00:06.903661  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:07.120324  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:07.206179  12990.5  12990.5  2017-10-23        1    1
    2017-10-23 08:00:07.358889  12991.5  12991.5  2017-10-23        3    3
    2017-10-23 08:00:07.491244  12991.0  12991.0  2017-10-23        1    1
    2017-10-23 08:00:07.671106  12991.0  12991.0  2017-10-23        1    1
    2017-10-23 08:00:07.897968  12991.0  12991.0  2017-10-23        1    1
    2017-10-23 08:00:08.028444  12991.0  12991.0  2017-10-23        1    1
    2017-10-23 08:00:09.787827  12991.5  12991.5  2017-10-23        3    3
    2017-10-23 08:00:10.178936  12991.5  12991.5  2017-10-23        3    3
    2017-10-23 08:00:10.505921  12991.5  12991.5  2017-10-23        2    2
    2017-10-23 08:00:11.438628  12992.0  12992.0  2017-10-23        1    1
    2017-10-23 08:00:12.145907  12992.0  12992.0  2017-10-23        1    1
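The next sketch isolates the dtype point. It uses hypothetical rows and groups by a hypothetical 1-minute key rather than by `dateTime` itself. `.dt.date` yields plain Python `datetime.date` objects, so the aggregated column is not a datetime64 column. With default settings, pandas reports it as `object`.

```python
import pandas as pd

# Hypothetical rows shaped like the question's sample data
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:05.996090",
        "2017-10-23 08:01:00.005942",
        "2017-10-23 08:01:30.500000",
    ]),
    "Clients1": [12991.5, 12990.0, 12986.5, 12988.5],
})

# Hypothetical 1-minute group key (a new Series, so dateTime stays aggregatable)
key = df["dateTime"].dt.floor("1min").rename("minute")

df_grouped = df.groupby(key).agg({
    # .dt.date returns Python date objects; max() picks the latest per group
    "dateTime": [lambda x: x.dt.date.max()],
    "Clients1": ["mean", "max"],
})

# The lambda column is the only sub-column under "dateTime"
day = df_grouped["dateTime"].iloc[:, 0]
print(df_grouped.dtypes)
print(type(day.iloc[0]))
```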

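Could you add some sample data? Please try `df.head().to_dict()` and paste the result. That makes it easy for us to see the data and provide a working solution.

{('dateTime', 'first'): {Timestamp('2017-10-23 08:00:00'): Timestamp('2017-10-23 08:00:04.854309'), Timestamp('2017-10-23 08:01:00'): Timestamp('2017-10-23 08:01:00.005942'), Timestamp('2017-10-23 08:02:00'): ...

@Giladbi - Is it possible to use `floor` after `agg`? What is the reason for using `floor` inside `groupby`?

@Giladbi - Also, `agg` functions aggregate values, so each one has to return a single aggregated value. You can use `floor`, but only together with some aggregation function like `max`, `min`, etc.: `...teTime':[lambda x:x.dt.floor('10min').max(),` ...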
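The last comment's point can be sketched as follows. It uses hypothetical rows; the last timestamp is made up so that two 10-minute slots appear. Grouping by calendar day is an assumption. `floor('10min')` is always followed by a reducer (`min()` or `max()`), so each group collapses to one value. The sketch uses named aggregation and assumes pandas 1.x or later.

```python
import pandas as pd

# Hypothetical rows; the last timestamp is made up to span two 10-minute slots
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:05.996090",
        "2017-10-23 08:01:00.005942",
        "2017-10-23 08:25:10.000000",
    ]),
    "Clients1": [12991.5, 12990.0, 12986.5, 12988.5],
})

# Assumption: one group per calendar day
day = df["dateTime"].dt.floor("d").rename("day")

# Each lambda floors to 10 minutes and then reduces the group to a single value
out = df.groupby(day).agg(
    first_slot=("dateTime", lambda x: x.dt.floor("10min").min()),
    last_slot=("dateTime", lambda x: x.dt.floor("10min").max()),
    clients1_mean=("Clients1", "mean"),
)
print(out)
```

Without the reducer, the lambda would return a whole Series per group, which is not an aggregation.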
Alternatively, `floor('d')` keeps the result as a pandas datetime:

    df_grouped = groups.agg({
        'dateTime': [lambda x: x.dt.floor('d').max()],
        'Clients1': [np.mean, np.max],
        'Clients2': [np.mean, np.max],
    })

    print (df_grouped.dtypes)
    Clients1  mean               float64
              amax               float64
    dateTime  <lambda>    datetime64[ns]   <- floor returns a pandas datetime
    Clients2  mean                 int64
              amax                 int64
    dtype: object
    print (df_grouped)
                               Clients1             dateTime Clients2
                                   mean     amax    <lambda>     mean amax
    dateTime
    2017-10-23 08:00:04.854309  12991.5  12991.5  2017-10-23        2    2
    2017-10-23 08:00:04.875162  12991.5  12991.5  2017-10-23        1    1
    2017-10-23 08:00:04.876464  12989.5  12989.5  2017-10-23        1    1
    2017-10-23 08:00:04.964356  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:04.968549  12990.5  12990.5  2017-10-23        1    1
    2017-10-23 08:00:05.008758  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:05.996090  12990.0  12990.0  2017-10-23        2    2
    2017-10-23 08:00:06.018212  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:06.344568  12989.5  12989.5  2017-10-23        1    1
    2017-10-23 08:00:06.903661  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:07.120324  12990.0  12990.0  2017-10-23        1    1
    2017-10-23 08:00:07.206179  12990.5  12990.5  2017-10-23        1    1
    2017-10-23 08:00:07.358889  12991.5  12991.5  2017-10-23        3    3
    2017-10-23 08:00:07.491244  12991.0  12991.0  2017-10-23        1    1
    2017-10-23 08:00:07.671106  12991.0  12991.0  2017-10-23        1    1
    2017-10-23 08:00:07.897968  12991.0  12991.0  2017-10-23        1    1
    2017-10-23 08:00:08.028444  12991.0  12991.0  2017-10-23        1    1
    2017-10-23 08:00:09.787827  12991.5  12991.5  2017-10-23        3    3
    2017-10-23 08:00:10.178936  12991.5  12991.5  2017-10-23        3    3
    2017-10-23 08:00:10.505921  12991.5  12991.5  2017-10-23        2    2
    2017-10-23 08:00:11.438628  12992.0  12992.0  2017-10-23        1    1
    2017-10-23 08:00:12.145907  12992.0  12992.0  2017-10-23        1    1
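Here is a comparable sketch for the `floor('d')` variant. It again uses hypothetical rows and a hypothetical 1-minute key instead of grouping by `dateTime` itself. The lambda returns pandas Timestamps, so the aggregated column is expected to come back as datetime64. That means the `.dt` accessor still works on it, which is not the case for the `.dt.date` version.

```python
import pandas as pd

# Hypothetical rows shaped like the question's sample data
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:05.996090",
        "2017-10-23 08:01:00.005942",
        "2017-10-23 08:01:30.500000",
    ]),
    "Clients1": [12991.5, 12990.0, 12986.5, 12988.5],
})

# Hypothetical 1-minute group key
key = df["dateTime"].dt.floor("1min").rename("minute")

df_grouped = df.groupby(key).agg({
    # floor('d') returns Timestamps (midnight of each day), then max() reduces
    "dateTime": [lambda x: x.dt.floor("d").max()],
    "Clients1": ["mean", "max"],
})

day = df_grouped["dateTime"].iloc[:, 0]
print(df_grouped.dtypes)
# A datetime64 column still supports .dt, e.g. for formatting
print(day.dt.strftime("%Y-%m-%d"))
```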