Python Pandas: counting logs by time frequency

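I need to do a complex analysis in pandas, and this first step is already a challenge. I need to count the number of logs per time interval, with the interval measured in minutes.

I have the data frame of logs below. The frequency I defined is 00:05:00 (5 minutes).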

user_id  data        time_log_in_hours
user1    24/03/2020  00:01:00
user1    24/03/2020  00:07:00
user1    24/03/2020  00:11:00
user2    24/03/2020  00:25:00
user2    24/03/2020  00:27:00
user2    24/03/2020  00:27:00
user3    25/03/2020  01:36:00
user3    25/03/2020  01:37:00
user3    25/03/2020  01:38:00

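For anyone who wants to reproduce the question, here is a minimal sketch that rebuilds the input frame. It assumes the time column starts out as plain `HH:MM:SS` strings (the question does not say how it is stored) and converts it with `pd.to_timedelta`, since the `.dt` accessor only works on timedelta or datetime columns.

```python
import pandas as pd

# Sample data copied from the question; the column is really named "data".
df = pd.DataFrame({
    "user_id": ["user1"] * 3 + ["user2"] * 3 + ["user3"] * 3,
    "data": ["24/03/2020"] * 6 + ["25/03/2020"] * 3,
    "time_log_in_hours": [
        "00:01:00", "00:07:00", "00:11:00",
        "00:25:00", "00:27:00", "00:27:00",
        "01:36:00", "01:37:00", "01:38:00",
    ],
})

# Assumption: the times are strings, so convert them to timedeltas
# to make .dt.ceil / .dt.floor available.
df["time_log_in_hours"] = pd.to_timedelta(df["time_log_in_hours"])

print(df.dtypes)
```

If the column is already a timedelta in your data, the conversion line is unnecessary.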
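The expected result is the dataframe below, which should count the number of logs at the defined frequency. I treat the day as a sequence of time ranges 5 minutes apart. All the ranges available within 24 hours need to appear as separate columns in the header, so that each column covers one 5-minute range.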

User   date       00:05:00 00:10:00 00:15:00 00:25:00 00:30:00...01:35:00 01:40:00...
user1  24/03/2020 1        1        1        0        0       ...0        0...
user2  24/03/2020 0        0        0        1        2       ...0        0...
user3  25/03/2020 0        0        0        0        0       ...0        3...

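The expected table is consistent with labelling each range by its upper edge, so a log at 00:01:00 counts under 00:05:00 and a log exactly at 00:25:00 stays under 00:25:00. A small sketch of that mapping, assuming timedelta values and using `Series.dt.ceil` (the variable names are only for illustration):

```python
import pandas as pd

# A few of the log times from the question, as timedeltas.
times = pd.Series(pd.to_timedelta(
    ["00:01:00", "00:07:00", "00:25:00", "00:27:00", "01:36:00"]
))

# Round each time up to the next multiple of 5 minutes;
# values already on a 5-minute boundary are left unchanged.
bins = times.dt.ceil("5min")

print(pd.DataFrame({"time": times, "bin": bins}))
```

Using `dt.floor` instead would label each range by its lower edge, which would not match the header shown in the question.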
Is it possible to build this in pandas?

Let's try:

# time_log_in_hours must be a timedelta column for .dt.ceil to work
s = (df.groupby([df['user_id'], df['data'], df['time_log_in_hours'].dt.ceil('5 min')])
       .size()
       .unstack(fill_value=0)
       .reset_index())
print(s)

time_log_in_hours user_id        data  ...  0 days 00:30:00  0 days 01:40:00
0                   user1  24/03/2020  ...                0                0
1                   user2  24/03/2020  ...                2                0
2                   user3  25/03/2020  ...                0                3
[3 rows x 8 columns]
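Note that the result above only has columns for ranges that actually contain logs (8 columns in total), while the question asks for every 5-minute range in 24 hours. One possible way to fill in the missing columns is to reindex against a full range built with `pd.timedelta_range`. This is a sketch, not part of the original answer: the names `all_bins`, `label` and `result` are made up here, and it assumes the times are stored as strings that `pd.to_timedelta` can parse.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["user1"] * 3 + ["user2"] * 3 + ["user3"] * 3,
    "data": ["24/03/2020"] * 6 + ["25/03/2020"] * 3,
    "time_log_in_hours": pd.to_timedelta([
        "00:01:00", "00:07:00", "00:11:00",
        "00:25:00", "00:27:00", "00:27:00",
        "01:36:00", "01:37:00", "01:38:00",
    ]),
})

# Same grouping as the answer: count logs per user, date and 5-minute bin.
counts = (
    df.groupby([df["user_id"], df["data"],
                df["time_log_in_hours"].dt.ceil("5min")])
      .size()
      .unstack(fill_value=0)
)

# Every upper bin edge in a day: 00:05:00, 00:10:00, ..., 24:00:00 (288 bins).
# Caveat: a log at exactly 00:00:00 would ceil to a 00:00:00 bin, which this
# header does not include, so it would be dropped by the reindex.
all_bins = pd.timedelta_range(start="5min", end="24h", freq="5min")
full = counts.reindex(columns=all_bins, fill_value=0)

def label(td):
    """Format a Timedelta as HH:MM:SS (hypothetical helper for readable headers)."""
    total = int(td.total_seconds())
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"

full.columns = [label(b) for b in all_bins]
result = full.reset_index()

print(result.iloc[:, :8])
```

With 288 bins plus the two key columns, `result` has 290 columns, most of them zero. If that is too wide to work with, keeping the sparse output from the answer and reindexing only when exporting may be more practical.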