How do I convert dates/times and count days in Python?


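I have a dataset about users of online courses. It has features such as "id", "event", and "time". I grouped the data and want to know how often each user performs each activity on a given day, i.e. I want the counts per day.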

lt = log_train.groupby(['enrollment_id','event','time']).size()
print(lt)


enrollment_id  event       time
1              access  2014-06-14T09:38:39    2
                       2014-06-14T09:38:48    1
                       2014-06-19T06:21:16    2
                       2014-06-19T06:21:32    1
                       2014-06-19T06:21:45    1
                                           ..
200887     navigate    2014-07-24T03:27:16    1
           page_close  2014-07-24T04:19:55    1
           video       2014-07-24T04:19:57    1
200888     access      2014-07-24T03:48:14    2
           discussion  2014-07-24T03:47:57    1
           navigate    2014-07-24T03:47:17    1
                       2014-07-24T03:47:28    1
                       2014-07-24T03:48:01    1
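Grouping on the raw time column gives one group per distinct timestamp, which is why the counts above are per second rather than per day. A minimal sketch of grouping by calendar date instead, using a hypothetical four-row log_train built from timestamps in the output above (one row per timestamp):

```python
import pandas as pd

# Hypothetical miniature of log_train: a few timestamps from the output above, one row each
log_train = pd.DataFrame({
    "enrollment_id": [1, 1, 1, 1],
    "event": ["access", "access", "access", "access"],
    "time": ["2014-06-14T09:38:39", "2014-06-14T09:38:48",
             "2014-06-19T06:21:16", "2014-06-19T06:21:32"],
})

# Truncate each timestamp to midnight so all events on the same day share one key
log_train["date"] = pd.to_datetime(log_train["time"]).dt.normalize()

# One group per (enrollment, event, calendar day) instead of per exact timestamp
per_day = log_train.groupby(["enrollment_id", "event", "date"]).size()
print(per_day)
```

The two timestamps on 2014-06-14 collapse into one group, as do the two on 2014-06-19.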
Another dataset has the user id, the course id, and each course's date range:

usercourse = pd.merge(enroll,date,how="left", on= 'course_id' )



   enrollment_id                          username                         course_id        from          to
0              1  9Uee7oEuuMmgPx2IzPfFkWgkHZyPbWr0  DPnLzkJJqOOPRJfBxIHbQEERiYHu5ila  2014-06-12  2014-07-11
1              3  1qXC7Fjbwp66GPQc6pHLfEuO8WKozxG4  7GRhBDsirIGkRZBtSMEzNTyDr2JQm4xx  2014-06-19  2014-07-18
2              4  FIHlppZyoq8muPbdVxS44gfvceX9zvU7  DPnLzkJJqOOPRJfBxIHbQEERiYHu5ila  2014-06-12  2014-07-11
Each user takes only one course, and every course spans the same 30-day range. So what I want should look like this:

enrollment_id  event      #ofDays   #ofActionTimes
1              access      2         2
                           10        6
                           30        2
                                   ..
200887         navigate    23        1
               page_close  30        1
               video       1         1
200888         access      12        2
               discussion  2         1
               navigate    5         3
                           29        4  

**#ofDays is the day number N within the course.
#ofActionTimes is how many times the event happens on day N.**
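For the course windows shown above (e.g. 2014-06-12 to 2014-07-11), the start and end dates are 29 days apart, so a 30-day course covers day offsets 0 through 29, or 1 through 30 if the first day is counted as day 1. Which convention #ofDays uses is an assumption to settle before counting; a quick check with pandas:

```python
import pandas as pd

# Course window taken from the first row of the course table above
start = pd.Timestamp("2014-06-12")
end = pd.Timestamp("2014-07-11")

gap = (end - start).days  # whole days between the two dates
n_days = gap + 1          # calendar days when both ends are included
print(gap, n_days)
```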
Since each course starts on a different date, I don't know how to produce this layout in Python.

I hope someone can help me with this.
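Because the start date differs by course, the day offset has to be computed per row after attaching each enrollment's own start date. A minimal sketch with two hypothetical events: the start dates come from enrollments 1 and 3 in the course table above, the first event time comes from the log output, and the second event time is invented for illustration:

```python
import pandas as pd

# Hypothetical events for two enrollments whose courses start on different dates
events = pd.DataFrame({
    "enrollment_id": [1, 3],
    "time": ["2014-06-14T09:38:39", "2014-06-29T10:00:00"],  # second time is invented
})
starts = pd.DataFrame({
    "enrollment_id": [1, 3],
    "from": ["2014-06-12", "2014-06-19"],
})

# Attach each enrollment's own course start date to its events
df = events.merge(starts, on="enrollment_id", how="left")

# Row-wise subtraction, so each event is measured against its own course start
day_index = (pd.to_datetime(df["time"]).dt.normalize()
             - pd.to_datetime(df["from"])).dt.days
print(day_index.tolist())
```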

IIUC, you can use merge, groupby, and count to get what you need.

First, some sample data. It's based on the data you provided, but I've modified it so the output can be traced clearly back to the input:

import pandas as pd

data1 = {"enrollment_id":[1,1,1,1,2,2,3,3,3],
         "event":["access","access","access","navigate","access",
                  "page_close","navigate","navigate","video"],
         "time":["2014-06-14T09:38:39", "2014-06-14T09:38:48",
                 "2014-06-19T06:21:16", "2014-06-19T06:21:32", 
                 "2014-06-21T06:21:45", "2014-06-22T06:21:16",
                 "2014-06-19T06:21:32", "2014-06-20T06:21:16",
                 "2014-06-20T06:21:16"]}

data2 = {"enrollment_id":[1,2,3],
         "username":["user1", "user2", "user3"],
         "course_id":["course1", "course2", "course3"],
         "course_from":["2014-06-12", "2014-06-19", "2014-06-12"],
         "course_to":["2014-07-11", "2014-07-18", "2014-07-11"]}

df1 = pd.DataFrame(data1)
df1
   enrollment_id       event                 time
0              1      access  2014-06-14T09:38:39
1              1      access  2014-06-14T09:38:48
2              1      access  2014-06-19T06:21:16
3              1    navigate  2014-06-19T06:21:32
4              2      access  2014-06-21T06:21:45
5              2  page_close  2014-06-22T06:21:16
6              3    navigate  2014-06-19T06:21:32
7              3    navigate  2014-06-20T06:21:16
8              3       video  2014-06-20T06:21:16

df2 = pd.DataFrame(data2)
df2
  course_id  enrollment_id course_from   course_to username
0   course1              1  2014-06-12  2014-07-11    user1
1   course2              2  2014-06-19  2014-07-18    user2
2   course3              3  2014-06-12  2014-07-11    user3
We want to know how many times a given event occurred for a given enrollment_id, with a separate count for each day of the course.

Get the course day number, course_day_num, by subtracting course_from (the course start date) from the event date, event_date:

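df = df1.merge(df2[["enrollment_id", "course_from"]],
               on="enrollment_id", how="left")
# Truncate each timestamp to midnight so events are compared by calendar date
df["event_date"] = pd.to_datetime(df["time"]).dt.normalize()
df["course_from"] = pd.to_datetime(df["course_from"])
# Days elapsed since the course start (the start date itself is day 0)
df["course_day_num"] = (df["event_date"] - df["course_from"]).dt.days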
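The subtraction in this step counts the course's first day as day 0. If "Nth day" in the question is meant 1-based (the desired output mentions day 30 for a 30-day course), shifting by one is enough; it can also help to flag events outside the course window. A minimal sketch under those assumptions, on a hypothetical stand-in frame (the times are invented; the window is course1's):

```python
import pandas as pd

# Hypothetical stand-in for the merged frame; the last event falls after the course ends
df = pd.DataFrame({
    "time": ["2014-06-12T08:00:00", "2014-06-14T09:38:39",
             "2014-07-11T23:59:59", "2014-07-12T01:00:00"],
    "course_from": ["2014-06-12"] * 4,
    "course_to": ["2014-07-11"] * 4,
})
event_date = pd.to_datetime(df["time"]).dt.normalize()
start = pd.to_datetime(df["course_from"])
end = pd.to_datetime(df["course_to"])

# 0-based: the course's first day is day 0
df["course_day_num"] = (event_date - start).dt.days
# Assumption: a 1-based numbering where the first day is day 1
df["course_day_1based"] = df["course_day_num"] + 1
# True when the event date lies within [course_from, course_to], ends included
df["in_window"] = event_date.between(start, end)
print(df[["course_day_num", "course_day_1based", "in_window"]])
```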
Then group by course_day_num to get the event counts for each person on each course day:

groupby_cols = ["enrollment_id", "event", "event_date", "course_day_num"]

df.groupby(groupby_cols).event_date.count()

enrollment_id  event       event_date  course_day_num
1              access      2014-06-14  2                 2
                           2014-06-19  7                 1
               navigate    2014-06-19  7                 1
2              access      2014-06-21  2                 1
               page_close  2014-06-22  3                 1
3              navigate    2014-06-19  7                 1
                           2014-06-20  8                 1
               video       2014-06-20  8                 1
Name: event_date, dtype: int64
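To get closer to the layout asked for in the question (#ofDays and #ofActionTimes columns), one option is to drop event_date from the keys (within one enrollment, each date maps to exactly one course day) and turn the group sizes into a named column. A sketch on the sample data, with course_day_num filled in from the output above:

```python
import pandas as pd

# Sample data from the answer, with course_day_num as computed above
df = pd.DataFrame({
    "enrollment_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "event": ["access", "access", "access", "navigate", "access",
              "page_close", "navigate", "navigate", "video"],
    "course_day_num": [2, 2, 7, 7, 2, 3, 7, 8, 8],
})

result = (df.groupby(["enrollment_id", "event", "course_day_num"])
            .size()                                # events per group
            .reset_index(name="#ofActionTimes")    # Series -> DataFrame column
            .rename(columns={"course_day_num": "#ofDays"}))
print(result)
```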

What do time and frequency refer to here? Can you provide usercourse data that could be used to build the example output?

Hi @andrew_reece, I updated the explanation at the bottom of the last code block.

Also @andrew_reece, I wonder if you could take a look at a new question I posted here? It seems nobody can give me a hand :(