How do I convert timestamps to dates and count days in Python?
I have a dataset about users of an online course. It has features such as "id", "event", and "time". I grouped the rows and want to know how often a user performs each activity on a particular day. I want to count by day.
lt = log_train.groupby(['enrollment_id','event','time']).size()
print(lt)
enrollment_id  event       time
1              access      2014-06-14T09:38:39    2
                           2014-06-14T09:38:48    1
                           2014-06-19T06:21:16    2
                           2014-06-19T06:21:32    1
                           2014-06-19T06:21:45    1
..
200887         navigate    2014-07-24T03:27:16    1
               page_close  2014-07-24T04:19:55    1
               video       2014-07-24T04:19:57    1
200888         access      2014-07-24T03:48:14    2
               discussion  2014-07-24T03:47:57    1
               navigate    2014-07-24T03:47:17    1
                           2014-07-24T03:47:28    1
                           2014-07-24T03:48:01    1
From what I can see in another dataset, there are a user id, a course id, and the date range of each course:
usercourse = pd.merge(enroll,date,how="left", on= 'course_id' )
enrollment_id username \
0 1 9Uee7oEuuMmgPx2IzPfFkWgkHZyPbWr0
1 3 1qXC7Fjbwp66GPQc6pHLfEuO8WKozxG4
2 4 FIHlppZyoq8muPbdVxS44gfvceX9zvU7
course_id from to
0 DPnLzkJJqOOPRJfBxIHbQEERiYHu5ila 2014-06-12 2014-07-11
1 7GRhBDsirIGkRZBtSMEzNTyDr2JQm4xx 2014-06-19 2014-07-18
2 DPnLzkJJqOOPRJfBxIHbQEERiYHu5ila 2014-06-12 2014-07-11
Each user has only one course, and every course spans the same 30-day range. So what I want should look like this:
enrollment_id  event       #ofDays  #ofActionTimes
1              access      2        2
                           10       6
                           30       2
..
200887         navigate    23       1
               page_close  30       1
               video       1        1
200888         access      12       2
               discussion  2        1
               navigate    5        3
                           29       4
**#ofDays is the Nth day of a course.
#ofActionTimes is how many times an event happens on that Nth day.**
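Since each course starts on a different date, I don't know how to produce this table in Python. I hope someone can help me with this.

IIUC, you can use merge, groupby, and count to get what you want.
First, some example data. It is based on the data you provided, but I have modified it so that the output can be clearly traced back to the starting data.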
import pandas as pd

data1 = {"enrollment_id":[1,1,1,1,2,2,3,3,3],
"event":["access","access","access","navigate","access",
"page_close","navigate","navigate","video"],
"time":["2014-06-14T09:38:39", "2014-06-14T09:38:48",
"2014-06-19T06:21:16", "2014-06-19T06:21:32",
"2014-06-21T06:21:45", "2014-06-22T06:21:16",
"2014-06-19T06:21:32", "2014-06-20T06:21:16",
"2014-06-20T06:21:16"]}
data2 = {"enrollment_id":[1,2,3],
"username":["user1", "user2", "user3"],
"course_id":["course1", "course2", "course3"],
"course_from":["2014-06-12", "2014-06-19", "2014-06-12"],
"course_to":["2014-07-11", "2014-07-18", "2014-07-11"]}
df1 = pd.DataFrame(data1)
df1
enrollment_id event time
0 1 access 2014-06-14T09:38:39
1 1 access 2014-06-14T09:38:48
2 1 access 2014-06-19T06:21:16
3 1 navigate 2014-06-19T06:21:32
4 2 access 2014-06-21T06:21:45
5 2 page_close 2014-06-22T06:21:16
6 3 navigate 2014-06-19T06:21:32
7 3 navigate 2014-06-20T06:21:16
8 3 video 2014-06-20T06:21:16
df2 = pd.DataFrame(data2)
df2
course_id enrollment_id course_from course_to username
0 course1 1 2014-06-12 2014-07-11 user1
1 course2 2 2014-06-19 2014-07-18 user2
2 course3 3 2014-06-12 2014-07-11 user3
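We want to know how many times a particular event occurred for a particular enrollment_id, with a separate count for each day of the course.
Subtracting course_from (the course start date) from event_date gives the course day number, course_day_num: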
df = (df1.merge(df2[["enrollment_id", "course_from"]],
on="enrollment_id", how="left")
)
# Parse the timestamps from the merged frame itself and drop the time of day
df["event_date"] = pd.to_datetime(df["time"]).dt.normalize()
df["course_from"] = pd.to_datetime(df["course_from"])
df["course_day_num"] = (df.event_date - df["course_from"]).dt.days
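Note that the subtraction above yields a zero-based offset: an event on the course start date gets 0. The question's "Nth day" may be meant as one-based, so here is a minimal sketch of both variants. The frame `events` and the column `nth_day` are hypothetical names used only for illustration, and the one-based reading is an assumption.

```python
import pandas as pd

# Hypothetical miniature of the merged frame: event timestamps plus
# the start date of the course each event belongs to.
events = pd.DataFrame({
    "time": ["2014-06-12T08:00:00", "2014-06-14T09:38:39", "2014-07-11T23:59:59"],
    "course_from": ["2014-06-12", "2014-06-12", "2014-06-12"],
})

# normalize() sets the time of day to midnight but keeps the datetime64 dtype
event_date = pd.to_datetime(events["time"]).dt.normalize()
course_from = pd.to_datetime(events["course_from"])

# Zero-based offset in whole days, as in the answer's course_day_num
events["course_day_num"] = (event_date - course_from).dt.days

# One-based variant (assumption: "Nth day" counts the start date as day 1)
events["nth_day"] = events["course_day_num"] + 1

print(events[["time", "course_day_num", "nth_day"]])
```

With a course running from 2014-06-12 to 2014-07-11, the one-based variant maps the last day to 30, which matches the 30-day range described in the question.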
Then, groupby each course_day_num to get the event counts per person, per course day:
groupby_cols = ["enrollment_id", "event", "event_date", "course_day_num"]
df.groupby(groupby_cols).event_date.count()
enrollment_id  event       event_date  course_day_num
1              access      2014-06-14  2                 2
                           2014-06-19  7                 1
               navigate    2014-06-19  7                 1
2              access      2014-06-21  2                 1
               page_close  2014-06-22  3                 1
3              navigate    2014-06-19  7                 1
                           2014-06-20  8                 1
               video       2014-06-20  8                 1
Name: event_date, dtype: int64
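If a flat table closer to the layout requested in the question is preferred, one possible sketch is to group without event_date and turn the counts into a regular column. The column names `day_num` and `action_count` are placeholders standing in for #ofDays and #ofActionTimes; they are not from the original post.

```python
import pandas as pd

# Same sample data as in the answer above (only the columns needed here)
df1 = pd.DataFrame({
    "enrollment_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "event": ["access", "access", "access", "navigate", "access",
              "page_close", "navigate", "navigate", "video"],
    "time": ["2014-06-14T09:38:39", "2014-06-14T09:38:48",
             "2014-06-19T06:21:16", "2014-06-19T06:21:32",
             "2014-06-21T06:21:45", "2014-06-22T06:21:16",
             "2014-06-19T06:21:32", "2014-06-20T06:21:16",
             "2014-06-20T06:21:16"],
})
df2 = pd.DataFrame({
    "enrollment_id": [1, 2, 3],
    "course_from": ["2014-06-12", "2014-06-19", "2014-06-12"],
})

# Attach each enrollment's course start date to its events
df = df1.merge(df2, on="enrollment_id", how="left")

# Whole days elapsed between the course start and the event date
event_date = pd.to_datetime(df["time"]).dt.normalize()
df["day_num"] = (event_date - pd.to_datetime(df["course_from"])).dt.days

# size() counts rows per group; reset_index turns the result into a flat table
result = (
    df.groupby(["enrollment_id", "event", "day_num"])
      .size()
      .reset_index(name="action_count")
)
print(result)
```

Using size() rather than count() on a column avoids depending on which column is counted, and reset_index(name=...) gives the count column an explicit name.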
What are time and frequency? Can you provide usercourse data that could be used to build the example output?
Hi @andrew_reece, I updated the explanation at the bottom of the last code block.
Hi again @andrew_reece, I was wondering if you could take a look at the new question I posted here. It seems no one can give me a hand :(