Python: how to get value counts based on datetime
I wrote the code below, which works with two data frames, nq and cmnt.
nq contains UserId and date, the time at which the user earned a badge.
cmnt contains OwnerUserId and CreationDate, the time at which the user posted a comment.
I want to count each user's comments on every day from one week before to one week after the badge was earned, so that I can draw a time-series line plot from the result. The code below does this, but it raises an error on one part of the data while working fine on another. Please suggest an alternative way to perform this task.
nq
UserId | date
1 2009-10-17 17:38:32.590
2 2009-10-19 00:37:23.067
3 2009-10-20 08:37:14.143
4 2009-10-21 18:07:51.247
5 2009-10-22 21:25:24.483
cmnt
OwnerUserId | CreationDate
1 2009-10-16 17:38:32.590
1 2009-10-18 17:38:32.590
2 2009-10-18 00:37:23.067
2 2009-10-17 00:37:23.067
2 2009-10-20 00:37:23.067
3 2009-10-19 08:37:14.143
4 2009-10-20 18:07:51.247
5 2009-10-21 21:25:24.483
Code
import itertools
import pandas as pd
t = pd.merge(nq, cmnt, left_on="UserId", right_on="OwnerUserId")
t["days_diff"] = (t["CreationDate"] - t["date"]).dt.days
t["count"] = t.groupby(["UserId", "days_diff"]).OwnerUserId.transform("count")
all_days = pd.DataFrame(itertools.product(t.UserId.unique(), range(-7, 8)), )
all_days.columns = ["UserId", "day"]
t = pd.merge(t, all_days, left_on=["UserId", "days_diff"], right_on=["UserId", "day"], how = "right")
t = pd.pivot_table(t, index="UserId", columns="day", values="count", dropna=False)
res = pd.merge(nq, t, left_on="UserId", right_index=True)
print(res)
Expected output
UserId | date |-7|-6|-5|-4|-3|-2|-1|0 |1 |2 |3 |4 |5 |6 |7
1 2009-10-17 17:38:32.590 |0 |0 |0 |0 |0 |0 |1 |0 |1 |0 |0 |0 |0 |0 |0
2 2009-10-19 00:37:23.067 |0 |0 |0 |0 |0 |1 |1 |0 |1 |0 |0 |0 |0 |0 |0
3 2009-10-20 08:37:14.143 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
4 2009-10-21 18:07:51.247 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
5 2009-10-22 21:25:24.483 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
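Here, the -1 column holds the number of comments made one day before the badge was earned, the 1 column holds the number made one day after, and so on.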
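As one possible alternative, the table above can be sketched with pd.crosstab followed by reindex. This is only a sketch built on the sample data from the question, not the asker's original method; the names merged and counts are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
nq = pd.DataFrame({
    "UserId": [1, 2, 3, 4, 5],
    "date": pd.to_datetime([
        "2009-10-17 17:38:32.590",
        "2009-10-19 00:37:23.067",
        "2009-10-20 08:37:14.143",
        "2009-10-21 18:07:51.247",
        "2009-10-22 21:25:24.483",
    ]),
})
cmnt = pd.DataFrame({
    "OwnerUserId": [1, 1, 2, 2, 2, 3, 4, 5],
    "CreationDate": pd.to_datetime([
        "2009-10-16 17:38:32.590",
        "2009-10-18 17:38:32.590",
        "2009-10-18 00:37:23.067",
        "2009-10-17 00:37:23.067",
        "2009-10-20 00:37:23.067",
        "2009-10-19 08:37:14.143",
        "2009-10-20 18:07:51.247",
        "2009-10-21 21:25:24.483",
    ]),
})

# Pair every comment with its author's badge date and compute the day offset.
merged = nq.merge(cmnt, left_on="UserId", right_on="OwnerUserId")
merged["days_diff"] = (merged["CreationDate"] - merged["date"]).dt.days

# Count comments per (user, day offset), then force the full -7..7 grid.
# reindex drops offsets outside the window and fills missing cells with 0.
counts = pd.crosstab(merged["UserId"], merged["days_diff"])
counts = counts.reindex(
    index=nq["UserId"], columns=list(range(-7, 8)), fill_value=0
)

# Attach the counts to the badge table.
res = nq.merge(counts, left_on="UserId", right_index=True)
print(res)
```

Note that .dt.days floors the difference, so a comment posted 12 hours before the badge lands in the -1 column rather than in 0. Users with no comments at all still get a row of zeros, because the reindex step uses the UserId values from nq.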
Error
ValueError: Length mismatch: Expected axis has 0 elements, new values have 2 elements
Note
The error is raised by this line of code:
all_days.columns = ["UserId", "day"]
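The message says the frame has zero columns at the moment the two names are assigned. I have not verified the cause against the asker's data or pandas version, but two things could plausibly produce this: the merge returned no rows for that part of the data, so the product is empty, or the bare itertools.product iterator was not expanded into rows by the DataFrame constructor. A minimal sketch that sidesteps both is to materialise the product with list() and pass the column names to the constructor. Here user_ids is a stand-in for t.UserId.unique().

```python
import itertools

import pandas as pd

user_ids = [1, 2, 3, 4, 5]  # stand-in for t.UserId.unique()

# Build the (user, day) grid as a concrete list and name the columns up front.
all_days = pd.DataFrame(
    list(itertools.product(user_ids, range(-7, 8))),
    columns=["UserId", "day"],
)
print(all_days.shape)  # (75, 2)

# With no users at all, the frame is empty but still has both columns,
# so later merges on "UserId" and "day" do not fail on a missing column.
empty_days = pd.DataFrame(
    list(itertools.product([], range(-7, 8))),
    columns=["UserId", "day"],
)
print(empty_days.shape)  # (0, 2)
```

If the empty case is what happens on the failing part of the data, it is worth checking why the merge finds no matches there, for example whether UserId and OwnerUserId have the same dtype in both frames.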