
Python: How to get value counts based on datetime


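I wrote the code below, which works with two dataframes, nq and cmnt.

nq contains UserId and date, the time at which the user earned the badge.

cmnt contains OwnerUserId and CreationDate, the time at which the user posted a comment.

I want to count the comments on each day in the week before and the week after the badge was earned, so that I can create a time series line plot from the result.

The code below is meant to do this, but it raises a KeyError. Please provide code that does this for all users.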

nq

 UserId |   date 
     1      2009-10-17 17:38:32.590
     2      2009-10-19 00:37:23.067
     3      2009-10-20 08:37:14.143
     4      2009-10-21 18:07:51.247
     5      2009-10-22 21:25:24.483
cmnt

OwnerUserId | CreationDate
1             2009-10-16 17:38:32.590
1             2009-10-18 17:38:32.590
2             2009-10-18 00:37:23.067
2             2009-10-17 00:37:23.067
2             2009-10-20 00:37:23.067
3             2009-10-19 08:37:14.143
4             2009-10-20 18:07:51.247
5             2009-10-21 21:25:24.483
Code

import pandas as pd
from datetime import timedelta

nq.date = pd.to_datetime(nq.date)
cmnt.CreationDate = pd.to_datetime(cmnt.CreationDate)

count = []

for j in range(len(nq)):
    for i in range(-7, 8):
        check_date = nq.date.iloc[j] + timedelta(days=i)

        count = cmnt.loc[(cmnt.OwnerUserId == nq.UserId.iloc[j]) & (cmnt.CreationDate == check_date)].shape[0]
        nq.iloc[j].append({nq[i]: count})
Expected output

UserId     |   date                 |-7|-6|-5|-4|-3|-2|-1|0 |1 |2 |3 |4 |5 |6 |7
     1      2009-10-17 17:38:32.590 |0 |0 |0 |0 |0 |0 |1 |0 |1 |0 |0 |0 |0 |0 |0  
     2      2009-10-19 00:37:23.067 |0 |0 |0 |0 |0 |1 |1 |0 |1 |0 |0 |0 |0 |0 |0    
     3      2009-10-20 08:37:14.143 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 
     4      2009-10-21 18:07:51.247 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 
     5      2009-10-22 21:25:24.483 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0 
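Here, the -1 column is the number of comments posted one day before the badge was earned, the 1 column is the number posted one day after, and so on.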

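Note
There may be a completely different way to do this. My main goal is to plot a time series line graph showing the number of comments a user posts before and after earning a badge.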

You probably need to merge, filter, and then crosstab:

# merge the two dataframes
merged = (nq.merge(cmnt, left_on='UserId', 
         right_on='OwnerUserId',
         how='left')
)

# extract the date difference between `date` and `CreationDate`
merged['date_diff'] = merged['date'].dt.normalize() - merged['CreationDate'].dt.normalize()
merged['date_diff'] = (merged['date_diff'] / pd.to_timedelta('1D')).astype(int)

# filter the comments within the range
merged = merged[merged['date_diff'].between(-7,7)]

# crosstab
pd.crosstab([merged['UserId'],merged['date']], merged['date_diff'])
Output:

date_diff                       -1   1   2
UserId date                               
1      2009-10-17 17:38:32.590   1   1   0
2      2009-10-19 00:37:23.067   1   1   1
3      2009-10-20 08:37:14.143   0   1   0
4      2009-10-21 18:07:51.247   0   1   0
5      2009-10-22 21:25:24.483   0   1   0
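
The table above has columns only for the offsets that actually occur, and its offsets are date minus CreationDate, so a positive value means the comment came before the badge. The following is a minimal sketch, assuming the sample data from the question, of one way to get the sign convention of the expected output (negative means before the badge) and all fifteen columns from -7 to 7. The name counts, the inner merge, and the use of reindex are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
nq = pd.DataFrame({
    "UserId": [1, 2, 3, 4, 5],
    "date": pd.to_datetime([
        "2009-10-17 17:38:32.590",
        "2009-10-19 00:37:23.067",
        "2009-10-20 08:37:14.143",
        "2009-10-21 18:07:51.247",
        "2009-10-22 21:25:24.483",
    ]),
})
cmnt = pd.DataFrame({
    "OwnerUserId": [1, 1, 2, 2, 2, 3, 4, 5],
    "CreationDate": pd.to_datetime([
        "2009-10-16 17:38:32.590",
        "2009-10-18 17:38:32.590",
        "2009-10-18 00:37:23.067",
        "2009-10-17 00:37:23.067",
        "2009-10-20 00:37:23.067",
        "2009-10-19 08:37:14.143",
        "2009-10-20 18:07:51.247",
        "2009-10-21 21:25:24.483",
    ]),
})

# Pair every badge with every comment by the same user
merged = nq.merge(cmnt, left_on="UserId", right_on="OwnerUserId", how="inner")

# Comment day minus badge day: negative means the comment came before the badge
merged["date_diff"] = (
    merged["CreationDate"].dt.normalize() - merged["date"].dt.normalize()
).dt.days

# Keep only comments within one week of the badge
merged = merged[merged["date_diff"].between(-7, 7)]

counts = pd.crosstab([merged["UserId"], merged["date"]], merged["date_diff"])

# Add the offsets that never occur as all-zero columns
counts = counts.reindex(columns=range(-7, 8), fill_value=0)
```

With the sample data, counts has one row per user and fifteen integer columns. Because the differences are taken on normalized dates, a comment counts toward an offset by calendar day rather than by exact timestamp.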

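This gives the correct output. Could you add to the solution how to convert this crosstab into a dataframe?

The crosstab command already returns a dataframe. Just assign it to something, e.g. out=pd.crosstab(…).

It does, but I want it to have the columns ['UserId', 'date', -7, -6, … 0 …, 6, 7], which I can access like normal dataframe columns. Right now the columns are Int64Index([-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7],dtype='int64',name='date_diff'). So the command df['UserId'] raises an error, because 'UserId' is not a column of df (where df=pd.crosstab(…)).

Chain reset_index() onto that pd.crosstab().

For some inputs I also get this error: ValueError: Cannot convert non-finite values (NA or inf) to integer. How can I fix this?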
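
The two follow-ups above can be sketched together. The ValueError most likely comes from users in nq who have no comments at all: the left merge leaves NaT in CreationDate for them, the difference becomes NaN, and astype(int) fails. Assuming that is the cause, one option is to drop those rows before converting (merging with how='inner' would have a similar effect). reset_index() then turns UserId and date into ordinary columns. The data below is hypothetical, with user 3 having no comments in order to trigger the case, and the sketch keeps the answer's definition of date_diff.

```python
import pandas as pd

# Hypothetical data: user 3 has a badge but no comments
nq = pd.DataFrame({
    "UserId": [1, 2, 3],
    "date": pd.to_datetime([
        "2009-10-17 17:38:32",
        "2009-10-19 00:37:23",
        "2009-10-20 08:37:14",
    ]),
})
cmnt = pd.DataFrame({
    "OwnerUserId": [1, 1, 2],
    "CreationDate": pd.to_datetime([
        "2009-10-16 17:38:32",
        "2009-10-18 17:38:32",
        "2009-10-18 00:37:23",
    ]),
})

merged = nq.merge(cmnt, left_on="UserId", right_on="OwnerUserId", how="left")

# Users without comments have NaT in CreationDate; drop them before the int conversion
merged = merged.dropna(subset=["CreationDate"])

# Same date_diff as in the answer: badge day minus comment day
merged["date_diff"] = merged["date"].dt.normalize() - merged["CreationDate"].dt.normalize()
merged["date_diff"] = (merged["date_diff"] / pd.to_timedelta("1D")).astype(int)
merged = merged[merged["date_diff"].between(-7, 7)]

out = pd.crosstab([merged["UserId"], merged["date"]], merged["date_diff"])

# Move UserId and date from the row index into ordinary columns
out = out.reset_index()
out.columns.name = None  # drop the leftover 'date_diff' label on the column axis
```

After this, out['UserId'] and out[-1] both work as plain column lookups. Users with no comments in the window, such as user 3 here, do not appear in out.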