Python: How to count identical instances in a DataFrame column within a rolling window
Tags: python, pandas, numpy, machine-learning, data-mining

I am trying to count the number of identical IDs within each sliding window for this data:

                           ID  
DATE            
2017-05-17 15:49:51         s_2   
2017-05-17 15:49:52         s_5   
2017-05-17 15:49:55         s_2   
2017-05-17 15:49:56         s_3   
2017-05-17 15:49:58         s_5
2017-05-17 15:49:59         s_5
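
I want to count how many times each ID occurs in overlapping rolling windows of size 3, where each window starts at the current row. The answer should look like this: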

DATE                    ID      s_2_count    s_3_count   s_5_count       
2017-05-17 15:49:51     s_2         2            0         1 
2017-05-17 15:49:52     s_5         1            1         1   
2017-05-17 15:49:55     s_2         1            1         1   
2017-05-17 15:49:56     s_3         0            1         2   
2017-05-17 15:49:58     s_5         NaN          NaN       NaN
2017-05-17 15:49:59     s_5         NaN          NaN       NaN

Use str.get_dummies, rolling, sum, shift, and add_suffix:

df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count')

Output:

                     s_2_count  s_3_count  s_5_count
DATE                                                
2017-05-17 15:49:51        2.0        0.0        1.0
2017-05-17 15:49:52        1.0        1.0        1.0
2017-05-17 15:49:55        1.0        1.0        1.0
2017-05-17 15:49:56        0.0        1.0        2.0
2017-05-17 15:49:58        NaN        NaN        NaN
2017-05-17 15:49:59        NaN        NaN        NaN

Let's assign it back to the DataFrame:

df.assign(**df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count'))

Or use join:

df.join(df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count'))

Output:

                      ID  s_2_count  s_3_count  s_5_count
DATE                                                     
2017-05-17 15:49:51  s_2        2.0        0.0        1.0
2017-05-17 15:49:52  s_5        1.0        1.0        1.0
2017-05-17 15:49:55  s_2        1.0        1.0        1.0
2017-05-17 15:49:56  s_3        0.0        1.0        2.0
2017-05-17 15:49:58  s_5        NaN        NaN        NaN
2017-05-17 15:49:59  s_5        NaN        NaN        NaN
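
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data from the question and runs the same chain one step at a time. The intermediate names (dummies, trailing, counts, result) are mine, not from the answer. Note that rolling(3) here counts rows, not seconds, so the timestamps only serve as labels.

```python
import pandas as pd

# Rebuild the sample data from the question, with DATE as a DatetimeIndex.
df = pd.DataFrame(
    {"ID": ["s_2", "s_5", "s_2", "s_3", "s_5", "s_5"]},
    index=pd.to_datetime([
        "2017-05-17 15:49:51",
        "2017-05-17 15:49:52",
        "2017-05-17 15:49:55",
        "2017-05-17 15:49:56",
        "2017-05-17 15:49:58",
        "2017-05-17 15:49:59",
    ]),
)
df.index.name = "DATE"

# Step 1: one 0/1 indicator column per distinct ID.
dummies = df["ID"].str.get_dummies()

# Step 2: sum each indicator over a 3-row window that ENDS at the current row.
# The first two rows are NaN because their windows are incomplete.
trailing = dummies.rolling(3).sum()

# Step 3: shift up by (window - 1) rows so each row describes the window
# that STARTS at it; the NaNs move to the last two rows.
counts = trailing.shift(-2).add_suffix("_count")

# Step 4: attach the counts to the original frame (aligned on the index).
result = df.join(counts)
print(result)
```

For a different window size n, the shift would be -(n - 1), assuming the same forward-looking convention as the expected output in the question.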

Option 2: use pd.crosstab

df.assign(**pd.crosstab(df.index,df.ID).rolling(3).sum().shift(-2))

Or use join:

df.join(pd.crosstab(df.index,df.ID).rolling(3).sum().shift(-2))
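
As a quick sanity check, the sketch below compares the two options on the sample data from the question. The names opt1, opt2 and same are mine. Two things to keep in mind: the crosstab columns keep the plain ID names unless you also call add_suffix('_count'), and crosstab groups by unique index values, so I would expect it to collapse rows if the same timestamp appeared more than once.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    {"ID": ["s_2", "s_5", "s_2", "s_3", "s_5", "s_5"]},
    index=pd.to_datetime([
        "2017-05-17 15:49:51",
        "2017-05-17 15:49:52",
        "2017-05-17 15:49:55",
        "2017-05-17 15:49:56",
        "2017-05-17 15:49:58",
        "2017-05-17 15:49:59",
    ]),
)
df.index.name = "DATE"

# Option 1: indicator columns from the string accessor.
opt1 = df["ID"].str.get_dummies().rolling(3).sum().shift(-2)

# Option 2: indicator columns from a cross-tabulation of index vs. ID.
opt2 = pd.crosstab(df.index, df["ID"]).rolling(3).sum().shift(-2)

# Compare the raw values, treating NaN in the same position as equal.
same = np.allclose(opt1.to_numpy(), opt2.to_numpy(), equal_nan=True)
print(same)
print(list(opt2.columns))
```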

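Comments:

@Ali ... if you want everything in columns, you can reset the index.

Thank you very much! This is a very clever approach. May I ask another question: in **df.ID and **pd.crosstab, are those pointers? Do you have any material on using **?

@Ali I don't think ** dictionary unpacking of a DataFrame is documented, so I updated this solution with the join option.

Thank you very much! I understand the code now, but if you can, could you briefly explain the ** notation?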
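
On the ** question: it is not a pointer. In a function call, ** unpacks a mapping into keyword arguments. As far as I can tell, a DataFrame can be unpacked this way because it has a keys() method that returns the column labels and supports lookup by label, so each column is passed as column_name=Series. Here is a small sketch; the helper show and the toy frames are made up for illustration.

```python
import pandas as pd

def show(**kwargs):
    # Collect whatever keyword arguments were passed and return their names.
    return sorted(kwargs)

# With a plain dict, ** turns each key/value pair into a keyword argument.
opts = {"a": 1, "b": 2}
assert show(**opts) == ["a", "b"]   # same as show(a=1, b=2)

# A DataFrame behaves like a mapping from column label to column,
# so unpacking it passes one keyword argument per column.
counts = pd.DataFrame({"s_2_count": [2.0, 1.0], "s_5_count": [1.0, 1.0]})
names = show(**counts)

# That is why df.assign(**counts) is equivalent to spelling out each column.
df = pd.DataFrame({"ID": ["s_2", "s_5"]})
via_unpack = df.assign(**counts)
via_kwargs = df.assign(
    s_2_count=counts["s_2_count"],
    s_5_count=counts["s_5_count"],
)
assert via_unpack.equals(via_kwargs)
```

This only works when the column labels are strings, since keyword argument names must be strings; join does not have that restriction.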