Python 3.x Pandas: counting events per month using a condition that spans several rows


python-3.x pandas dataframe datetime


I have a DataFrame like this:

                              oper_status
2012-01-01 00:26:54.250            0
2012-01-01 12:11:54.250            1
2012-01-01 13:57:54.250            2
2012-01-02 00:16:54.250            0
2012-01-02 14:26:54.250            1
2012-01-02 17:20:54.250            0
2012-01-04 08:21:54.250            0
2012-01-04 15:34:54.250            1
2012-01-04 19:45:54.250            0
2012-01-05 01:00:54.250            0
2012-01-05 12:46:54.250            1
2012-01-05 20:27:54.250            2
        (...)                    (...)


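I want to count, for each month, how many times the values 0, 1, 2 occur in that order on consecutive rows. I tried looping over the rows with iterrows(), but it is very slow because my dataset is large. I also considered using diff, but I could not work out a simple way to do it. Thanks.

Edit: the expected output looks like this: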


            count
time
2012-03-31    244
2012-04-30     65
2012-05-31    167
2012-06-30     33
2012-07-31    187
...           ...
2013-05-31    113
2013-06-30    168
2013-07-31    294
2013-08-31    178
2013-09-30     65

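Counting a sequential pattern is a two-step process. First, build for each row a string that describes the pattern ending at that row. The string lists the current row first, then the previous row, then the row before that, so the run 0, 1, 2 shows up as '2-1-0':

# The column is called order_status here (oper_status in the question).
# The parentheses let the expression continue across lines.
df['seq'] = (df.order_status.astype(str) + '-' +
             df.order_status.astype(str).shift(periods=1) + '-' +
             df.order_status.astype(str).shift(periods=2))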

                      date  order_status    seq
0  2012-01-01 00:26:54.250             0    NaN
1  2012-01-01 12:11:54.250             1    NaN
2  2012-01-01 13:57:54.250             2  2-1-0
3  2012-01-02 00:16:54.250             0  0-2-1
4  2012-01-02 14:26:54.250             1  1-0-2
5  2012-01-02 17:20:54.250             0  0-1-0
6  2012-01-04 08:21:54.250             0  0-0-1
7  2012-01-04 15:34:54.250             1  1-0-0
8  2012-01-04 19:45:54.250             0  0-1-0
9  2012-01-05 01:00:54.250             0  0-0-1
10 2012-01-05 12:46:54.250             1  1-0-0
11 2012-01-05 20:27:54.250             2  2-1-0
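The same "pattern ending at this row" idea can also be written without building strings, by comparing the column with shifted copies of itself. The following is only a sketch under assumptions: it keeps the timestamps in the index and uses the column name `oper_status` as in the question, and the names `ends_012`, `n_matches` and `match_positions` are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question; the timestamps form the index.
idx = pd.to_datetime([
    "2012-01-01 00:26:54.250", "2012-01-01 12:11:54.250",
    "2012-01-01 13:57:54.250", "2012-01-02 00:16:54.250",
    "2012-01-02 14:26:54.250", "2012-01-02 17:20:54.250",
    "2012-01-04 08:21:54.250", "2012-01-04 15:34:54.250",
    "2012-01-04 19:45:54.250", "2012-01-05 01:00:54.250",
    "2012-01-05 12:46:54.250", "2012-01-05 20:27:54.250",
])
df = pd.DataFrame({"oper_status": [0, 1, 2, 0, 1, 0, 0, 1, 0, 0, 1, 2]}, index=idx)

s = df["oper_status"]

# True on the row where a 0, 1, 2 run ends:
# this row is 2, the previous row is 1, and the row before that is 0.
# The first two rows compare against NaN and therefore come out False.
ends_012 = (s == 2) & (s.shift(1) == 1) & (s.shift(2) == 0)

n_matches = int(ends_012.sum())
match_positions = [i for i, flag in enumerate(ends_012) if flag]
```

On the sample rows this marks the same two rows (positions 2 and 11) that carry '2-1-0' in the seq column above.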

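Then filter to the matching sequence and aggregate to the level you need:

# Requires df.date to be a datetime64 column. dt.month is the month number only,
# so the same month in different years falls into one group.
df['month'] = df.date.dt.month
df[df.seq == '2-1-0'].groupby("month").month.count()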

month
1    2

Adapt this as needed for cases where you want the pattern to start in a given period, end in it, lie entirely within it, and so on.
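To make those variants concrete, here is one way they might be expressed. This is a sketch with invented data: the six timestamps below are hypothetical and chosen so that one 0, 1, 2 run straddles the January/February boundary and another lies fully inside February. The names `by_end`, `by_start` and `within` are illustrative as well.

```python
import pandas as pd

# Hypothetical timestamps: the first run crosses a month boundary,
# the second run sits entirely inside February.
idx = pd.to_datetime([
    "2012-01-30 08:00", "2012-01-31 09:00", "2012-02-01 10:00",
    "2012-02-10 08:00", "2012-02-11 09:00", "2012-02-12 10:00",
])
status = pd.Series([0, 1, 2, 0, 1, 2], index=idx)

# True on the row where a 0, 1, 2 run ends.
ends_012 = (status == 2) & (status.shift(1) == 1) & (status.shift(2) == 0)

period = idx.to_period("M")

# Variant 1: attribute each run to the month in which it ends.
by_end = ends_012.groupby(period).sum()

# Variant 2: attribute each run to the month in which it starts,
# by moving the flag back onto the first row of the run.
by_start = ends_012.shift(-2, fill_value=False).groupby(period).sum()

# Variant 3: count only runs whose first and last rows share a month.
month_key = pd.Series(idx.year * 12 + idx.month, index=idx)
within = (ends_012 & (month_key == month_key.shift(2))).groupby(period).sum()
```

With this data the straddling run counts toward February in `by_end`, toward January in `by_start`, and is dropped in `within`.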

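Comments:

Similar to ...? Also, are you using Python 3 or Python 2? The tags on your question are ambiguous on that point. Please show the expected output.

That is not what I meant, though. I need something that counts how many times I have 0, 1, 2 on three consecutive rows. I have removed the Python 2 tag, thanks.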