Python: identifying duplicates within a rolling time window in a DataFrame

python pandas

I have a DataFrame, and I want to identify (and eventually remove) duplicate rows within a sliding time window.

import pandas as pd

# Named `data` rather than `dict` to avoid shadowing the built-in
data = {
    'type': ['apple', 'apple', 'apple', 'berry', 'grape', 'apple'],
    'attr': ['red', 'green', 'red', 'blue', 'green', 'red'],
    'timestamp': ['2021-03-01 12:00:00',
                  '2021-03-01 12:00:30',
                  '2021-03-01 12:01:13',
                  '2021-03-01 12:01:30',
                  '2021-03-01 12:10:00',
                  '2021-03-01 12:11:00'],
}
df = pd.DataFrame(data)
df['is_dup'] = False
print(df)

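In this example, my goal is to flag a row as a duplicate when its 'type' and 'attr' match another row that occurred within the previous 2 minutes. So I want index 2 flagged as is_dup=True, because it matches index 0 and falls within the 2-minute window, but not index 5, because its timestamp is outside the window.

The resulting DataFrame should therefore look like this: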

    type   attr            timestamp  is_dup
0  apple    red  2021-03-01 12:00:00   False
1  apple  green  2021-03-01 12:00:30   False
2  apple    red  2021-03-01 12:01:13    True
3  berry   blue  2021-03-01 12:01:30   False
4  grape  green  2021-03-01 12:10:00   False
5  apple    red  2021-03-01 12:11:00   False

Thanks in advance.

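I create a temporary column, diff, that stores the time difference between consecutive rows within each (type, attr) group. I then check whether that difference is at most 2 minutes and, if so, set is_dup to True: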

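# Parse the timestamp strings so they can be subtracted
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Gap to the previous row with the same (type, attr); 0 for the first row of each group
df['diff'] = df.groupby(['type', 'attr'])['timestamp'].diff().fillna(pd.Timedelta(seconds=0))
# Flag rows whose gap is greater than 0 and at most 2 minutes
df.loc[(df['diff'] > pd.Timedelta(0)) & (df['diff'] <= pd.Timedelta(minutes=2)), 'is_dup'] = True
df = df.drop(['diff'], axis=1)
print(df)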
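For reuse, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name flag_recent_duplicates and its parameters are made up for illustration, the timestamp column has already been parsed with pd.to_datetime, and "duplicate" means "follows the previous matching row by more than 0 and at most the window", exactly as in the snippet above. It skips the fillna step because comparisons against NaT already evaluate to False.

```python
import pandas as pd

def flag_recent_duplicates(df, keys, time_col, window):
    """Hypothetical helper: True where a row follows the previous
    matching row (same `keys`) by more than 0 and at most `window`."""
    # Sort by time so diff() measures the gap to the previous matching row
    ordered = df.sort_values(time_col, kind="stable")
    gap = ordered.groupby(keys)[time_col].diff()
    # The first row of each group has gap NaT; comparisons with NaT are False
    flagged = (gap > pd.Timedelta(0)) & (gap <= window)
    # Return the flags in the original row order
    return flagged.reindex(df.index)

df = pd.DataFrame({
    'type': ['apple', 'apple', 'apple', 'berry', 'grape', 'apple'],
    'attr': ['red', 'green', 'red', 'blue', 'green', 'red'],
    'timestamp': pd.to_datetime([
        '2021-03-01 12:00:00', '2021-03-01 12:00:30', '2021-03-01 12:01:13',
        '2021-03-01 12:01:30', '2021-03-01 12:10:00', '2021-03-01 12:11:00',
    ]),
})
df['is_dup'] = flag_recent_duplicates(
    df, ['type', 'attr'], 'timestamp', pd.Timedelta(minutes=2)
)
print(df)
```

Note that, like the original snippet, this compares each row only with the immediately preceding match, so a chain of matches each less than 2 minutes apart would all be flagged even if the last one is more than 2 minutes after the first. Rows with an identical timestamp (gap of exactly 0) are not flagged.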

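Does this answer your question? Shouldn't index 0 also be is_dup=True?

I don't want the original to be treated as a dup. Later I will go back and delete all rows where is_dup=True, and in that case I don't want the original row removed. Wow, thank you so much!!

Hmm, what is the purpose of df['diff'] > pd.Timedelta(0, 'm')? Isn't diff always positive?

@tdy There are multiple rows with a diff value of zero and df['diff']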
Output of the answer's groupby/diff snippet:

    type   attr           timestamp  is_dup
0  apple    red 2021-03-01 12:00:00   False
1  apple  green 2021-03-01 12:00:30   False
2  apple    red 2021-03-01 12:01:13    True
3  berry   blue 2021-03-01 12:01:30   False
4  grape  green 2021-03-01 12:10:00   False
5  apple    red 2021-03-01 12:11:00   False
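
The asker mentions wanting to delete the flagged rows afterwards. A minimal sketch of that follow-up step, assuming the frame already carries the is_dup column shown in the output above (the variable name deduped is just for illustration):

```python
import pandas as pd

# Rebuild the flagged frame from the output above
df = pd.DataFrame({
    'type': ['apple', 'apple', 'apple', 'berry', 'grape', 'apple'],
    'attr': ['red', 'green', 'red', 'blue', 'green', 'red'],
    'timestamp': pd.to_datetime([
        '2021-03-01 12:00:00', '2021-03-01 12:00:30', '2021-03-01 12:01:13',
        '2021-03-01 12:01:30', '2021-03-01 12:10:00', '2021-03-01 12:11:00',
    ]),
    'is_dup': [False, False, True, False, False, False],
})

# Keep only the rows that were not flagged, then drop the helper column
deduped = df[~df['is_dup']].drop(columns='is_dup').reset_index(drop=True)
print(deduped)
```

Because only the later row of each matching pair is flagged, the original row (index 0 here) survives the filter.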