Python pandas: conditional column with a sliding window

python pandas

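I have a df with two columns: a timestamp and a text. I am trying to label the data with True/False (1/0) labels. The condition is that if the text contains the word "error", all entries from the 3-4 hours before that entry should get a 1 label, and the other entries should get a 0 label. For example, a df like this: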

time   text
15:00  a-ok
16:01  fine
17:00  kay
18:00  uhum
19:00  doin well
20:00  is error
20:05  still error
21:00  fine again

should be transformed into:

time   text         error coming
15:00  a-ok         0
16:01  fine         1
17:00  kay          1
18:00  uhum         1
19:00  doin well    1
20:00  is error     0
20:05  still error  0
21:00  fine again   0

I have read a bit about sliding windows with .rolling, but I am having a hard time putting it all together.
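The idea is to convert the times to timedeltas, select the timedeltas of the rows that contain error, and build one mask per error time, combining the masks with np.logical_or.reduce. The result is then chained with the inverted mask m1, so that the error rows themselves are excluded, and True/False is converted to integers to get the 1/0 labels: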
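import numpy as np
import pandas as pd

# Append seconds to the 'HH:MM' strings so they parse as timedeltas
td = pd.to_timedelta(df['time'].astype(str) + ':00')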

m1 = df['text'].str.contains('error')
v = td[m1]
print (v)
5   20:00:00
6   20:05:00
Name: time, dtype: timedelta64[ns]

m2 = np.logical_or.reduce([td.between(x - pd.Timedelta(4, unit='h'), x) for x in v])
df['error coming'] = (m2 & ~m1).astype(int)
print (df)
    time         text  error coming
0  15:00         a-ok             0
1  16:01         fine             1
2  17:00          kay             1
3  18:00         uhum             1
4  19:00    doin well             1
5  20:00     is error             0
6  20:05  still error             0
7  21:00   fine again             0
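The snippet above assumes that df already exists. As a minimal, self-contained sketch, the sample frame from the question can be rebuilt by hand and run through the same steps. The name `window` is introduced here only for readability, and the error mask is converted with `to_numpy()` so the final step is plain NumPy; neither detail is from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'time': ['15:00', '16:01', '17:00', '18:00', '19:00', '20:00', '20:05', '21:00'],
    'text': ['a-ok', 'fine', 'kay', 'uhum', 'doin well',
             'is error', 'still error', 'fine again'],
})

# 'HH:MM' -> timedelta since midnight
td = pd.to_timedelta(df['time'] + ':00')

# Rows whose text mentions an error
m1 = df['text'].str.contains('error')

# Look-back window (assumed to be 4 hours, as in the answer)
window = pd.Timedelta(4, unit='h')

# One boolean mask per error time, OR-ed together row by row
m2 = np.logical_or.reduce([td.between(x - window, x) for x in td[m1]])

# Exclude the error rows themselves and map True/False to 1/0
df['error coming'] = (m2 & ~m1.to_numpy()).astype(int)
print(df)
```

Note that `between` includes both endpoints by default, so a row exactly 4 hours before an error would also be labelled 1 in this version.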

Edit:

Vectorized solution:
m1 = df['text'].str.contains('error')
v = df.loc[m1, 'time']
print (v)
5   2019-01-26 20:00:00
6   2019-01-26 20:05:00
Name: time, dtype: datetime64[ns]

a = v - pd.Timedelta(4, unit='h')
m = (a.values < df['time'].values[:, None]) & (v.values > df['time'].values[:, None])
df['error coming'] = (m.any(axis=1) & ~m1).astype(int)
print (df)
                 time         text  error coming
0 2019-01-26 15:00:00         a-ok             0
1 2019-01-26 16:01:00         fine             1
2 2019-01-26 17:00:00          kay             1
3 2019-01-26 18:00:00         uhum             1
4 2019-01-26 19:00:00    doin well             1
5 2019-01-26 20:00:00     is error             0
6 2019-01-26 20:05:00  still error             0
7 2019-01-26 21:00:00   fine again             0
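To make the broadcasting step easier to follow, here is a self-contained sketch of the same comparison with the intermediate shapes spelled out. The variable names `starts`, `ends`, `times` and `in_window` are illustrative and not from the original answer; the data is the sample from the question with the date used in the answer's output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime([
        '2019-01-26 15:00', '2019-01-26 16:01', '2019-01-26 17:00',
        '2019-01-26 18:00', '2019-01-26 19:00', '2019-01-26 20:00',
        '2019-01-26 20:05', '2019-01-26 21:00',
    ]),
    'text': ['a-ok', 'fine', 'kay', 'uhum', 'doin well',
             'is error', 'still error', 'fine again'],
})

m1 = df['text'].str.contains('error')

ends = df.loc[m1, 'time'].to_numpy()      # error timestamps, shape (n_errors,)
starts = ends - np.timedelta64(4, 'h')    # start of each 4-hour window
times = df['time'].to_numpy()[:, None]    # all timestamps as a column, shape (n_rows, 1)

# Broadcasting compares every row against every window: shape (n_rows, n_errors)
in_window = (starts < times) & (times < ends)

# A row is flagged if it falls in any window and is not itself an error row
df['error coming'] = (in_window.any(axis=1) & ~m1.to_numpy()).astype(int)
print(df)
```

The intermediate array has one cell per (row, error) pair, so memory grows with the product of the two counts. The comparisons here are strict, unlike `between`, which is inclusive by default.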

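Will it occur in many rows?
Yes. I will edit the question to reflect that.
This is very tricky :) Will take a look later, hope you get an answer soon though
@lte__ - then change td = pd.to_timedelta(df['time'].astype(str) + ':00') to td = pd.to_datetime(df['time'])
If I do that, I get TypeError: dtype datetime64[ns, UTC] cannot be converted to timedelta64[ns], and if I do td = pd.to_timedelta(df_full['time'].values.astype('datetime64[ns]')) I end up with AttributeError: 'TimedeltaIndex' object has no attribute 'between'... What am I missing?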
# Datetime version: 'time' is already a datetime column, so no timedelta conversion is needed
m1 = df['text'].str.contains('error')
v = df.loc[m1, 'time']
print (v)
5   2019-01-26 20:00:00
6   2019-01-26 20:05:00
Name: time, dtype: datetime64[ns]

m2 = np.logical_or.reduce([df['time'].between(x - pd.Timedelta(4, unit='h'), x) for x in v])
df['error coming'] = (m2 & ~m1).astype(int)
print (df)
                 time         text  error coming
0 2019-01-26 15:00:00         a-ok             0
1 2019-01-26 16:01:00         fine             1
2 2019-01-26 17:00:00          kay             1
3 2019-01-26 18:00:00         uhum             1
4 2019-01-26 19:00:00    doin well             1
5 2019-01-26 20:00:00     is error             0
6 2019-01-26 20:05:00  still error             0
7 2019-01-26 21:00:00   fine again             0
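The TypeError quoted in the comments mentions a datetime64[ns, UTC] column. As a hedged sketch, the loop-based version appears to work unchanged on a timezone-aware column, because the timestamps are compared directly and never converted to timedeltas. The four rows below are hypothetical data, chosen so that the window crosses midnight; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical, timezone-aware data spanning two days
df = pd.DataFrame({
    'time': pd.to_datetime([
        '2019-01-26 20:00', '2019-01-26 22:30',
        '2019-01-27 01:00', '2019-01-27 02:00',
    ], utc=True),
    'text': ['fine', 'kay', 'is error', 'fine again'],
})

m1 = df['text'].str.contains('error')
window = pd.Timedelta(4, unit='h')

# Each x is a tz-aware Timestamp, so x - window stays tz-aware
# and can be compared against the tz-aware column
m2 = np.logical_or.reduce(
    [df['time'].between(x - window, x) for x in df.loc[m1, 'time']]
)

df['error coming'] = (m2 & ~m1.to_numpy()).astype(int)
print(df)
```

Only the 22:30 row falls within 4 hours before the error at 01:00 the next day, so it is the only row labelled 1.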