Python: create a boolean column from a set of conditions met within a rolling window of a dataframe
I have a df like this:
x. y. length. condition_x. condition_y
6 1.89 TRUE FALSE
1 0.79 FALSE FALSE
5 1.34 FALSE FALSE
4 5.22 FALSE FALSE
1 1.21 FALSE FALSE
4 3.44 FALSE FALSE
I'm trying to create columns based on the following logic in pseudocode:
`condition_x` is TRUE if x. == 6 and `condition_y` is TRUE if y. == 6 (I already have this computed)
if `condition_x` is `TRUE`:
    create a window of the next 3 rows as `window`
    window_values = []
    take the y value of the row in `window` with the max `length` as y_length_max
    window_values.append(y_length_max)
    if the first row y > 4 OR (the first row y is NaN AND the second row y > 4):
        take the largest y value that satisfies this condition as y_condition_match
        window_values.append(y_condition_match)
    take the max value from window_values and create a bool column that is TRUE where y in `window` equals that max
repeat the same steps if `condition_y` is `TRUE`
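A minimal pandas sketch of the pseudocode above, under stated assumptions: the columns are named x, y and length (without the trailing periods), the index is unique, blank cells in the sample are NaN, and `flag_window_max` is a hypothetical helper name. Because the sample table lost its blank cells in extraction, the frame below is one possible reading chosen to reproduce the walkthrough, not the actual data.

```python
import numpy as np
import pandas as pd


def flag_window_max(df, trigger, window=3, threshold=4):
    """Hypothetical helper: return a boolean Series that is True where y equals
    the max of window_values within the `window` rows after each True in `trigger`."""
    flags = pd.Series(False, index=df.index)
    for pos in np.flatnonzero(trigger.to_numpy()):
        win = df.iloc[pos + 1 : pos + 1 + window]  # the next `window` rows
        if win.empty:
            continue  # trigger on the last row: nothing to inspect
        window_values = []

        # Step 1: y of the row with the largest length (skipped if unusable).
        if win["length"].notna().any():
            y_length_max = win.loc[win["length"].idxmax(), "y"]
            if pd.notna(y_length_max):
                window_values.append(y_length_max)

        # Step 2: first y > threshold, or first y is NaN and second y > threshold.
        ys = win["y"].tolist()
        if pd.notna(ys[0]) and ys[0] > threshold:
            window_values.append(ys[0])
        elif len(ys) > 1 and pd.isna(ys[0]) and pd.notna(ys[1]) and ys[1] > threshold:
            window_values.append(ys[1])

        # Step 3: flag rows of this window whose y equals the largest collected value.
        if window_values:
            target = max(window_values)
            flags.loc[win.index[win["y"].eq(target).to_numpy()]] = True
    return flags


# Hypothetical reading of the sample: blank cells were lost, assumed NaN here.
df = pd.DataFrame({
    "x": [6, 1, np.nan, np.nan, 1, np.nan, np.nan],
    "y": [np.nan, np.nan, 5, 4, np.nan, 4, 5],
    "length": [1.89, 0.79, 1.34, 5.22, 1.21, 3.44, 2.43],
})
df["condition_x"] = df["x"].eq(6)
df["condition_y"] = df["y"].eq(6)
df["val_x"] = flag_window_max(df, df["condition_x"])
df["val_y"] = flag_window_max(df, df["condition_y"])
print(df)
```

A plain loop over trigger positions is used rather than `.rolling()`, because `rolling(...).apply` reduces each window to a single number, while here individual rows inside the window need to be flagged. If two trigger rows are fewer than 3 rows apart, their windows overlap and the flags are simply OR-ed together; whether that is the desired behaviour is an assumption.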
It should look like this:
x. y. length. condition_x. condition_y. val_x. val_y
6 1.89 TRUE FALSE FALSE FALSE
1 0.79 FALSE FALSE FALSE FALSE
5 1.34 FALSE FALSE TRUE FALSE
4 5.22 FALSE FALSE FALSE FALSE
1 1.21 FALSE FALSE FALSE FALSE
4 3.44 FALSE FALSE FALSE FALSE
5 2.43 FALSE FALSE FALSE FALSE
To break down the sample for clarity:
`condition_x` is TRUE here, which starts the process:
x. y. length. condition_x. condition_y
6 1.89 TRUE FALSE
Then we create a window of the next 3 rows:
x. y. length. condition_x. condition_y
1 0.79 FALSE FALSE
5 1.34 FALSE FALSE
4 5.22 FALSE FALSE
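Building the window of the next 3 rows for each trigger row can be sketched with positional slicing. This assumes a hypothetical reading of the sample (blank cells taken as NaN, columns named x, y and length); `iloc` simply returns fewer rows when the trigger is near the end of the frame.

```python
import numpy as np
import pandas as pd

# Hypothetical reading of the sample: blank cells assumed to be NaN.
df = pd.DataFrame({
    "x": [6, 1, np.nan, np.nan, 1, np.nan, np.nan],
    "y": [np.nan, np.nan, 5, 4, np.nan, 4, 5],
    "length": [1.89, 0.79, 1.34, 5.22, 1.21, 3.44, 2.43],
})
condition_x = df["x"].eq(6)

# Positions of the trigger rows, then the next 3 rows after each one.
trigger_positions = np.flatnonzero(condition_x.to_numpy())
windows = [df.iloc[pos + 1 : pos + 4] for pos in trigger_positions]
print(windows[0])
```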
Then we take the y value of the row with the max length:
x. y. length. condition_x. condition_y
4 5.22 FALSE FALSE
and add it to a list that collects the window's condition matches:
window_vals = [4]
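Picking the y value of the row with the largest length is one `idxmax` call. The window below is a hypothetical reconstruction of the three rows above, with the first row's y assumed to be NaN as the walkthrough states.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the 3-row window (first row's y assumed NaN).
window = pd.DataFrame(
    {"y": [np.nan, 5, 4], "length": [0.79, 1.34, 5.22]},
    index=[1, 2, 3],
)
# idxmax returns the index label of the largest length; .loc reads that row's y.
y_length_max = window.loc[window["length"].idxmax(), "y"]
window_vals = [y_length_max]
print(window_vals)
```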
Then we check whether the second set of conditions is met within the window:
if the first row y > 4 OR (the first row y is NaN AND the second row y > 4):
The second condition (the first row y is NaN and the second row y > 4) is met, because the y value of the first row in the window is NaN and the y value of the second row is > 4.
Since the second condition is met, we take the max y value that satisfies either condition. In this case, that value is 5.
We then add that value to the `window_vals` list:
window_vals = [4, 5]
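One detail worth checking in this condition: in Python and NumPy, NaN never compares equal to anything, including itself, so `y == np.nan` is always False. A sketch of the check using `pd.isna` instead, on the same hypothetical 3-row window (first row's y assumed NaN):

```python
import numpy as np
import pandas as pd

window = pd.DataFrame(
    {"y": [np.nan, 5, 4], "length": [0.79, 1.34, 5.22]},
    index=[1, 2, 3],
)
window_vals = [4.0]  # from the max-length step

first_y, second_y = window["y"].iloc[0], window["y"].iloc[1]
print(first_y == np.nan)  # False, even though first_y is NaN
if pd.notna(first_y) and first_y > 4:
    window_vals.append(first_y)
elif pd.isna(first_y) and second_y > 4:
    window_vals.append(second_y)
print(window_vals)
```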
Then we take the max value from `window_vals`, and create a column that is TRUE wherever the y value within that window equals `max(window_vals)`.
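Flagging only the rows inside the window whose y equals `max(window_vals)` might look like this. The frame and the window position (rows 1 to 3 after the trigger at row 0) are assumptions matching the walkthrough; the column name val_x comes from the expected output.

```python
import numpy as np
import pandas as pd

# Hypothetical frame matching the walkthrough (blank y cells taken as NaN).
df = pd.DataFrame({
    "y": [np.nan, np.nan, 5, 4, np.nan, 4, 5],
    "length": [1.89, 0.79, 1.34, 5.22, 1.21, 3.44, 2.43],
})
window_index = df.index[1:4]  # the 3 rows after the trigger at position 0
window_vals = [4.0, 5.0]      # collected in the previous steps

target = max(window_vals)
in_window = df.index.isin(window_index)
df["val_x"] = False
# Rows outside the window (e.g. the y == 5 at position 6) are never flagged.
df.loc[in_window & df["y"].eq(target).to_numpy(), "val_x"] = True
print(df["val_x"].tolist())  # [False, False, True, False, False, False, False]
```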
Again, the final result would look like this:
x. y. length. condition_x. condition_y. val_x. val_y
6 1.89 TRUE FALSE FALSE FALSE
1 0.79 FALSE FALSE FALSE FALSE
5 1.34 FALSE FALSE TRUE FALSE
4 5.22 FALSE FALSE FALSE FALSE
1 1.21 FALSE FALSE FALSE FALSE
4 3.44 FALSE FALSE FALSE FALSE
5 2.43 FALSE FALSE FALSE FALSE