Python: repeatedly compare with the next row until a criterion is met


I want to filter the DataFrame df:

            Id          Timestamp               Data    diff1
10856167    18675685    2010-03-01 05:58:15.520 25.0    0.0
10856168    18675686    2010-03-01 05:58:16.863 26.0    1.0
10856169    18675687    2010-03-01 05:58:18.203 30.5    4.5
10856170    18675688    2010-03-01 05:58:19.543 40.5    10.0
10856171    18675689    2010-03-01 05:58:20.877 42.0    1.5
10856172    18675690    2010-03-01 05:58:22.223 43.0    1.0
10856175    18675693    2010-03-01 05:58:41.127 42.5    -0.5
10856176    18675694    2010-03-01 05:58:42.503 42.0    -0.5
10856177    18675695    2010-03-01 05:58:49.313 42.5    0.5
10856178    18675696    2010-03-01 05:58:50.663 43.0    0.5
10856181    18675699    2010-03-01 05:59:01.443 43.5    0.5
10856182    18675700    2010-03-01 05:59:02.797 42.0    -1.5
10856183    18675701    2010-03-01 05:59:04.153 41.5    -0.5
10856184    18675702    2010-03-01 05:59:05.497 41.0    -0.5
10856185    18675703    2010-03-01 05:59:29.880 41.5    0.5
10856186    18675704    2010-03-01 05:59:31.220 42.0    0.5
10856191    18675709    2010-03-01 05:59:42.053 42.5    0.5
10856192    18675710    2010-03-01 05:59:43.407 43.0    0.5
10856193    18675711    2010-03-01 05:59:44.753 42.0    -1.0
10856218    18675736    2010-03-01 06:05:21.360 41.5    -0.5
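The filter compares df['Data'] in the current row with df['Data'] in the next row. If the absolute difference between the two values is greater than 1, the next row is kept and becomes the new current row; otherwise the next row is dropped and the current row is compared with the row after it, until a row that meets the criterion is found. I tried diff() and shift(), but they only compare adjacent rows.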
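
To make the limitation concrete, here is a minimal sketch using seven consecutive Data values from the table above (rows 10856171 to 10856181). The names s and adjacent_mask are illustrative only. Every adjacent difference is at most 1, so a diff()-based mask keeps none of these rows, even though 43.5 differs by 1.5 from the last kept value, 42.0.

```python
import pandas as pd

# Data values for rows 10856171 .. 10856181 of the table above
s = pd.Series([42.0, 43.0, 42.5, 42.0, 42.5, 43.0, 43.5])

# diff() compares each row only with the row directly before it
adjacent_mask = s.diff().abs() > 1
print(adjacent_mask.any())   # is any adjacent difference larger than 1?

# The wanted comparison is against the last row that was kept (42.0)
print(abs(s.iloc[-1] - s.iloc[0]))
```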

So the expected output is:


            Id          Timestamp               Data    diff1
10856167    18675685    2010-03-01 05:58:15.520 25.0    0.0

10856169    18675687    2010-03-01 05:58:18.203 30.5    4.5
10856170    18675688    2010-03-01 05:58:19.543 40.5    10.0
10856171    18675689    2010-03-01 05:58:20.877 42.0    1.5





10856181    18675699    2010-03-01 05:59:01.443 43.5    0.5
10856182    18675700    2010-03-01 05:59:02.797 42.0    -1.5

What is the best way to do this?


Update

My attempt:

from numba import njit
@njit
def f(x, lim):
    total = x[0]
    result = np.empty(len(x), dtype=bool)
    result[0] = True
    for j,i in enumerate(x[1:], 1):
        if abs(total - i) <= lim:
            result[j] = False
        else:
            total = i
            result[j] = True

    return result

N = 1
df1 = sample[f(sample.Data.values, N)]
print(df1)

     18 df1 = sample[f(sample.Data.values, N)]
     19 print(df1)

~/opt/anaconda3/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
    399                 e.patch_message(msg)
    400
--> 401             error_rewrite(e, 'typing')
    402         except errors.UnsupportedError as e:
    403             # Something unsupported is present in the user code, add help info

~/opt/anaconda3/lib/python3.7/site-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
    342                 raise e
    343             else:
--> 344                 reraise(type(e), e, None)
    345
    346         argtypes = []

~/opt/anaconda3/lib/python3.7/site-packages/numba/core/utils.py in reraise(tp, value, tb)
     77         value = tp()
     78     if value.__traceback__ is not tb:
---> 79         raise value.with_traceback(tb)
     80     raise value
     81

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type array(pyobject, 1d, C)
[1] During: typing of argument at  (5)

File "", line 5:
def f(x, lim):
    total = x[0]
    ^
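
The message "non-precise type array(pyobject, 1d, C)" indicates that numba received an object-dtype array, which it cannot type in nopython mode. A minimal sketch of how one might check and convert the column before calling f follows. The small sample frame is a hypothetical stand-in whose Data column holds strings, not the asker's actual data, and whether this applies to the real data is an assumption.

```python
import pandas as pd

# Hypothetical stand-in: Data stored as strings in an object-dtype column
sample = pd.DataFrame(
    {"Data": pd.Series(["25.0", "26.0", "30.5"], dtype=object)}
)
raw = sample["Data"].to_numpy()
print(raw.dtype)    # object dtype: numba reports this as pyobject

# Convert to a numeric dtype before passing the array to a jitted function
data = pd.to_numeric(sample["Data"]).to_numpy()
print(data.dtype)
```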
If performance matters, I think a loop compiled with numba is the way to go here:

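import numpy as np
from numba import njit

@njit
def f(x, lim):
    # x: 1-D numeric array; lim: largest difference that is still dropped
    total = x[0]                       # value of the last kept row
    # np.bool_ is the NumPy boolean type; the original answer used its old
    # alias np.bool8, which was removed in NumPy 2.0
    result = np.empty(len(x), dtype=np.bool_)
    result[0] = True                   # the first row is always kept
    for j, i in enumerate(x[1:], 1):
        if abs(total - i) <= lim:
            result[j] = False          # too close to the last kept row: drop
        else:
            total = i                  # keep; this row is the new reference
            result[j] = True

    return result

N = 1
df1 = sample[f(sample.Data.values, N)]
print(df1)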
                Id                Timestamp  Data  diff1
10856167  18675685  2010-03-01 05:58:15.520  25.0    0.0
10856169  18675687  2010-03-01 05:58:18.203  30.5    4.5
10856170  18675688  2010-03-01 05:58:19.543  40.5   10.0
10856171  18675689  2010-03-01 05:58:20.877  42.0    1.5
10856181  18675699  2010-03-01 05:59:01.443  43.5    0.5
10856182  18675700  2010-03-01 05:59:02.797  42.0   -1.5    
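
For readers who want to check the logic without installing numba, the same loop can be sketched in plain Python. This is an illustrative variant, not the answer's code: keep_mask is a hypothetical name, and the list below is simply the Data column from the question.

```python
def keep_mask(values, lim):
    """Return a list of booleans marking the rows to keep.

    A row is kept when its value differs from the last kept value
    by more than lim; the first row is always kept.
    """
    last_kept = values[0]
    mask = [True]
    for value in values[1:]:
        if abs(last_kept - value) > lim:
            last_kept = value      # this row becomes the new reference
            mask.append(True)
        else:
            mask.append(False)
    return mask


data = [25.0, 26.0, 30.5, 40.5, 42.0, 43.0, 42.5, 42.0, 42.5, 43.0,
        43.5, 42.0, 41.5, 41.0, 41.5, 42.0, 42.5, 43.0, 42.0, 41.5]
kept = [v for v, keep in zip(data, keep_mask(data, 1)) if keep]
print(kept)  # [25.0, 30.5, 40.5, 42.0, 43.5, 42.0]
```

The kept values match the Data column of the expected output in the question. On large frames the numba version should be considerably faster, since this variant runs the loop in the interpreter.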
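
Comments:

@jezrael, please see the edited question for more data. Many thanks.

Thank you for your solution. It returned an error; please see the update in the question.

@nilsinelabore - Tested; the problem seems to be dtype=bool, it needs dtype=np.bool8.

Great, it works! Could you explain the code a little, especially the line for j, i in enumerate(x[1:], 1)?

@nilsinelabore - It means that all rows are processed except the first, because the first is always True. So x[1:] omits the first value, and the enumeration starts at 1 because of the , 1).

Hi jezrael, could you take a look at this question? Thanks.

Thanks for your answer. I think this differs from the expected output. I am looking for the next row whose Data differs from the current row's Data by more than 1, but the last two rows in the output differ by less than 1.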
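
The behaviour of enumerate(x[1:], 1) described in the comments can be seen on a short list. A small sketch with three values taken from the Data column; the name pairs is illustrative:

```python
x = [25.0, 26.0, 30.5]

# x[1:] skips the first element; start=1 makes j line up with positions in x
pairs = list(enumerate(x[1:], 1))
print(pairs)  # [(1, 26.0), (2, 30.5)]

for j, i in pairs:
    assert x[j] == i  # j indexes the original list, i is the value stored there
```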
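
# Mark every row whose Data differs from the row directly above it by at most 1.
# Note: only adjacent rows are compared, and the last row is never visited.
for i in range(1, len(df) - 1):
    if -1 <= df.iloc[i]['Data'] - df.iloc[i - 1]['Data'] <= 1:
        df.iloc[i, 3] = ''          # column 3 is diff1; '' marks the row
df.loc[df['diff1'] == '', :] = ''   # clear all columns of the marked rows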
            Id          Timestamp               Data diff1
10856167    18675685    2010-03-01 05:58:15.520 25   0
10856168                
10856169    18675687    2010-03-01 05:58:18.203 30.5 4.5
10856170    18675688    2010-03-01 05:58:19.543 40.5 10
10856171    18675689    2010-03-01 05:58:20.877 42   1.5
10856172                
10856175                
10856176                
10856177                
10856178                
10856181                
10856182    18675700    2010-03-01 05:59:02.797 42  -1.5
10856183                
10856184                
10856185                
10856186                
10856191                
10856192                
10856193                
10856218    18675736    2010-03-01 06:05:21.360 41.5 -0.5