Python 3.x: How to extract rows based on a range of column values

python-3.x pandas numpy


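I have a DataFrame with the columns value, ID, distance, and distance2. Whenever distance changes from 0 to a value between 4000 and 5000, or distance2 changes from 0 to a value between 3000 and 4000, I want to extract the previous row. As the desired output below shows, value and ID should come from the previous row, while distance and distance2 should come from the row where the change occurs.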

Here is my example:

import pandas as pd

df = pd.DataFrame({'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
                   'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
                   'distance': [0, 0, 0, 4008, 0, 0, 4820, 0, 0, 0],
                   'distance2': [0, 0, 0, 3006, 0, 0, 0, 1, 3990, 0]})

   value  ID  distance  distance2
0      3   2         0          0
1      4   2         0          0
2      7   8         0          0
3      8   8      4008       3006
4     11   8         0          0
5     20   2         0          0
6     15   2      4820          0
7     20   2         0          1
8     15   5         0       3990
9     16   5         0          0
Desired output:

   value  ID  distance  distance2
0      7   8      4008       3006
1     20   2      4820          0
2     20   2         0       3990

I tried modifying an accepted answer, and this seems to work:

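# Walk the rows in order, keeping the previous row in `last`.
row_iterator = df.iterrows()
_, last = next(row_iterator)
df_new = []

for index, row in row_iterator:
    # Current value is strictly inside the range and the previous value in the same column was 0.
    if ((4000 < row.distance < 5000) & (last.distance == 0)) | ((3000 < row.distance2 < 4000) & (last.distance2 == 0)):
        # value and ID come from the previous row, the distances from the current row.
        df_new.append([last.value, last.ID, row.distance, row.distance2])
    last = row
df_new = pd.DataFrame(df_new, columns=df.columns)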


This gives me the desired output, but it is slow on my original data because I have millions of rows.

That's true, but on the sample data above this option actually runs faster than the earlier answer that used .diff(). I'm not sure why; perhaps the data is simply too small.
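
Since the row-by-row loop is reported to be slow on millions of rows, a vectorized version of the same test may be worth trying. The following is only a sketch, assuming the condition is exactly the one in the loop above (current value strictly inside the range, previous value in the same column exactly 0). The names prev, hit_distance, hit_distance2, mask and result are made up for illustration. It uses shift(1) to line up each row with its predecessor instead of iterating.

```python
import pandas as pd

df = pd.DataFrame({
    'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
    'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
    'distance': [0, 0, 0, 4008, 0, 0, 4820, 0, 0, 0],
    'distance2': [0, 0, 0, 3006, 0, 0, 0, 1, 3990, 0],
})

# Previous row's values aligned with the current row (first row becomes NaN).
prev = df.shift(1)

# Assumption: same rule as the loop, i.e. strict range bounds and previous value == 0.
hit_distance = (df['distance'].gt(4000) & df['distance'].lt(5000)
                & prev['distance'].eq(0))
hit_distance2 = (df['distance2'].gt(3000) & df['distance2'].lt(4000)
                 & prev['distance2'].eq(0))
mask = hit_distance | hit_distance2

# value and ID come from the previous row, the distances from the current row.
# shift() turns integer columns into floats, so cast the selected values back.
result = pd.DataFrame({
    'value': prev.loc[mask, 'value'].astype(df['value'].dtype),
    'ID': prev.loc[mask, 'ID'].astype(df['ID'].dtype),
    'distance': df.loc[mask, 'distance'],
    'distance2': df.loc[mask, 'distance2'],
}).reset_index(drop=True)

print(result)
```

On the sample data this selects the rows with distance 4008 and 4820, the same two rows the loop produces. The row with distance2 equal to 3990 is not selected by either version, because the preceding distance2 is 1 rather than 0. If that row is wanted, the "previous value is 0" part of the condition would have to be relaxed, and the post does not say how.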