Python: How do I filter the rows of a dataframe that meet a given condition and put them into a new dataframe?
The data in test.csv looks like this:
staff_id,clock_time,device_id,latitude,longitude
1001,2020/9/14 04:43:00,d_1,24.59652556,118.0824644
1001,2020/9/14 05:34:40,d_1,24.59732974,118.0859631
1001,2020/9/14 06:33:34,d_1,24.73208312,118.0957197
1001,2020/9/14 08:17:29,d_1,24.59222786,118.0955275
1001,2020/9/20 05:30:56,d_1,24.59689407,118.2863806
1001,2020/9/20 07:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 08:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 09:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 17:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 19:26:05,d_1,24.70237852,118.2858955
1001,2020/9/20 22:26:05,d_1,24.71237852,118.2858955
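I want to find any rows where the latitude or longitude difference between two consecutive rows is greater than 0.1, and then put the result into a new dataframe.
In my example, rows 2, 3, 4, 9 and 10 have a latitude difference greater than 0.1, and rows 4 and 5 have a longitude difference greater than 0.1.
I want the new dataframe to look like this: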
staff_id,clock_time,device_id,latitude,longitude
1001,2020/9/14 05:34:40,d_1,24.59732974,118.0859631
1001,2020/9/14 06:33:34,d_1,24.73208312,118.0957197
1001,2020/9/14 08:17:29,d_1,24.59222786,118.0955275
1001,2020/9/20 05:30:56,d_1,24.59689407,118.2863806
1001,2020/9/20 17:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 19:26:05,d_1,24.70237852,118.2858955
My code:
import pandas as pd
df = pd.read_csv(r'E:/test.csv', encoding='utf-8', parse_dates=[1])
m1 = df[['latitude', 'longitude']].diff().abs().gt(0.1)
m2 = df[['latitude', 'longitude']].shift().diff().abs().gt(0.1)
new_dataframe = [...]
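How can I do this?
Use DataFrame.any to convert the boolean DataFrame to a Series, use Series.shift for the shifted mask, chain the two with | for a bitwise OR, and finally add DataFrame.copy to avoid a warning if the new dataframe will be processed in some way after the filtering: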
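# True for every row whose latitude or longitude differs from the previous row by more than 0.1
m1 = df[['latitude', 'longitude']].diff().abs().gt(0.1).any(axis=1)
# m1.shift(-1) also marks the row just before each such change
new_dataframe = df[m1 | m1.shift(-1)].copy()
print(new_dataframe)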
   staff_id          clock_time device_id   latitude   longitude
1      1001  2020/9/14 05:34:40       d_1  24.597330  118.085963
2      1001  2020/9/14 06:33:34       d_1  24.732083  118.095720
3      1001  2020/9/14 08:17:29       d_1  24.592228  118.095527
4      1001  2020/9/20 05:30:56       d_1  24.596894  118.286381
8      1001  2020/9/20 17:26:05       d_1  24.582379  118.285896
9      1001  2020/9/20 19:26:05       d_1  24.702379  118.285896
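For anyone who wants to try this without the file on disk, here is a minimal self-contained sketch of the same approach. It assumes the sample rows from the question are pasted into a string and read through io.StringIO. The names CSV_TEXT, jump and mask are only illustrative, and fill_value=False is my own addition so that the shifted mask stays boolean instead of getting a NaN in its last row.

```python
import io

import pandas as pd

# Sample rows copied from the question (inlined here instead of reading E:/test.csv)
CSV_TEXT = """staff_id,clock_time,device_id,latitude,longitude
1001,2020/9/14 04:43:00,d_1,24.59652556,118.0824644
1001,2020/9/14 05:34:40,d_1,24.59732974,118.0859631
1001,2020/9/14 06:33:34,d_1,24.73208312,118.0957197
1001,2020/9/14 08:17:29,d_1,24.59222786,118.0955275
1001,2020/9/20 05:30:56,d_1,24.59689407,118.2863806
1001,2020/9/20 07:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 08:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 09:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 17:26:05,d_1,24.58237852,118.2858955
1001,2020/9/20 19:26:05,d_1,24.70237852,118.2858955
1001,2020/9/20 22:26:05,d_1,24.71237852,118.2858955
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# True where either coordinate moved by more than 0.1 compared with the previous row
jump = df[['latitude', 'longitude']].diff().abs().gt(0.1).any(axis=1)

# Keep the row before each jump as well; fill_value=False avoids a NaN in the last row
mask = jump | jump.shift(-1, fill_value=False)

new_dataframe = df[mask].copy()
print(new_dataframe.index.tolist())  # [1, 2, 3, 4, 8, 9]
```

The printed index labels match the rows shown in the output above.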
Can I use new_dataframe = df[m1 | m1.shift(-1)]?
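As a rough illustration for this question: the rows selected are the same with or without .copy(). As far as I know, the difference only shows up when the result is modified afterwards, where pandas versions without copy-on-write may emit a SettingWithCopyWarning on the un-copied result. The small dataframe below is made up for the sketch, and fill_value=False is again my own addition.

```python
import pandas as pd

# Hypothetical toy data, not the rows from the question
df = pd.DataFrame({
    'latitude': [24.59, 24.60, 24.73, 24.59],
    'longitude': [118.08, 118.08, 118.09, 118.09],
})

m1 = df[['latitude', 'longitude']].diff().abs().gt(0.1).any(axis=1)
mask = m1 | m1.shift(-1, fill_value=False)

without_copy = df[mask]
with_copy = df[mask].copy()

# Both selections contain exactly the same rows and values
print(without_copy.equals(with_copy))  # True

# Adding a column to the explicit copy leaves the original dataframe untouched
with_copy['latitude_rounded'] = with_copy['latitude'].round(2)
print('latitude_rounded' in df.columns)  # False
```

So the version without .copy() is fine for just viewing or saving the filtered rows; the explicit copy is mainly useful when new_dataframe is going to be changed later.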