Python: How to use diff to select times that are close together but within an unknown range?


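The following dataset contains GPS timestamps for buses arriving at a particular bus stop. While a bus idles at the stop, its GPS transmitter keeps sending data at semi-regular intervals.

I am trying to compile the departure time of each bus from this one stop. The complication is that the same bus may repeat the route roughly every 2 hours.

In the data frame below, if bus NYCT_1202 arrives at 10:01:19 (row 0) and stays until 10:11:48 (row 1), I want to somehow select 10:11:48.

Likewise, about two hours later, when the same bus arrives at the stop again at 12:51:31 (row 2), it idles (possibly out of service) until 13:51:35 (row 5). I want to select that last time, 13:51:35.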

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'RecordedAtTime': {0: Timestamp('2017-08-23 10:01:19'),
  1: Timestamp('2017-08-23 10:11:48'),
  2: Timestamp('2017-08-23 12:51:31'),
  3: Timestamp('2017-08-23 13:02:02'),
  4: Timestamp('2017-08-23 13:11:27'),
  5: Timestamp('2017-08-23 13:51:35'),
  6: Timestamp('2017-08-23 16:12:27'),
  7: Timestamp('2017-08-23 16:52:25'),
  8: Timestamp('2017-08-07 09:33:42'),
  9: Timestamp('2017-08-07 10:13:36')},
 'VehicleRef': {0: 'NYCT_1202',
  1: 'NYCT_1202',
  2: 'NYCT_1202',
  3: 'NYCT_1202',
  4: 'NYCT_1202',
  5: 'NYCT_1202',
  6: 'NYCT_1202',
  7: 'NYCT_1202',
  8: 'NYCT_1206',
  9: 'NYCT_1206'}})

       RecordedAtTime VehicleRef
0 2017-08-23 10:01:19  NYCT_1202
1 2017-08-23 10:11:48  NYCT_1202 <-This Row

2 2017-08-23 12:51:31  NYCT_1202
3 2017-08-23 13:02:02  NYCT_1202
4 2017-08-23 13:11:27  NYCT_1202
5 2017-08-23 13:51:35  NYCT_1202 <-This Row

6 2017-08-23 16:12:27  NYCT_1202
7 2017-08-23 16:52:25  NYCT_1202 <-This Row

8 2017-08-07 09:33:42  NYCT_1206
9 2017-08-07 10:13:36  NYCT_1206 <-This Row
So which library can I use to solve this? Is there a better way to use .diff, or should I approach the problem in a completely different way?

import datetime

import pandas as pd
from pandas import Timestamp

# Approximate trip duration; a longer gap between pings means the bus left and came back
trip_minutes = datetime.timedelta(minutes=90)

# Sort by time so that diff() within each vehicle is chronological
df = df.sort_values('RecordedAtTime')
dfg = df.groupby('VehicleRef')

# Elapsed time since the previous ping of the same vehicle
df['Elapsed'] = dfg['RecordedAtTime'].diff()

# A gap longer than the trip duration means the previous visit has ended
# and this row is the first ping of a new visit
df['isEnd'] = df['Elapsed'] > trip_minutes

# The departure is the row just before such a gap: shift within each vehicle
df['isStart'] = dfg['isEnd'].shift(-1)

# Select the departures; the last row of each vehicle has NaN here and is kept too
df[df['isStart'] != False]
Result:

       RecordedAtTime VehicleRef  Elapsed  isEnd isStart
9 2017-08-07 10:13:36  NYCT_1206 00:39:54  False     NaN
1 2017-08-23 10:11:48  NYCT_1202 00:10:29  False    True
5 2017-08-23 13:51:35  NYCT_1202 00:40:08  False    True
7 2017-08-23 16:52:25  NYCT_1202 00:39:58  False     NaN
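
An alternative to flagging rows with shift is to number each visit to the stop and take the last timestamp per visit. The following is only a sketch under assumptions: the 90-minute threshold is carried over from the answer above, and the names max_gap, visit_id and DepartureTime are hypothetical, chosen for illustration.

```python
import pandas as pd

# Sample pings, same values as the question's data frame
df = pd.DataFrame({
    'RecordedAtTime': pd.to_datetime([
        '2017-08-23 10:01:19', '2017-08-23 10:11:48',
        '2017-08-23 12:51:31', '2017-08-23 13:02:02',
        '2017-08-23 13:11:27', '2017-08-23 13:51:35',
        '2017-08-23 16:12:27', '2017-08-23 16:52:25',
        '2017-08-07 09:33:42', '2017-08-07 10:13:36',
    ]),
    'VehicleRef': ['NYCT_1202'] * 8 + ['NYCT_1206'] * 2,
})

# Assumed threshold: a gap longer than this starts a new visit to the stop
max_gap = pd.Timedelta(minutes=90)

df = df.sort_values(['VehicleRef', 'RecordedAtTime'])

# Gap since the previous ping of the same vehicle (NaT on each vehicle's first ping)
gap = df.groupby('VehicleRef')['RecordedAtTime'].diff()

# Running count of long gaps per vehicle gives a visit number (hypothetical column)
new_visit = (gap > max_gap).astype(int)
df['visit_id'] = new_visit.groupby(df['VehicleRef']).cumsum()

# The departure is taken to be the last ping of each (vehicle, visit) pair
departures = (
    df.groupby(['VehicleRef', 'visit_id'])['RecordedAtTime']
      .max()
      .reset_index(name='DepartureTime')
)
print(departures)
```

Because every row receives a visit number, the same grouping can also yield the arrival time (with min) or the dwell time per visit, which the flag-based selection does not give directly.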

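For some reason I could not reproduce your output (row 7 was missing). However, after placing
df.loc[df['Elapsed'].isnull(),'isEnd']=True
before
df['isStart']=dfg['isEnd'].shift(-1)
I got exactly the same rows as in your comments. Thanks a lot!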
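
If the last row of a vehicle goes missing, as reported in this comment, one possible way to avoid depending on how NaN compares with False is to fill the shifted value explicitly. This is a hedged sketch, not the answerer's method: it assumes the same 90-minute threshold and uses the fill_value argument of the grouped shift, so the last ping of every vehicle is always treated as a departure.

```python
import pandas as pd

# Sample pings, same values as the question's data frame
df = pd.DataFrame({
    'RecordedAtTime': pd.to_datetime([
        '2017-08-23 10:01:19', '2017-08-23 10:11:48',
        '2017-08-23 12:51:31', '2017-08-23 13:02:02',
        '2017-08-23 13:11:27', '2017-08-23 13:51:35',
        '2017-08-23 16:12:27', '2017-08-23 16:52:25',
        '2017-08-07 09:33:42', '2017-08-07 10:13:36',
    ]),
    'VehicleRef': ['NYCT_1202'] * 8 + ['NYCT_1206'] * 2,
})

# Assumed threshold, taken from the answer's approximate trip duration
trip_gap = pd.Timedelta(minutes=90)

df = df.sort_values('RecordedAtTime')

# True where the gap to the previous ping of the same vehicle exceeds the threshold
elapsed = df.groupby('VehicleRef')['RecordedAtTime'].diff()
df['isEnd'] = elapsed > trip_gap

# Look one row ahead within each vehicle; the last row has no successor,
# so fill it with True instead of leaving NaN
df['isStart'] = (
    df.groupby('VehicleRef')['isEnd']
      .shift(-1, fill_value=True)
      .astype(bool)
)

# Plain boolean mask, no comparison against False needed
result = df[df['isStart']]
print(result[['RecordedAtTime', 'VehicleRef']])
```

With the sample data this selects rows 9, 1, 5 and 7, the same rows as in the result shown above.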