How can I use the start & end time range from one df to find the start & end loc of each vehicle slot in another df, using Python?
Tags: python, pandas
I have a data frame, df1. These are the vehicle slots:
CompanyID RegistrationNo slotStartTime slotEndTime
1 602 veh1 2020-07-27 21:12:00 2020-07-27 22:12:00
2 602 veh1 2020-07-27 21:30:00 2020-07-27 22:30:00
3 602 veh2 2020-07-28 22:16:00 2020-07-28 23:16:00
and another, df2, from which I want to find the start location and end location of each slot:
RegistrationNo GPSTime Location
0 veh1 2020-07-27 21:12:00 loc1
1 veh1 2020-07-27 21:15:00 loc2
2 veh1 2020-07-27 21:20:00 loc3
3 veh1 2020-07-27 21:30:00 loc4
4 veh1 2020-07-27 21:45:00 loc5
5 veh1 2020-07-27 22:15:00 loc6
6 veh1 2020-07-27 22:29:00 loc7
4 veh2 2020-07-28 21:45:00 loc8
5 veh2 2020-07-28 22:15:00 loc9
6 veh2 2020-07-28 22:29:00 loc10
7 veh2 2020-07-28 22:50:00 loc11
7 veh2 2020-07-28 23:16:00 loc12
Expected result:
CompanyID RegistrationNo slotStartTime slotEndTime slotStartloc slotEndLoc
1 602 veh1 2020-07-27 21:12:00 2020-07-27 22:12:00 loc1 loc5
2 602 veh1 2020-07-27 21:30:00 2020-07-27 22:30:00 loc4 loc7
3 602 veh2 2020-07-28 22:16:00 2020-07-28 23:16:00 loc10 loc12
I tried using a group by on the date range, but I guess because another df is involved, it does not work and throws an error.
import pandas as pd

df = pd.DataFrame({
    'CompanyID': [602, 602, 202],
    'RegistrationNo': ['veh1', 'veh1', 'veh2'],
    'slotStartTime': ['2020-07-27 21:12:00', '2020-07-27 21:30:00',
                      '2020-07-28 22:16:00'],
    'slotEndTime': ['2020-07-27 22:12:00', '2020-07-27 22:30:00',
                    '2020-07-28 23:16:00']
})
df2 = pd.DataFrame({
    'RegistrationNo': ['veh1', 'veh1', 'veh1', 'veh1', 'veh1', 'veh1', 'veh1',
                       'veh2', 'veh2', 'veh2', 'veh2', 'veh2'],
    'GPSTime': ['2020-07-27 21:12:00', '2020-07-27 21:15:00',
                '2020-07-27 21:20:00', '2020-07-27 21:30:00',
                '2020-07-27 21:45:00', '2020-07-27 22:15:00', '2020-07-27 22:29:00',
                '2020-07-28 21:45:00', '2020-07-28 22:15:00', '2020-07-28 22:29:00',
                '2020-07-28 22:50:00', '2020-07-28 23:16:00'],
    'location': ['loc1', 'loc2', 'loc3', 'loc4', 'loc5', 'loc6', 'loc7', 'loc8',
                 'loc9', 'loc10', 'loc11', 'loc12']
})
df['slotStartTime'] = pd.to_datetime(df['slotStartTime'])
df['slotEndTime'] = pd.to_datetime(df['slotEndTime'])
df2['GPSTime'] = pd.to_datetime(df2['GPSTime'])

# for each row in df, merge df2 to get the GPS fixes inside the slot's start-end range,
# then take the first and last row to get the start loc and end loc
parts = []
for index in df.index:
    row = df.loc[[index]]  # one-row DataFrame; keeps the datetime dtypes
    df_main = pd.merge(row, df2, on='RegistrationNo')
    # print(df_main)
    df_main = df_main[
        (df_main.slotEndTime > df_main.slotStartTime) &
        (df_main.slotStartTime <= df_main.GPSTime) &
        (df_main.GPSTime <= df_main.slotEndTime)
    ].sort_values(by='GPSTime').copy()
    if df_main.empty:
        continue  # no GPS fix inside this slot
    df_main['start_loc'] = df_main.iloc[0]['location']
    df_main['end_loc'] = df_main.iloc[-1]['location']
    parts.append(df_main)
# DataFrame.append was removed in pandas 2.0, so the pieces are concatenated instead
result = pd.concat(parts)

# here you have the 'result' DF with locations; now assign them to the original DF
df = df.merge(
    result, on=['slotStartTime', 'slotEndTime', 'RegistrationNo', 'CompanyID'], how='inner'
).drop_duplicates(
    keep='last', subset=['slotStartTime', 'slotEndTime', 'RegistrationNo']
)
del df['location']
print(df)
CompanyID RegistrationNo slotStartTime slotEndTime GPSTime start_loc end_loc
4 602 veh1 2020-07-27 21:12:00 2020-07-27 22:12:00 2020-07-27 21:45:00 loc1 loc5
8 602 veh1 2020-07-27 21:30:00 2020-07-27 22:30:00 2020-07-27 22:29:00 loc4 loc7
11 202 veh2 2020-07-28 22:16:00 2020-07-28 23:16:00 2020-07-28 23:16:00 loc10 loc12
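The per-row loop above can also be written without an explicit loop. The following is only a sketch of that idea, not the answerer's code: it merges every slot with every GPS fix of the same vehicle, keeps the fixes that fall inside the slot, and takes the first and last location per slot. The names `slots`, `gps`, `slot_id`, `locs` and `out` are made up for this illustration, and the sample data is copied from the question.

```python
import pandas as pd

slots = pd.DataFrame({
    'CompanyID': [602, 602, 602],
    'RegistrationNo': ['veh1', 'veh1', 'veh2'],
    'slotStartTime': pd.to_datetime(['2020-07-27 21:12:00', '2020-07-27 21:30:00',
                                     '2020-07-28 22:16:00']),
    'slotEndTime': pd.to_datetime(['2020-07-27 22:12:00', '2020-07-27 22:30:00',
                                   '2020-07-28 23:16:00']),
})
gps = pd.DataFrame({
    'RegistrationNo': ['veh1'] * 7 + ['veh2'] * 5,
    'GPSTime': pd.to_datetime([
        '2020-07-27 21:12:00', '2020-07-27 21:15:00', '2020-07-27 21:20:00',
        '2020-07-27 21:30:00', '2020-07-27 21:45:00', '2020-07-27 22:15:00',
        '2020-07-27 22:29:00', '2020-07-28 21:45:00', '2020-07-28 22:15:00',
        '2020-07-28 22:29:00', '2020-07-28 22:50:00', '2020-07-28 23:16:00']),
    'location': ['loc%d' % i for i in range(1, 13)],
})

# Pair each slot with every fix of the same vehicle; slot_id (hypothetical name)
# remembers which slot a merged row came from.
merged = slots.rename_axis('slot_id').reset_index().merge(gps, on='RegistrationNo')

# Keep only the fixes inside the slot (both ends inclusive).
in_slot = merged[merged['GPSTime'].between(merged['slotStartTime'],
                                           merged['slotEndTime'])]

# First and last location per slot, in chronological order.
locs = (in_slot.sort_values('GPSTime')
               .groupby('slot_id')['location']
               .agg(start_loc='first', end_loc='last'))

# Slots without any fix would get NaN in both new columns.
out = slots.join(locs)
print(out[['RegistrationNo', 'start_loc', 'end_loc']])
```

The intermediate merge has one row per (slot, fix) pair for each vehicle, so for very long GPS tracks its memory use can become the limiting factor.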
Here is an approach that uses iterrows() and writes to the data frame with .at[]:
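# df and df2 are the data frames built above, with the time columns already parsed
df['start_loc'] = ''
df['end_loc'] = ''
for index, row in df.iterrows():
    start = row.slotStartTime
    end = row.slotEndTime
    reg = row.RegistrationNo
    mask = ((df2['RegistrationNo'] == reg) &
            (start <= df2['GPSTime']) & (df2['GPSTime'] <= end))
    # order the matching fixes by time; min()/max() on the labels would
    # compare them as strings rather than chronologically
    locs = df2.loc[mask].sort_values('GPSTime')['location']
    if not locs.empty:
        df.at[index, 'start_loc'] = locs.iloc[0]
        df.at[index, 'end_loc'] = locs.iloc[-1]
print(df[['start_loc', 'end_loc']])  # other columns omitted to save space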
start_loc end_loc
0 loc1 loc5
1 loc4 loc7
2 loc10 loc12
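One thing to keep in mind when picking the first and last location: `min()` and `max()` on a column of labels compare strings, not times. With the sample data the two happen to agree, but that is not guaranteed in general. The small sketch below uses made-up fixes (the `fixes` frame is hypothetical, not from the question) to show the difference.

```python
import pandas as pd

# Hypothetical fixes: chronological order is loc9, loc10, loc11.
fixes = pd.DataFrame({
    'GPSTime': pd.to_datetime(['2020-07-28 10:00:00',
                               '2020-07-28 10:05:00',
                               '2020-07-28 10:10:00']),
    'location': ['loc9', 'loc10', 'loc11'],
})

# String comparison: 'loc10' < 'loc11' < 'loc9'
lexical_first = fixes['location'].min()
lexical_last = fixes['location'].max()

# Chronological order: sort by time, then take the ends
ordered = fixes.sort_values('GPSTime')['location']
chrono_first = ordered.iloc[0]
chrono_last = ordered.iloc[-1]

print(lexical_first, lexical_last)  # loc10 loc9
print(chrono_first, chrono_last)    # loc9 loc11
```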
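How do you know, what rule do you use, to assign loc10 as the start location for veh2? The slot has a start time of 22:16:00, and the GPS time of loc10 is 22:29.
@sygneto because this is the first value inside the slot for veh2.
OK, take a look at my answer.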
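The rule discussed in these comments (start location = first fix at or after the slot start, end location = last fix at or before the slot end) can also be expressed with `pd.merge_asof`. This is only a sketch under that reading of the rule, not something either answer proposes. The names `slots`, `gps`, `startFix` and `endFix` are invented for the illustration. Note that `merge_asof` on its own does not check that the matched fix lies inside the slot, so a slot with no fixes could pick up a fix from outside it unless a `tolerance` or an extra check is added.

```python
import pandas as pd

slots = pd.DataFrame({
    'CompanyID': [602, 602, 602],
    'RegistrationNo': ['veh1', 'veh1', 'veh2'],
    'slotStartTime': pd.to_datetime(['2020-07-27 21:12:00', '2020-07-27 21:30:00',
                                     '2020-07-28 22:16:00']),
    'slotEndTime': pd.to_datetime(['2020-07-27 22:12:00', '2020-07-27 22:30:00',
                                   '2020-07-28 23:16:00']),
})
gps = pd.DataFrame({
    'RegistrationNo': ['veh1'] * 7 + ['veh2'] * 5,
    'GPSTime': pd.to_datetime([
        '2020-07-27 21:12:00', '2020-07-27 21:15:00', '2020-07-27 21:20:00',
        '2020-07-27 21:30:00', '2020-07-27 21:45:00', '2020-07-27 22:15:00',
        '2020-07-27 22:29:00', '2020-07-28 21:45:00', '2020-07-28 22:15:00',
        '2020-07-28 22:29:00', '2020-07-28 22:50:00', '2020-07-28 23:16:00']),
    'location': ['loc%d' % i for i in range(1, 13)],
})

# merge_asof requires both frames to be sorted by their time key.
gps_sorted = gps.sort_values('GPSTime')

# Start location: nearest fix at or after slotStartTime, per vehicle.
start = pd.merge_asof(
    slots.sort_values('slotStartTime'),
    gps_sorted.rename(columns={'GPSTime': 'startFix', 'location': 'start_loc'}),
    left_on='slotStartTime', right_on='startFix',
    by='RegistrationNo', direction='forward')

# End location: nearest fix at or before slotEndTime, per vehicle.
both = pd.merge_asof(
    start.sort_values('slotEndTime'),
    gps_sorted.rename(columns={'GPSTime': 'endFix', 'location': 'end_loc'}),
    left_on='slotEndTime', right_on='endFix',
    by='RegistrationNo', direction='backward')

print(both[['RegistrationNo', 'slotStartTime', 'start_loc', 'end_loc']])
```

For veh2 this picks loc10, because 22:29 is the first fix at or after the 22:16 slot start, which is the rule stated in the reply above.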