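How can I use the start and end time range of each vehicle slot in one DataFrame to find the slot's start and end location from another DataFrame, using Python?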


I have a DataFrame, df1, which holds the vehicle slots:

    CompanyID   RegistrationNo  slotStartTime           slotEndTime
1   602         veh1            2020-07-27 21:12:00 2020-07-27 22:12:00
2   602         veh1            2020-07-27 21:30:00 2020-07-27 22:30:00
3   602         veh2            2020-07-28 22:16:00 2020-07-28 23:16:00
And another one, df2. From this data I want to find the start location and end location of each slot:

    RegistrationNo  GPSTime         Location
0   veh1            2020-07-27 21:12:00 loc1
1   veh1            2020-07-27 21:15:00 loc2
2   veh1            2020-07-27 21:20:00 loc3
3   veh1            2020-07-27 21:30:00 loc4
4   veh1            2020-07-27 21:45:00 loc5
5   veh1            2020-07-27 22:15:00 loc6
6   veh1            2020-07-27 22:29:00 loc7
4   veh2            2020-07-28 21:45:00 loc8
5   veh2            2020-07-28 22:15:00 loc9
6   veh2            2020-07-28 22:29:00 loc10 
7   veh2            2020-07-28 22:50:00 loc11 
7   veh2            2020-07-28 23:16:00 loc12 
Expected result:

    CompanyID   RegistrationNo  slotStartTime           slotEndTime      slotStartloc slotEndLoc
1   602         veh1            2020-07-27 21:12:00 2020-07-27 22:12:00  loc1         loc5
2   602         veh1            2020-07-27 21:30:00 2020-07-27 22:30:00  loc4         loc7
3   602         veh2            2020-07-28 22:16:00 2020-07-28 23:16:00  loc10        loc12
I tried using group by with a date range, but I guess because another df is involved, it does not work and throws an error.
import pandas as pd

df=pd.DataFrame({
    'CompanyID':[602,602,202],
    'RegistrationNo':['veh1','veh1','veh2'],
    'slotStartTime':['2020-07-27 21:12:00','2020-07-27 21:30:00',
                     '2020-07-28 22:16:00'],
    'slotEndTime':['2020-07-27 22:12:00','2020-07-27 22:30:00',
                   '2020-07-28 23:16:00']
})
df2=pd.DataFrame({
    'RegistrationNo':['veh1','veh1','veh1','veh1','veh1','veh1','veh1',
                      'veh2','veh2','veh2','veh2','veh2'],
    'GPSTime':['2020-07-27 21:12:00','2020-07-27 21:15:00',
               '2020-07-27 21:20:00','2020-07-27 21:30:00',
               '2020-07-27 21:45:00','2020-07-27 22:15:00','2020-07-27 22:29:00',
               '2020-07-28 21:45:00','2020-07-28 22:15:00','2020-07-28 22:29:00',
               '2020-07-28 22:50:00','2020-07-28 23:16:00'],
    'location':['loc1','loc2','loc3','loc4','loc5','loc6','loc7','loc8',
                'loc9','loc10','loc11','loc12',]
})

# parse the time columns so they compare as datetimes, not as strings
df['slotStartTime']=pd.to_datetime(df['slotStartTime'])
df['slotEndTime']=pd.to_datetime(df['slotEndTime'])
df2['GPSTime']=pd.to_datetime(df2['GPSTime'])

# for each row in df, merge df2 to get the GPS rows inside the slot's start-end range,
# then take the first and last row to get start loc and end loc
frames=[]

for index in df.index:
    row = df.loc[[index]]   # one-row DataFrame; keeps the datetime dtypes
    df_main = pd.merge(row,df2,on='RegistrationNo')
    df_main = df_main[
        (df_main.slotEndTime>df_main.slotStartTime) &
        (df_main.slotStartTime<=df_main.GPSTime)&
        (df_main.GPSTime<=df_main.slotEndTime)
    ].sort_values(by='GPSTime')   # chronological, so first/last row = start/end
    if df_main.empty:
        continue   # no GPS row inside this slot
    df_main['start_loc'] = df_main.iloc[0]['location']
    df_main['end_loc'] = df_main.iloc[-1]['location']
    frames.append(df_main)

# DataFrame.append was removed in pandas 2.0, so collect the pieces and concat once
result = pd.concat(frames)
# here you have 'result' DF with locations, now you need to assign them to original DF

df=df.merge(result,on=['slotStartTime','slotEndTime','RegistrationNo','CompanyID'],how='inner').drop_duplicates(
    keep='last',subset=['slotStartTime','slotEndTime','RegistrationNo']
)
del df['location']
print(df)
   CompanyID RegistrationNo       slotStartTime         slotEndTime             GPSTime start_loc end_loc
4        602           veh1 2020-07-27 21:12:00 2020-07-27 22:12:00 2020-07-27 21:45:00      loc1    loc5
8        602           veh1 2020-07-27 21:30:00 2020-07-27 22:30:00 2020-07-27 22:29:00      loc4    loc7
11       202           veh2 2020-07-28 22:16:00 2020-07-28 23:16:00 2020-07-28 23:16:00     loc10   loc12
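For what it's worth, the same merge-then-filter idea can probably be done without the Python loop. The following is only a sketch under my own assumptions: the names `slots`, `gps`, `keys`, `slotStartLoc` and `slotEndLoc` are made up for illustration, and the data is retyped from the question (with CompanyID 602 for every slot). It pairs each slot with all GPS rows of the same vehicle, keeps the rows inside the slot, and takes the first and last location per slot.

```python
import pandas as pd

# Sample data retyped from the question; variable names are illustrative only.
slots = pd.DataFrame({
    "CompanyID": [602, 602, 602],
    "RegistrationNo": ["veh1", "veh1", "veh2"],
    "slotStartTime": pd.to_datetime(
        ["2020-07-27 21:12:00", "2020-07-27 21:30:00", "2020-07-28 22:16:00"]),
    "slotEndTime": pd.to_datetime(
        ["2020-07-27 22:12:00", "2020-07-27 22:30:00", "2020-07-28 23:16:00"]),
})
gps = pd.DataFrame({
    "RegistrationNo": ["veh1"] * 7 + ["veh2"] * 5,
    "GPSTime": pd.to_datetime([
        "2020-07-27 21:12:00", "2020-07-27 21:15:00", "2020-07-27 21:20:00",
        "2020-07-27 21:30:00", "2020-07-27 21:45:00", "2020-07-27 22:15:00",
        "2020-07-27 22:29:00", "2020-07-28 21:45:00", "2020-07-28 22:15:00",
        "2020-07-28 22:29:00", "2020-07-28 22:50:00", "2020-07-28 23:16:00",
    ]),
    "location": [f"loc{i}" for i in range(1, 13)],
})

keys = ["CompanyID", "RegistrationNo", "slotStartTime", "slotEndTime"]

# Pair every slot with every GPS row of the same vehicle,
# then keep only the rows whose time lies inside the slot (ends inclusive).
pairs = slots.merge(gps, on="RegistrationNo")
inside = pairs[pairs["GPSTime"].between(pairs["slotStartTime"],
                                        pairs["slotEndTime"])]

# Order by time so "first" and "last" mean earliest and latest row per slot.
locs = (
    inside.sort_values("GPSTime")
          .groupby(keys)["location"]
          .agg(slotStartLoc="first", slotEndLoc="last")
          .reset_index()
)

# A left merge keeps slots without any GPS row; their locations become NaN.
out = slots.merge(locs, on=keys, how="left")
print(out)
```

The intermediate `pairs` frame holds one row per slot and GPS row of the same vehicle, so on large data it can get big; in that case the loop or an as-of join may be the better trade-off.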
Here is a way that uses iterrows() and writes to the data frame using .at[]:

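# assumes df and df2 are built as above, with the time columns already parsed
df['start_loc'] = ''
df['end_loc'] = ''

for index, row in df.iterrows():
    start = row.slotStartTime
    end = row.slotEndTime
    reg = row.RegistrationNo

    # GPS rows for this vehicle that fall inside the slot (both ends inclusive)
    mask = ((df2['RegistrationNo'] == reg) &
            (start <= df2['GPSTime']) & (df2['GPSTime'] <= end))

    # order the matches by time; min()/max() on the strings would compare
    # them alphabetically (e.g. 'loc10' < 'loc9'), not chronologically
    hits = df2.loc[mask].sort_values('GPSTime')['location']
    if not hits.empty:
        df.at[index, 'start_loc'] = hits.iloc[0]
        df.at[index, 'end_loc']   = hits.iloc[-1]

print(df[['start_loc', 'end_loc']])   # other columns omitted to save space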

  start_loc end_loc
0      loc1    loc5
1      loc4    loc7
2     loc10   loc12
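How do you know which rule to use to assign loc10 as the start location for veh2? The slot has a start time of 22:16:00, but the GPS time for loc10 is 22:29.

@sygneto Because that is the first value within the slot for veh2.

OK, take a look at my answer.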
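To make the rule from the comments concrete (start location = first GPS row at or after the slot start, end location = last GPS row at or before the slot end), here is a small sketch using pd.merge_asof on the veh2 slot only. The variable names are mine, and note the assumption that at least one GPS row lies inside the slot: merge_asof by itself does not check that the matched row is still within the slot.

```python
import pandas as pd

# Only the veh2 slot and its GPS rows, retyped from the question.
slots = pd.DataFrame({
    "RegistrationNo": ["veh2"],
    "slotStartTime": pd.to_datetime(["2020-07-28 22:16:00"]),
    "slotEndTime": pd.to_datetime(["2020-07-28 23:16:00"]),
})
gps = pd.DataFrame({
    "RegistrationNo": ["veh2"] * 5,
    "GPSTime": pd.to_datetime([
        "2020-07-28 21:45:00", "2020-07-28 22:15:00", "2020-07-28 22:29:00",
        "2020-07-28 22:50:00", "2020-07-28 23:16:00",
    ]),
    "location": ["loc8", "loc9", "loc10", "loc11", "loc12"],
}).sort_values("GPSTime")  # merge_asof needs both sides sorted on the time key

# direction="forward": nearest GPS row at or after the slot start
start = pd.merge_asof(
    slots.sort_values("slotStartTime"), gps,
    left_on="slotStartTime", right_on="GPSTime",
    by="RegistrationNo", direction="forward",
)

# direction="backward": nearest GPS row at or before the slot end
end = pd.merge_asof(
    slots.sort_values("slotEndTime"), gps,
    left_on="slotEndTime", right_on="GPSTime",
    by="RegistrationNo", direction="backward",
)

print(start["location"].iloc[0], end["location"].iloc[0])
```

The 22:15 row (loc9) falls one minute before the slot start, so the forward match skips it and lands on the 22:29 row, which is loc10.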