Python: how to check whether a datetime falls between two datetimes in pandas
I have a first DataFrame as follows:
trans_df
code price date time product
12023 71.23 01-01-2018 06:23:00 MS
12023 61 01-01-2018 07:56:00 HS
12023 71.23 01-01-2018 08:34:00 MS
12023 71.30 01-01-2018 06:03:00 MS
12023 61 01-01-2018 11:43:00 HS
12023 71.23 01-01-2018 10:11:00 MS
12023 71.23 01-01-2018 04:23:00 MS
12023 72.23 02-01-2018 10:11:00 MS
12023 72.23 02-01-2018 04:23:00 MS
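Now I have a master price DataFrame. I want to check whether the price set in trans_df is correct, that is, whether it equals the price in master_price for that particular product whose interval from effective_date_from to effective_date_to contains the transaction date and time.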
master_price
code price effective_date_from effective_date_to time_from time_to product
12023 71.23 01-01-2018 02-01-2018 06:00:00 05:59:00 MS
12023 61 01-01-2018 02-01-2018 06:00:00 05:59:00 HS
12023 72.23 02-01-2018 03-01-2018 06:00:00 05:59:00 MS
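For anyone who wants to reproduce the problem, the two tables above can be built roughly like this. It is only a sketch: it assumes the date and time columns are stored as plain strings (dates as DD-MM-YYYY), which the listings suggest but do not state.

```python
import pandas as pd

# Transactions to be checked (dates assumed to be DD-MM-YYYY strings).
trans_df = pd.DataFrame({
    "code": [12023] * 9,
    "price": [71.23, 61, 71.23, 71.30, 61, 71.23, 71.23, 72.23, 72.23],
    "date": ["01-01-2018"] * 7 + ["02-01-2018"] * 2,
    "time": ["06:23:00", "07:56:00", "08:34:00", "06:03:00", "11:43:00",
             "10:11:00", "04:23:00", "10:11:00", "04:23:00"],
    "product": ["MS", "HS", "MS", "MS", "HS", "MS", "MS", "MS", "MS"],
})

# Master prices with the interval in which each price is effective.
master_price = pd.DataFrame({
    "code": [12023] * 3,
    "price": [71.23, 61, 72.23],
    "effective_date_from": ["01-01-2018", "01-01-2018", "02-01-2018"],
    "effective_date_to": ["02-01-2018", "02-01-2018", "03-01-2018"],
    "time_from": ["06:00:00"] * 3,
    "time_to": ["05:59:00"] * 3,
    "product": ["MS", "HS", "MS"],
})
```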
The desired DataFrame is:
trans_df
code price date time product flag actual_price
12023 71.23 01-01-2018 06:23:00 MS match 71.23
12023 61 01-01-2018 07:56:00 HS match 61
12023 71.23 01-01-2018 08:34:00 MS match 71.23
12023 71.30 01-01-2018 06:03:00 MS mismatch 71.23
12023 61 01-01-2018 11:43:00 HS match 61
12023 71.23 01-01-2018 10:11:00 MS match 71.23
12023 71.23 01-01-2018 04:23:00 MS nan nan
12023 72.23 02-01-2018 10:11:00 MS match 72.23
12023 72.23 02-01-2018 04:23:00 MS match 72.23
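What are the dtypes of your date columns? Also, are they MM-DD-YYYY?
The dates are in DD-MM-YYYY. I have multiple dates in trans_df and master_price, so when joining the two DataFrames do we also have to use date?
@Neil - hmmm, only one datetime column is needed. One possible solution is to use melt. Unfortunately, it creates many more rows, so that solution would be memory-hungry. Another solution would be to write a function that loops over each row and matches. That solution is slow :(
I updated my question. If we use an if-based loop, will it be too slow?
@Neil - it depends on the size of the two DataFrames.
There are 2-3 lakh records in the transaction df and ~2k entries in the master price df.
Use: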
import numpy as np
import pandas as pd

# combine each date string (DD-MM-YYYY) with its time string into one datetime column
master_price['effective_date_from'] = (pd.to_datetime(master_price['effective_date_from'],
                                                      format='%d-%m-%Y') +
                                       pd.to_timedelta(master_price['time_from']))
master_price['effective_date_to'] = (pd.to_datetime(master_price['effective_date_to'],
                                                    format='%d-%m-%Y') +
                                     pd.to_timedelta(master_price['time_to']))
trans_df['date'] = (pd.to_datetime(trans_df['date'], format='%d-%m-%Y') +
                    pd.to_timedelta(trans_df['time']))
# join on code and product, then keep only the rows whose datetime lies inside the effective interval
df = trans_df.merge(master_price, on=['code','product'], how='left')
df = df[df.date.between(df.effective_date_from, df.effective_date_to)]
# merge the filtered rows back onto the original transactions so unmatched rows are kept
df = trans_df.merge(df, on=['code','product','date','time'], how='left')
# drop the helper columns; price_x duplicates the transaction price
cols = ['effective_date_from', 'effective_date_to', 'time_to','time_from','price_x']
df = df.drop(cols, axis=1)
# test for missing master prices first, then match / mismatch
# (None instead of np.nan: recent NumPy versions refuse to mix a float with strings in np.select)
df['flag'] = np.select([df['price_y'].isnull(),
                        df['price_y'] == df['price']],
                       [None, 'match'], default='mismatch')
df = df.rename(columns={'price_y':'actual_price'})
print (df)
code price date time product actual_price flag
0 12023 71.23 2018-01-01 06:23:00 06:23:00 MS 71.23 match
1 12023 61.00 2018-01-01 07:56:00 07:56:00 HS 61.00 match
2 12023 71.23 2018-01-01 08:34:00 08:34:00 MS 71.23 match
3 12023 71.30 2018-01-01 06:03:00 06:03:00 MS 71.23 mismatch
4 12023 61.00 2018-01-01 11:43:00 11:43:00 HS 61.00 match
5 12023 71.23 2018-01-01 10:11:00 10:11:00 MS 71.23 match
6 12023 71.23 2018-01-01 04:23:00 04:23:00 MS NaN None
7 12023 72.23 2018-01-02 10:11:00 10:11:00 MS 72.23 match
8 12023 72.23 2018-01-02 04:23:00 04:23:00 MS 71.23 mismatch
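With the sizes mentioned in the comments, the merge on code and product pairs every transaction with every master row for that product before filtering, so the intermediate frame can get large. One possible alternative is pd.merge_asof, sketched below. This is not the method used in the answer above, and it assumes the effective intervals for a given code and product never overlap. The helper combine and the columns ts, row, valid_from and valid_to are illustrative names, not part of the original data.

```python
import pandas as pd

trans_df = pd.DataFrame({
    "code": [12023] * 9,
    "price": [71.23, 61, 71.23, 71.30, 61, 71.23, 71.23, 72.23, 72.23],
    "date": ["01-01-2018"] * 7 + ["02-01-2018"] * 2,
    "time": ["06:23:00", "07:56:00", "08:34:00", "06:03:00", "11:43:00",
             "10:11:00", "04:23:00", "10:11:00", "04:23:00"],
    "product": ["MS", "HS", "MS", "MS", "HS", "MS", "MS", "MS", "MS"],
})
master_price = pd.DataFrame({
    "code": [12023] * 3,
    "price": [71.23, 61, 72.23],
    "effective_date_from": ["01-01-2018", "01-01-2018", "02-01-2018"],
    "effective_date_to": ["02-01-2018", "02-01-2018", "03-01-2018"],
    "time_from": ["06:00:00"] * 3,
    "time_to": ["05:59:00"] * 3,
    "product": ["MS", "HS", "MS"],
})


def combine(dates, times):
    """Hypothetical helper: DD-MM-YYYY strings plus HH:MM:SS strings -> datetimes."""
    return pd.to_datetime(dates, format="%d-%m-%Y") + pd.to_timedelta(times)


trans = trans_df.copy()
trans["row"] = range(len(trans))          # remember the original order
trans["ts"] = combine(trans["date"], trans["time"])

master = master_price.rename(columns={"price": "actual_price"})
master["valid_from"] = combine(master["effective_date_from"], master["time_from"])
master["valid_to"] = combine(master["effective_date_to"], master["time_to"])
master = master[["code", "product", "actual_price", "valid_from", "valid_to"]]

# For each transaction, pick the latest interval of the same code/product
# that starts at or before the transaction time. Both sides must be sorted.
matched = pd.merge_asof(
    trans.sort_values("ts"),
    master.sort_values("valid_from"),
    left_on="ts", right_on="valid_from",
    by=["code", "product"],
    direction="backward",
)

# The backward match only guarantees valid_from <= ts, so enforce the end too.
expired = matched["ts"] > matched["valid_to"]
matched.loc[expired, "actual_price"] = float("nan")

# match / mismatch where a master price was found, missing otherwise.
has_price = matched["actual_price"].notna()
matched["flag"] = (
    (matched["price"] == matched["actual_price"])
    .map({True: "match", False: "mismatch"})
    .where(has_price)
)

result = (matched.sort_values("row")
                 .drop(columns=["row", "valid_from", "valid_to"])
                 .reset_index(drop=True))
print(result)
```

Each transaction is matched to at most one master row, so the result never has more rows than trans_df. If intervals for the same product can overlap, this sketch would silently pick the most recent one, and the merge-then-filter approach is the safer choice.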