Subset a df using timestamps - Python
I want to subset a df using specific timestamps plus an additional time period. In the code below, `df` contains the specific timestamps I want to use to subset `df2`. Basically, I take each timestamp in `df` and determine the minute before it. I then use these periods to create separate dfs, which are concatenated to build the final df.

However, this is inefficient in itself, and even more so when it has to be done many times.
```python
import pandas as pd

df = pd.DataFrame({
    'Time' : ['2020-08-02 10:01:12.5','2020-08-02 11:01:12.5','2020-08-02 12:31:00.0','2020-08-02 12:41:22.6'],
    'ID' : ['X','Y','B','X'],
})

# first timestamp:  '2020-08-02 10:01:12.5'
# 1 min before it:  '2020-08-02 10:00:12.5'
# second timestamp: '2020-08-02 11:01:02.1'
# 1 min before it:  '2020-08-02 11:00:02.1'

df2 = pd.DataFrame({
    'Time' : ['2020-08-02 10:00:00.1','2020-08-02 10:00:00.2','2020-08-02 10:00:00.3','2020-08-02 10:00:00.4'],
    'ID' : ['','','',''],
})

# One filter per window (t - 1 min, t]; 'Time' holds ISO-format strings,
# so these are string comparisons
d1 = df2[(df2['Time'] > '2020-08-02 10:00:12.5') & (df2['Time'] <= '2020-08-02 10:01:12.5')]
d2 = df2[(df2['Time'] > '2020-08-02 11:00:02.1') & (df2['Time'] <= '2020-08-02 11:01:02.1')]
df_out = pd.concat([d1, d2])  # ...include all separate periods of time
```
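The phrase "include all separate periods of time" can be made concrete by looping over every timestamp in `df`. The sketch below does this. The helper name `subset_by_windows` and the sample data are hypothetical. The windows are (t - 1 min, t], the same as in `d1`/`d2`. Each window scans all of `df2`, which is why this pattern slows down as both frames grow.

```python
import pandas as pd


def subset_by_windows(df, df2, window="1min"):
    """Hypothetical helper: keep rows of df2 whose Time lies in (t - window, t]
    for some timestamp t in df['Time'], using one boolean filter per window."""
    times = pd.to_datetime(df["Time"])
    t2 = pd.to_datetime(df2["Time"])
    delta = pd.Timedelta(window)
    # One full pass over df2 per timestamp: cost grows with len(df) * len(df2)
    pieces = [df2[(t2 > t - delta) & (t2 <= t)] for t in times]
    # pd.concat raises on an empty list, so fall back to an empty frame
    return pd.concat(pieces) if pieces else df2.iloc[0:0]


df = pd.DataFrame({
    "Time": ["2020-08-02 10:01:12.5", "2020-08-02 11:01:12.5"],
    "ID": ["X", "Y"],
})
df2 = pd.DataFrame({
    "Time": ["2020-08-02 10:00:10.0", "2020-08-02 10:00:30.0",
             "2020-08-02 11:00:59.0", "2020-08-02 11:30:00.0"],
    "ID": ["", "", "", ""],
})
df_out = subset_by_windows(df, df2)
print(df_out)
```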
Is there a way to do this in pandas?
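Let me use slightly different timestamps from the original post to make this easier to illustrate. For this example, I'll set the `df` timestamps at 10:01, 10:03 and 10:06.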
Let's add a `1MinBefore` column to `df` that holds the time one minute before `Time` (we'll use it later to merge the dataframes):
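The post does not show the code for these two steps. The sketch below reproduces the `df` printed below. It assumes the IDs X, Y and Z from that table and parses `Time` with `pd.to_datetime` so that subtracting a `Timedelta` works.

```python
import pandas as pd

# Assumed setup: timestamps at 10:01, 10:03 and 10:06 with IDs X, Y, Z
df = pd.DataFrame({
    'Time': pd.to_datetime(['2020-08-02 10:01:00',
                            '2020-08-02 10:03:00',
                            '2020-08-02 10:06:00']),
    'ID': ['X', 'Y', 'Z'],
})

# Start of each window: one minute before the timestamp
df['1MinBefore'] = df['Time'] - pd.Timedelta('1min')
print(df)
```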
So our `df` is:
```
                 Time ID          1MinBefore
0 2020-08-02 10:01:00  X 2020-08-02 10:00:00
1 2020-08-02 10:03:00  Y 2020-08-02 10:02:00
2 2020-08-02 10:06:00  Z 2020-08-02 10:05:00
```
For `df2`, let's use the range from 10:00 to 10:07 at 30-second intervals:
```python
df2 = pd.DataFrame({
    'Time' : pd.date_range(
        start='2020-08-02 10:00:00',
        end='2020-08-02 10:07:00',
        freq='30s'),
    'ID' : '',
})
```
Now for the key step: merge these dataframes with `merge_asof`:
```python
pd.merge_asof(df2[['Time']], df[['ID', '1MinBefore']],
              left_on='Time', right_on='1MinBefore',
              tolerance=pd.Timedelta('1min'))
```
Output:
```
                  Time   ID          1MinBefore
0  2020-08-02 10:00:00    X 2020-08-02 10:00:00
1  2020-08-02 10:00:30    X 2020-08-02 10:00:00
2  2020-08-02 10:01:00    X 2020-08-02 10:00:00
3  2020-08-02 10:01:30  NaN                 NaT
4  2020-08-02 10:02:00    Y 2020-08-02 10:02:00
5  2020-08-02 10:02:30    Y 2020-08-02 10:02:00
6  2020-08-02 10:03:00    Y 2020-08-02 10:02:00
7  2020-08-02 10:03:30  NaN                 NaT
8  2020-08-02 10:04:00  NaN                 NaT
9  2020-08-02 10:04:30  NaN                 NaT
10 2020-08-02 10:05:00    Z 2020-08-02 10:05:00
11 2020-08-02 10:05:30    Z 2020-08-02 10:05:00
12 2020-08-02 10:06:00    Z 2020-08-02 10:05:00
13 2020-08-02 10:06:30  NaN                 NaT
14 2020-08-02 10:07:00  NaN                 NaT
```
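By default, `merge_asof` matches each row of `df2` with the last `1MinBefore` value at or before its `Time`. The 1-minute `tolerance` tells it to ignore a match when that `1MinBefore` value is more than 1 minute earlier, so those rows get `NaN`/`NaT`. A gap of exactly one minute still matches, as row 2 (10:01:00) shows.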
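To see what the tolerance does, compare the merge with and without it. The sketch below rebuilds the example frames with the same values as above, so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2020-08-02 10:01:00', '2020-08-02 10:03:00',
                            '2020-08-02 10:06:00']),
    'ID': ['X', 'Y', 'Z'],
})
df['1MinBefore'] = df['Time'] - pd.Timedelta('1min')
df2 = pd.DataFrame({'Time': pd.date_range(start='2020-08-02 10:00:00',
                                          end='2020-08-02 10:07:00',
                                          freq='30s')})

# Without a tolerance, every row inherits the most recent window start,
# however long ago it was
no_tol = pd.merge_asof(df2[['Time']], df[['ID', '1MinBefore']],
                       left_on='Time', right_on='1MinBefore')

# With it, rows more than 1 minute past the latest window start get NaN
with_tol = pd.merge_asof(df2[['Time']], df[['ID', '1MinBefore']],
                         left_on='Time', right_on='1MinBefore',
                         tolerance=pd.Timedelta('1min'))

print(no_tol['ID'].isna().sum())    # 0
print(with_tol['ID'].isna().sum())  # 6
```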
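Now we can of course drop the `1MinBefore` column and use `fillna` on the `ID` column, so that the result matches the expected output in the original post exactly.

Thanks. Nice approach.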
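Below is a sketch of that cleanup, which rebuilds the example frames so it runs on its own. The second half is an alternative that the answer does not include. If you only want the rows of `df2` that fall inside a window (what `df_out` holds in the question), drop the unmatched rows instead. Passing `allow_exact_matches=False` makes each window (t - 1 min, t], like the question's `>`/`<=` filters, rather than the closed [t - 1 min, t]. With overlapping windows, `merge_asof` attaches each `df2` row only to the most recent window, whereas the concat approach repeats it once per window.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2020-08-02 10:01:00', '2020-08-02 10:03:00',
                            '2020-08-02 10:06:00']),
    'ID': ['X', 'Y', 'Z'],
})
df['1MinBefore'] = df['Time'] - pd.Timedelta('1min')
df2 = pd.DataFrame({
    'Time': pd.date_range(start='2020-08-02 10:00:00',
                          end='2020-08-02 10:07:00', freq='30s'),
    'ID': '',
})

merged = pd.merge_asof(df2[['Time']], df[['ID', '1MinBefore']],
                       left_on='Time', right_on='1MinBefore',
                       tolerance=pd.Timedelta('1min'))

# Cleanup from the answer: drop the helper column, blank out unmatched IDs
labelled = merged.drop(columns='1MinBefore')
labelled['ID'] = labelled['ID'].fillna('')

# Alternative: keep only rows inside a window, like df_out in the question.
# allow_exact_matches=False requires 1MinBefore < Time (strict), which gives
# half-open windows (t - 1 min, t]
strict = pd.merge_asof(df2[['Time']], df[['ID', '1MinBefore']],
                       left_on='Time', right_on='1MinBefore',
                       tolerance=pd.Timedelta('1min'),
                       allow_exact_matches=False)
subset = strict.dropna(subset=['ID']).drop(columns='1MinBefore')
print(subset)
```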