
Subsetting a df using timestamps - Python


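I want to subset a df using specific timestamps plus an additional period of time. In the code below, df contains the specific timestamps that I want to use to subset df2. Essentially, I take each timestamp in df and determine the minute before it. I then use those periods to create separate dfs, which are concatenated to create the final df.

However, this is inefficient in itself, and even more so when there are many timestamps to process.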

import pandas as pd

df = pd.DataFrame({   
        'Time' : ['2020-08-02 10:01:12.5','2020-08-02 11:01:12.5','2020-08-02 12:31:00.0','2020-08-02 12:41:22.6'],             
        'ID' : ['X','Y','B','X'],                 
    })

# 1 min before timestamp
'2020-08-02 10:00:12.5' 
# first timestamp
'2020-08-02 10:01:12.5' 

# 1 min before timestamp
'2020-08-02 11:00:02.1' 
# second timestamp
'2020-08-02 11:01:02.1' 

df2 = pd.DataFrame({
        'Time' : ['2020-08-02 10:00:00.1','2020-08-02 10:00:00.2','2020-08-02 10:00:00.3','2020-08-02 10:00:00.4'],
        'ID' : ['','','',''],
    })

d1 = df2[(df2['Time'] > '2020-08-02 10:00:12.5') & (df2['Time'] <= '2020-08-02 10:01:12.5')]
d2 = df2[(df2['Time'] > '2020-08-02 11:00:02.1') & (df2['Time'] <= '2020-08-02 11:01:02.1')]

df_out = pd.concat([d1,d2])#...include all separate periods of time
There is a method in pandas that can do this.

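Let me use slightly different timestamps from the ones in the original post, to make this easier to illustrate. For the purposes of this example, I will set the timestamps in df to 10:01, 10:03 and 10:06.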

Let's add a 1MinBefore column to df, holding the moment one minute before Time (we will use it later to merge the dataframes):
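The snippet for this step appears to be missing from the text. A minimal sketch that would produce the frame shown next, assuming the IDs X, Y, Z and the three timestamps mentioned above:

```python
import pandas as pd

# Assumed reconstruction: three timestamps with the IDs X, Y, Z.
df = pd.DataFrame({
    'Time': pd.to_datetime(['2020-08-02 10:01:00',
                            '2020-08-02 10:03:00',
                            '2020-08-02 10:06:00']),
    'ID': ['X', 'Y', 'Z'],
})

# Start of each one-minute window: the timestamp minus one minute.
df['1MinBefore'] = df['Time'] - pd.Timedelta('1min')
print(df)
```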

So our df is:

                 Time ID          1MinBefore
0 2020-08-02 10:01:00  X 2020-08-02 10:00:00
1 2020-08-02 10:03:00  Y 2020-08-02 10:02:00
2 2020-08-02 10:06:00  Z 2020-08-02 10:05:00
For df2, let's use a range between 10:00 and 10:07 at 30-second intervals:

df2 = pd.DataFrame({   
    'Time' : pd.date_range(
        start='2020-08-02 10:00:00',
        end='2020-08-02 10:07:00',
        freq='30s'),
    'ID' : '',
})
Now the key step: merging these dataframes with merge_asof:

pd.merge_asof(df2[['Time']], df[['ID', '1MinBefore']],
              left_on='Time', right_on='1MinBefore',
              tolerance=pd.Timedelta('1min'))
Output:

                  Time   ID          1MinBefore
0  2020-08-02 10:00:00    X 2020-08-02 10:00:00
1  2020-08-02 10:00:30    X 2020-08-02 10:00:00
2  2020-08-02 10:01:00    X 2020-08-02 10:00:00
3  2020-08-02 10:01:30  NaN                 NaT
4  2020-08-02 10:02:00    Y 2020-08-02 10:02:00
5  2020-08-02 10:02:30    Y 2020-08-02 10:02:00
6  2020-08-02 10:03:00    Y 2020-08-02 10:02:00
7  2020-08-02 10:03:30  NaN                 NaT
8  2020-08-02 10:04:00  NaN                 NaT
9  2020-08-02 10:04:30  NaN                 NaT
10 2020-08-02 10:05:00    Z 2020-08-02 10:05:00
11 2020-08-02 10:05:30    Z 2020-08-02 10:05:00
12 2020-08-02 10:06:00    Z 2020-08-02 10:05:00
13 2020-08-02 10:06:30  NaN                 NaT
14 2020-08-02 10:07:00  NaN                 NaT
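The tolerance parameter of 1 minute basically tells it to ignore any match in df whose 1MinBefore value lies more than 1 minute before the row's Time.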
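To see what tolerance changes, here is a small sketch on made-up data (the names left and right and their values are illustrative, not from the original post). Without tolerance, the default backward search matches every later row to the last earlier key; with it, rows more than one minute past the key get NaN.

```python
import pandas as pd

# Illustrative data: one window start and two rows to label.
left = pd.DataFrame({
    'Time': pd.to_datetime(['2020-08-02 10:00:30', '2020-08-02 10:01:30']),
})
right = pd.DataFrame({
    '1MinBefore': pd.to_datetime(['2020-08-02 10:00:00']),
    'ID': ['X'],
})

# No tolerance: both rows match the only earlier key, however far away.
no_tol = pd.merge_asof(left, right, left_on='Time', right_on='1MinBefore')

# With tolerance: the row 90 seconds after the key is left unmatched.
with_tol = pd.merge_asof(left, right, left_on='Time', right_on='1MinBefore',
                         tolerance=pd.Timedelta('1min'))

print(no_tol['ID'].tolist())
print(with_tol['ID'].isna().tolist())
```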


Now we can of course drop the 1MinBefore column and use fillna on the ID column, to make it exactly the same as the expected output in the original post.
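A sketch of that clean-up step, run end to end on the example data. The variable names merged, labelled and subset are assumptions for illustration, and the dropna variant is an alternative added here for the case where only the rows inside a window are wanted. Both keys are cast to the same datetime resolution as a precaution, since merge_asof expects matching key dtypes.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2020-08-02 10:01:00',
                            '2020-08-02 10:03:00',
                            '2020-08-02 10:06:00']),
    'ID': ['X', 'Y', 'Z'],
})
df['1MinBefore'] = (df['Time'] - pd.Timedelta('1min')).astype('datetime64[ns]')

df2 = pd.DataFrame({
    'Time': pd.date_range(start='2020-08-02 10:00:00',
                          end='2020-08-02 10:07:00',
                          freq='30s').astype('datetime64[ns]'),
})

merged = pd.merge_asof(df2, df[['ID', '1MinBefore']],
                       left_on='Time', right_on='1MinBefore',
                       tolerance=pd.Timedelta('1min'))

# Keep every row of df2, with an empty ID outside the windows.
labelled = merged.drop(columns='1MinBefore').fillna({'ID': ''})

# Alternative (assumption): keep only the rows that fall inside a window.
subset = merged.dropna(subset=['ID']).drop(columns='1MinBefore')

print(labelled)
print(subset)
```

Note that merge_asof matches a row whose Time equals the window start exactly, whereas the question's filter uses a strict `>` on the lower bound; passing allow_exact_matches=False should exclude those exact-start rows if that matters.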

Thanks. Nice approach.