Get rows within a 5-minute range of a given time in Python

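I have two DataFrames.

What I want to do is loop through each row of df_1, take its time, then find the rows in df_2 that match on user_id and fall within +-5 minutes of that time, and take the data from the first such row. If there is no row within 5 minutes, return NaN.

Note: both DataFrames can contain multiple user ids.

df_1 looks like: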

user_id      created_time       
   1          2020-03-01 00:00:25
   2          2020-03-06 04:20:25
   3          2020-03-06 07:00:15
df_2:

This is what I'm doing now, but it looks inefficient and error-prone:

import numpy as np

lng_list = []
lat_list = []
for row in df_1.itertuples():
    created_time = row.created_time
    user_id = row.user_id

    # Rows of df_2 for this user at or after created_time
    df = df_2.loc[(df_2["user_id"] == user_id) &
                  (df_2["updated_time"] >= created_time)]
    if len(df) != 0:
        match = df.iloc[0]
    else:
        # Otherwise fall back to this user's rows at or before created_time
        last_df = df_2.loc[(df_2["user_id"] == user_id) &
                           (df_2["created_time"] <= created_time)]
        match = last_df.iloc[-1] if len(last_df) != 0 else None

    # Append exactly one value per df_1 row so the lists stay aligned
    lng_list.append(np.nan if match is None else match["lng"])
    lat_list.append(np.nan if match is None else match["lat"])

df_1["lng"] = lng_list
df_1["lat"] = lat_list
Since both DataFrames contain multiple user_id values, a merge is probably your best option:

new_df = (df_1.merge(df_2, on='user_id', how='right')
              .assign(time_diff=lambda x: x.created_time.sub(x.updated_at)
                                           .abs().lt(pd.to_timedelta(5, unit='min')),
                     )
         )
new_df.loc[~new_df['time_diff'], ['lat','lng']] = np.nan
Output:

   user_id        created_time          updated_at      lat     lng  time_diff
0        1 2020-03-01 00:00:25 2020-03-01 00:02:25  35.2323  123.23       True
1        2 2020-03-06 04:20:25 2020-03-06 04:27:22      NaN     NaN      False
2        3 2020-03-06 07:00:15 2020-03-06 06:59:59  13.2323  127.23       True

Note that this may not solve your problem, because for each created_time you will have multiple updated_at values.
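One possible way to handle several updated_at values per created_time is to merge, keep only the pairs inside the window, and then keep the earliest match for each df_1 row. This is only a sketch: the df_2 contents below are made-up sample data, and the column names updated_at, lat and lng are assumed from the output above.

```python
import pandas as pd

# Hypothetical sample data; df_2's rows are invented for illustration
df_1 = pd.DataFrame({
    "user_id": [1, 2, 3],
    "created_time": pd.to_datetime([
        "2020-03-01 00:00:25", "2020-03-06 04:20:25", "2020-03-06 07:00:15"]),
})
df_2 = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "updated_at": pd.to_datetime([
        "2020-03-01 00:02:25", "2020-03-01 00:04:00",
        "2020-03-06 04:27:22", "2020-03-06 06:59:59"]),
    "lat": [35.2323, 36.0, 40.0, 13.2323],
    "lng": [123.23, 124.0, 120.0, 127.23],
})

window = pd.Timedelta(minutes=5)

# Pair every df_1 row with every df_2 row of the same user;
# reset_index() keeps the original df_1 row label in an "index" column
pairs = df_1.reset_index().merge(df_2, on="user_id", how="inner")

# Keep only pairs whose timestamps are at most 5 minutes apart
in_window = (pairs["updated_at"] - pairs["created_time"]).abs() <= window

# Earliest matching update for each original df_1 row
first_match = (pairs[in_window]
               .sort_values("updated_at")
               .drop_duplicates("index")
               .set_index("index")[["lat", "lng"]])

# Index-based left join: df_1 rows without a match get NaN
result = df_1.join(first_match)
print(result)
```

The merge builds every same-user pair, so on large frames with many rows per user it can use a lot of memory.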
Please check the following solution:

# Convert date column into datetime object 
df1['created_time'] = pd.to_datetime(df1['created_time'])
df2['updated_at'] = pd.to_datetime(df2['updated_at'])

# Create filters based on condition
user_id_condition = df1['user_id'] == df2['user_id'] 
n_min_before = df1['created_time'] - pd.to_timedelta(5, unit='min')
n_min_after = df1['created_time'] + pd.to_timedelta(5, unit='min')
time_condition = (df2['updated_at'] <= n_min_after) & (n_min_before <= df2['updated_at'])

# Apply filters and find intersection rows in df2
intersect_df2 = df2[user_id_condition & time_condition][['lat', 'lng', 'user_id']]

# Merge df1 with intersect_df2 (left merge preserves size of df1)
output_df = pd.merge(df1, intersect_df2, on='user_id', how='left')
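As an alternative that avoids comparing the two frames row by row, pd.merge_asof can match each df_1 row to a single df_2 row per user within a tolerance. A minimal sketch, assuming the same column names as above and made-up df_2 data. Note that it picks the nearest row in time, which is not necessarily the first row inside the window.

```python
import pandas as pd

# Hypothetical sample data; df_2's rows are invented for illustration
df_1 = pd.DataFrame({
    "user_id": [1, 2, 3],
    "created_time": pd.to_datetime([
        "2020-03-01 00:00:25", "2020-03-06 04:20:25", "2020-03-06 07:00:15"]),
})
df_2 = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "updated_at": pd.to_datetime([
        "2020-03-01 00:02:25", "2020-03-01 00:04:00",
        "2020-03-06 04:27:22", "2020-03-06 06:59:59"]),
    "lat": [35.2323, 36.0, 40.0, 13.2323],
    "lng": [123.23, 124.0, 120.0, 127.23],
})

# merge_asof requires both frames to be sorted by their time keys
left = df_1.sort_values("created_time")
right = df_2.sort_values("updated_at")

nearest = pd.merge_asof(
    left, right,
    left_on="created_time", right_on="updated_at",
    by="user_id",                        # only match rows of the same user
    direction="nearest",                 # look before and after created_time
    tolerance=pd.Timedelta(minutes=5),   # anything further away becomes NaN
)
print(nearest)
```

Every df_1 row is kept, and rows with no update within 5 minutes get NaN in updated_at, lat and lng.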

Can you share an input / expected output for your question?

@isabella Sorry, I added it; I hope it's clear now. Thanks.

It gives me InvalidIndexError: Reindexing only valid with uniquely valued Index objects. Is that because I have multiple user ids?

Doesn't it only add 5 minutes? I need both the 5 minutes before and the 5 minutes after.

They can have multiple.

This also gives you an example of an upper threshold. You can easily create a lower threshold using the same idea.

If they have multiple, how do you decide which row goes with which created date? Or are df_1 and df_2 aligned row by row?

Maybe I forgot to mention that there can be multiple user ids, so it gives me ValueError: Can only compare identically-labeled Series objects.

Multiple user ids in df1, df2, or both?

Both. I have solved my problem.

On which line does the error occur? By the way, I added the earlier time condition.

user_id_condition = df1['user_id'] == df2['user_id'] is the line that produces the ValueError I mentioned above.
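For context on that ValueError: comparing two Series with == requires identical index labels, so frames with a different number of rows cannot be compared this way. A small sketch with made-up Series (a and b are hypothetical names) that reproduces the error and shows a membership test with isin, which has no such requirement:

```python
import pandas as pd

# Two hypothetical user_id columns of different lengths
a = pd.Series([1, 2, 3])
b = pd.Series([1, 1, 2, 3])

try:
    a == b  # element-wise comparison needs identically labeled Series
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)
    print(message)

# isin only asks whether each value of b occurs anywhere in a,
# so the two Series may differ in length and index
mask = b.isin(a)
print(mask.tolist())
```

A membership test is not a row-by-row match, so it does not replace merging on user_id when each created_time has to be paired with its own updates.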