Number of overlapping requests per user in Python [python, pandas, datetime, dataframe]


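I have a DataFrame with a `user_id` column plus `start_date` and `end_date` columns.

I want to create a new column that counts, for each row, how many of the same user's intervals (start date to end date) overlap with it.

Is there a way to do this without a for loop?

For example: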

   | User   | Start      | End        | Simultaneous Events
0  | user_x | 2013-02-09 | 2013-02-11 | 2   <---- overlaps with row 2
1  | user_x | 2013-06-06 | 2013-06-08 | 1
2  | user_x | 2013-02-10 | 2013-02-13 | 2
3  | user_y | 2014-01-06 | 2014-01-11 | 1
4  | user_x | 2014-01-06 | 2014-01-11 | 1

If you are asking for a clever algorithm that solves this quickly, the following will not help.

import pandas as pd

df = pd.DataFrame([
    [0, 'user_x', '2013-02-09', '2013-02-11'],
    [1, 'user_x', '2013-06-06', '2013-06-08'],
    [2, 'user_x', '2013-02-10', '2013-02-13'],
    [3, 'user_y', '2014-01-06', '2014-01-11'],
    [4, 'user_x', '2014-01-06', '2014-01-11'],
], columns=['id', 'user', 'start', 'end'])
# Parse the date strings into real timestamps before comparing them.
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime)

# Self-join on user: each row is paired with every row of the same user, itself included.
merge_df = pd.merge(df, df, on='user', suffixes=('', '_compare'))
# Two closed intervals overlap when each one starts no later than the other ends.
merge_df['overlap'] = (merge_df['start'] <= merge_df['end_compare']) & (merge_df['end'] >= merge_df['start_compare'])
# Count the overlapping partners per original row; the count includes the row itself.
result = merge_df[merge_df['overlap']].groupby(['id', 'user', 'start', 'end']).agg({'id_compare': 'count'}).reset_index()
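
One case worth checking in any overlap condition is an interval that fully contains another. A test that only asks whether a row's own start or end falls inside the other interval misses that case from the containing row's side, while the symmetric "each starts no later than the other ends" test catches it. A minimal sketch with made-up dates (they are not from the question):

```python
import pandas as pd

# Hypothetical pair: the outer interval fully contains the inner one.
outer_start, outer_end = pd.Timestamp("2013-02-01"), pd.Timestamp("2013-02-28")
inner_start, inner_end = pd.Timestamp("2013-02-10"), pd.Timestamp("2013-02-13")

# Endpoint test seen from the outer row: is either of its endpoints inside the inner interval?
endpoint_inside = (inner_start <= outer_start <= inner_end) or (
    inner_start <= outer_end <= inner_end
)

# Symmetric test: each interval starts no later than the other one ends.
symmetric = (outer_start <= inner_end) and (inner_start <= outer_end)

print(endpoint_inside, symmetric)  # False True
```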
As for "no loops": if what you had in mind was recursion, the merge-based code above does not help with that either.


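Just a side note: to understand this better, I suggest reading up on SQL, which will help. The idea is simple. Match all rows that share the same user id (pd.merge), decide for each pair whether the two intervals overlap, and finally group by the original row to count how many ids overlap with it.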
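
Since the question asks for a new column, the merge-then-count idea can be wrapped so the count lands back on the original frame. This is only a sketch under some assumptions: the function name `add_overlap_count`, its parameters, and the output column name `simultaneous_events` are made up for illustration, the frame is assumed to have a unique index, and intervals are treated as closed (touching endpoints count as overlapping).

```python
import pandas as pd


def add_overlap_count(df, user_col="user", start_col="start", end_col="end",
                      out_col="simultaneous_events"):
    """Return a copy of df with a per-row count of same-user overlapping intervals.

    Hypothetical helper: the count includes the row itself, so an isolated
    interval gets 1. Assumes df has a unique index.
    """
    out = df.copy()
    out[start_col] = pd.to_datetime(out[start_col])
    out[end_col] = pd.to_datetime(out[end_col])

    # Carry the original index along as a column so counts can be mapped back.
    keyed = out[[user_col, start_col, end_col]].rename_axis("_row").reset_index()

    # Self-join on user: every row is paired with every row of the same user.
    pairs = keyed.merge(keyed, on=user_col, suffixes=("", "_other"))

    # Closed intervals overlap when each starts no later than the other ends.
    overlaps = (pairs[start_col] <= pairs[end_col + "_other"]) & (
        pairs[end_col] >= pairs[start_col + "_other"]
    )

    # Number of overlapping partners per original row.
    counts = pairs.loc[overlaps].groupby("_row").size()
    out[out_col] = counts.reindex(out.index).fillna(0).astype(int).to_numpy()
    return out


events = pd.DataFrame({
    "user": ["user_x", "user_x", "user_x", "user_y", "user_x"],
    "start": ["2013-02-09", "2013-06-06", "2013-02-10", "2014-01-06", "2014-01-06"],
    "end": ["2013-02-11", "2013-06-08", "2013-02-13", "2014-01-11", "2014-01-11"],
})
result = add_overlap_count(events)
print(result["simultaneous_events"].tolist())  # [2, 1, 2, 1, 1]
```

The self-merge builds one row per pair of same-user events, so memory grows quadratically with the number of events per user. That is fine for small groups, but it is the reason this is not the "clever fast algorithm" mentioned earlier.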

Do you have an example with data?

@White I added a mock example.