Python pandas: cross-matching events in time


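I have two pandas DataFrames full of timestamps. I want to cross-match these events within 5 days. If I cross-match df1 to df2, I want a list (in the general sense) of size len(df1), where each element contains a list of the indices of the elements in df1 that fall within the specified time limit of the corresponding element in df2. I would also like a similar structure that, instead of indices, contains the number of days between the events.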

For example:

import pandas as pd
df1 = pd.DataFrame({'date_1': ['2016-10-10', '2016-10-11', '2016-10-18', '2016-10-29']})
df2 = pd.DataFrame({'date_2': ['2016-10-10', '2016-10-05', '2016-10-27', '2016-10-01']})

Desired output:

matched_indices = [[0,1], [0], [3], []]
matched_deltas  = [[0,1], [5], [2], []]

Any ideas?

One solution is to iterate over all rows of df2 and, for each one, compute the difference from the dates in df1.

matched_indices = []
matched_deltas = []
# parse the date strings so that subtraction yields timedeltas (a no-op if they are already datetimes)
dates_1 = pd.to_datetime(df1['date_1'])
dates_2 = pd.to_datetime(df2['date_2'])
# iterate through the dates of df2
for date_2 in dates_2:
    # s stores the absolute difference in days between each df1 date and this df2 date; its index is the same as df1's
    s = (dates_1 - date_2).dt.days.abs()
    # keep only the differences of at most 5 days
    s = s[s <= 5]
    # add the matching df1 indices to matched_indices
    matched_indices.append(s.index.tolist())
    # add the corresponding day differences to matched_deltas
    matched_deltas.append(s.tolist())
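
As an alternative to the row-by-row loop, the same comparison can be sketched with NumPy broadcasting, which builds the full table of day differences in one step. This is only a minimal sketch, not the method from the thread: the function name `cross_match` and its `max_days` parameter are hypothetical, and it returns positions in df1 rather than index labels (the two coincide for the default integer index used here). The difference table has len(df2) × len(df1) entries, so this assumes both frames are small enough for it to fit in memory.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'date_1': ['2016-10-10', '2016-10-11', '2016-10-18', '2016-10-29']})
df2 = pd.DataFrame({'date_2': ['2016-10-10', '2016-10-05', '2016-10-27', '2016-10-01']})


def cross_match(dates_a, dates_b, max_days=5):
    """For each date in dates_b, return the positions in dates_a that lie
    within max_days of it, together with the absolute gaps in days.

    Hypothetical helper written for illustration only.
    """
    # Convert both columns to day-resolution NumPy datetimes.
    a = pd.to_datetime(dates_a).to_numpy().astype('datetime64[D]')
    b = pd.to_datetime(dates_b).to_numpy().astype('datetime64[D]')
    # delta[i, j] = |a[j] - b[i]| in whole days, shape (len(b), len(a)).
    delta = np.abs((a[None, :] - b[:, None]).astype('int64'))
    within = delta <= max_days
    # One list per row of the table, i.e. per date in dates_b.
    indices = [np.flatnonzero(mask).tolist() for mask in within]
    deltas = [row[mask].tolist() for row, mask in zip(delta, within)]
    return indices, deltas


matched_indices, matched_deltas = cross_match(df1['date_1'], df2['date_2'])
print(matched_indices)  # [[0, 1], [0], [3], []]
print(matched_deltas)   # [[0, 1], [5], [2], []]
```

For the sample data this reproduces the desired output from the question, with an empty list where a df2 date has no df1 date within 5 days.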

Output (day deltas first, then df1 indices; nan marks a df2 date with no match within 5 days):
[[0.0, 1.0], [5.0], [2.0], nan]
[[0, 1], [0], [3], nan]


If you are going to include code (a good thing), at least paste the code you actually ran. Good luck, hehe :-)

@WeNYoBen That was silly. I have corrected the mistake you deliberately made! Thanks for your initial help anyways

@piRSquared Note: the code can now be run freely. Now, can you retract your downvote? I think this is an interesting question, even if it was ill-posed at first.