How to replicate R's foverlaps output with pandas merge in Python?

python pandas

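I am using the foverlaps function to merge my tables in R, but I need to reproduce the same output in Python. I searched and found the merge function in the pandas library, but even with it I cannot reproduce the same output.

First, the R side. The first table is intervals and the second is decomp.

The R code that performs the merge: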

relations <- foverlaps(decomp, intervals, type='within', nomatch=0)

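Now the Python side. The first table is df_of_pairs and the second is df_of_adjacent.

Here is the problem: with pandas merge I cannot reproduce the same output in Python. I tried several approaches without success; this is one of them: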

df = df_of_pairs.merge(df_of_adjacent, left_on=['V1'], right_on=['V2'] )

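This produces df, which does not match the R result.

This question is very similar to another one, but in that case the columns are different.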

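I couldn't easily get the exact output you want, but here is a partial solution using IntervalIndex:

import pandas as pd

s1 = pd.IntervalIndex.from_arrays(df_of_pairs['V1'], df_of_pairs['V2'])  # default: closed='right'
s2 = pd.IntervalIndex.from_arrays(df_of_adjacent['V1'], df_of_adjacent['V2'])
df_of_adjacent.set_index(s2, inplace=True)
# .loc[s1] matched overlapping intervals only in older pandas (before 0.25); overlaps() selects the same rows
pd.concat([df_of_adjacent[s2.overlaps(iv)] for iv in s1])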
          V1  V2 subid
(1, 4]     1   4     A
(4, 5]     4   5     B
(4, 5]     4   5     B
(5, 6]     5   6     C
(6, 9]     6   9     D
(6, 9]     6   9     D
(9, 11]    9  11     E
(11, 12]  11  12     F
(11, 12]  11  12     F
(12, 17]  12  17     G
(18, 20]  18  20     I
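
To get closer to the full foverlaps(..., type='within') table, including the intid column, one possible route is a cross join followed by a containment filter. The sketch below rebuilds the two frames from the question; the helper name within_join and the temporary _key column are made up for illustration, and because it compares every pair of rows it only suits small tables.

```python
import pandas as pd

# The two tables from the question, rebuilt by hand.
df_of_pairs = pd.DataFrame({
    "V1": [1, 4, 6, 11, 18],
    "V2": [5, 9, 12, 17, 20],
    "intid": [1, 2, 3, 4, 5],
})
df_of_adjacent = pd.DataFrame({
    "V1": [1, 4, 5, 6, 9, 11, 12, 17, 18],
    "V2": [4, 5, 6, 9, 11, 12, 17, 18, 20],
    "subid": list("ABCDEFGHI"),
})


def within_join(intervals, decomp):
    """Hypothetical helper: keep decomp rows lying fully inside an interval."""
    # A constant key turns merge() into a cross join on any pandas version.
    left = intervals.assign(_key=1)
    right = decomp.rename(columns={"V1": "i.V1", "V2": "i.V2"}).assign(_key=1)
    pairs = left.merge(right, on="_key").drop(columns="_key")
    # 'within': the decomp interval starts and ends inside the outer interval.
    inside = (pairs["i.V1"] >= pairs["V1"]) & (pairs["i.V2"] <= pairs["V2"])
    ordered = pairs[inside].sort_values(["V1", "V2", "i.V1", "i.V2"])
    return ordered.reset_index(drop=True)


relations = within_join(df_of_pairs, df_of_adjacent)
print(relations)
```

On this data the result has the same columns and rows as the R table; the row labels differ because pandas counts from 0.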

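Thanks for your help! I will try this function. Do you know of any other functions that could solve this, maybe from another Python library?

Hello @peter, I cannot get the same output as you! My result is identical to the original dataframe. What could be going on?

@PabloPavan, I don't know of any other Python package that replicates the results of R's foverlaps, though there are two related threads that look relevant. As for why you get the same result as the original dataframe, you must be missing inplace=True, or not saving the result of the set_index line in my answer.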
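
A small illustration of that last point, on a made-up two-row frame: set_index returns a new DataFrame unless its result is assigned back or inplace=True is passed, so calling it on its own leaves the original index in place.

```python
import pandas as pd

# Tiny stand-in frame, not the data from the question.
df = pd.DataFrame({"V1": [1, 4], "V2": [4, 5], "subid": ["A", "B"]})
idx = pd.IntervalIndex.from_arrays(df["V1"], df["V2"])

df.set_index(idx)  # result discarded: df keeps its default RangeIndex
unchanged = isinstance(df.index, pd.RangeIndex)

df = df.set_index(idx)  # keep the result (or pass inplace=True instead)
changed = isinstance(df.index, pd.IntervalIndex)

print(unchanged, changed)
```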

Output of foverlaps in R (relations):
    V1 V2 intid i.V1 i.V2 subid
 1:  1  5     1    1    4     A
 2:  1  5     1    4    5     B
 3:  4  9     2    4    5     B
 4:  4  9     2    5    6     C
 5:  4  9     2    6    9     D
 6:  6 12     3    6    9     D
 7:  6 12     3    9   11     E
 8:  6 12     3   11   12     F
 9: 11 17     4   11   12     F
10: 11 17     4   12   17     G
11: 18 20     5   18   20     I

First table (df_of_pairs):
   V1  V2  intid
0   1   5      1
1   4   9      2
2   6  12      3
3  11  17      4
4  18  20      5

Second table (df_of_adjacent):
   V1  V2 subid
0   1   4     A
1   4   5     B
2   5   6     C
3   6   9     D
4   9  11     E
5  11  12     F
6  12  17     G
7  17  18     H
8  18  20     I

Output of the merge attempt (df):
   V1_x  V2_x  intid  V1_y  V2_y subid
0     4     9      2     1     4     A
1     6    12      3     5     6     C
2    11    17      4     9    11     E
3    18    20      5    17    18     H
