Python: conditionally match chromosome and position in one DataFrame to chromosome and interval in another DataFrame

I want to match 'Chr' and 'position' in df1 against 'Chr' and the interval in df2 (where the position in df1 lies between 'start' and 'end'), and then add the 'logr' and 'seg' columns to df1.

df1 = pd.DataFrame({'Chr': ['1', '1', '2', '2', '3', '3', '4'],
                    'position': [50, 500, 1030, 2005, 3575, 50, 250]})
df2 = pd.DataFrame({'Chr': ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3'],
                    'start': [0, 100, 1000, 2000, 3000, 0, 100, 1000, 2000, 3000, 0, 100, 1000, 2000, 3000],
                    'end': [100, 1000, 2000, 3000, 4000, 100, 1000, 2000, 3000, 4000, 100, 1000, 2000, 3000, 4000],
                    'logr': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18],
                    'seg': [0.2, 0.5, 0.2, 0.1, 0.5, 0.5, 0.2, 0.2, 0.1, 0.2, 0.1, 0.5, 0.5, 0.9, 0.3]})

My expected output is:

df3 = pd.DataFrame({'Chr': ['1', '1', '2', '2', '3', '3', '4'],
                    'position': [50, 500, 1030, 2005, 3575, 50, 250],
                    'logr': [3, 4, 10, 11, 18, 13, "NA"],
                    'seg': [0.2, 0.5, 0.2, 0.1, 0.3, 0.1, "NA"]})

Thanks in advance.

Use merge with an outer join to build all combinations per 'Chr', extract the 'start' and 'end' columns with pop, and filter with between. Rows of df1 that have no match in df2 are kept by also accepting missing 'start' or 'end' values. By default, between is inclusive on both ends (>= and <=); for other interval conventions, compare explicitly with & instead.


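Use a left merge with indicator=True. Next, use query to keep the rows where position lies between start and end, or where the _merge value is left_only. Finally, drop the columns that are no longer needed.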

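# outer merge on 'Chr' pairs every position with every segment of its chromosome
df3 = df1.merge(df2, on='Chr', how='outer')
s = df3.pop('start')
e = df3.pop('end')
# keep rows inside [start, end], plus rows whose 'Chr' has no segment at all
df3 = df3[df3['position'].between(s, e) | s.isna() | e.isna()]
# for intervals closed differently, e.g. (start, end], compare explicitly:
# df3 = df3[((df3['position'] > s) & (df3['position'] <= e)) | s.isna() | e.isna()]
print(df3)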
   Chr  position  logr  seg
0    1        50   3.0  0.2
6    1       500   4.0  0.5
12   2      1030  10.0  0.2
18   2      2005  11.0  0.1
24   3      3575  18.0  0.3
25   3        50  13.0  0.1
30   4       250   NaN  NaN
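The choice between inclusive and half-open intervals matters in this data because adjacent segments share their boundaries (one ends at 100, the next starts at 100). The following is a minimal sketch of that effect on toy data; the frames `pos` and `segs` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical toy data: position 100 sits exactly on the shared boundary
# of the segments [0, 100] and [100, 1000].
pos = pd.DataFrame({'Chr': ['1'], 'position': [100]})
segs = pd.DataFrame({'Chr': ['1', '1'],
                     'start': [0, 100],
                     'end': [100, 1000],
                     'logr': [3, 4]})

m = pos.merge(segs, on='Chr', how='outer')

# Inclusive on both ends (the default of Series.between): both segments match.
both = m[m['position'].between(m['start'], m['end'])]

# Half-open segments [start, end): only the second segment matches.
half_open = m[(m['position'] >= m['start']) & (m['position'] < m['end'])]

print(len(both))                   # 2
print(half_open['logr'].tolist())  # [4]
```

With inclusive bounds a boundary position yields two output rows, so if each position should map to exactly one segment, a half-open comparison is probably the safer choice.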
import pandas as pd

# outer merge on 'Chr', then test each position against its candidate segment
res_df = pd.merge(df1, df2, on=['Chr'], how='outer')

in_interval = (res_df['position'] >= res_df['start']) & (res_df['position'] <= res_df['end'])
no_segment = res_df['start'].isnull() | res_df['end'].isnull()

# build a new frame instead of dropping columns in place on a slice
df3 = res_df[in_interval | no_segment].drop(columns=['start', 'end'])

   Chr  position  logr  seg
0    1        50   3.0  0.2
6    1       500   4.0  0.5
12   2      1030  10.0  0.2
18   2      2005  11.0  0.1
24   3      3575  18.0  0.3
25   3        50  13.0  0.1
30   4       250   NaN  NaN
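One caveat with filtering the merged frame directly: a position whose chromosome does appear in df2, but which falls outside every interval, loses all of its rows and disappears from the result. A possible workaround is to filter first and then left-join the hits back onto df1. The sketch below assumes that behaviour is wanted; `annotate_positions` is a hypothetical helper name, not something from the answers.

```python
import pandas as pd

def annotate_positions(positions, segments):
    """Attach logr/seg of the segment containing each position (hypothetical helper).

    Positions without a containing segment are kept with NaN values.
    """
    # pair every position with every segment on the same chromosome
    pairs = positions.merge(segments, on='Chr', how='inner')
    # keep only the pairs where the position lies inside [start, end]
    hits = pairs[pairs['position'].between(pairs['start'], pairs['end'])]
    hits = hits.drop(columns=['start', 'end']).drop_duplicates()
    # left join back so that unmatched positions survive as NaN rows
    return positions.merge(hits, on=['Chr', 'position'], how='left')

# data from the question
df1 = pd.DataFrame({'Chr': ['1', '1', '2', '2', '3', '3', '4'],
                    'position': [50, 500, 1030, 2005, 3575, 50, 250]})
df2 = pd.DataFrame({'Chr': ['1'] * 5 + ['2'] * 5 + ['3'] * 5,
                    'start': [0, 100, 1000, 2000, 3000] * 3,
                    'end': [100, 1000, 2000, 3000, 4000] * 3,
                    'logr': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18],
                    'seg': [0.2, 0.5, 0.2, 0.1, 0.5, 0.5, 0.2, 0.2, 0.1, 0.2,
                            0.1, 0.5, 0.5, 0.9, 0.3]})

out = annotate_positions(df1, df2)
print(out)
```

Because the last step is a left join on df1, the result keeps the original row order and a fresh 0..n-1 index rather than the index of the merged frame.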
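# left merge with an indicator column, then keep in-interval rows and unmatched rows
(df1.merge(df2, on='Chr', how='left', indicator=True)
    .query('(start <= position <= end) | (_merge == "left_only")')
    .drop(columns=['start', 'end', '_merge']))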

Out[364]:
   Chr  position  logr  seg
0    1        50   3.0  0.2
6    1       500   4.0  0.5
12   2      1030  10.0  0.2
18   2      2005  11.0  0.1
24   3      3575  18.0  0.3
25   3        50  13.0  0.1
30   4       250   NaN  NaN
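To see what the indicator contributes, it can help to look at the `_merge` column on its own before filtering. This is a small sketch on made-up frames (`pos` and `segs` are illustrative, not the question's data): rows of the left frame with no partner are marked `left_only` and get NaN in the right-hand columns.

```python
import pandas as pd

# Hypothetical toy frames: chromosome '4' has no segments at all.
pos = pd.DataFrame({'Chr': ['1', '4'], 'position': [50, 250]})
segs = pd.DataFrame({'Chr': ['1'], 'start': [0], 'end': [100], 'logr': [3]})

merged = pos.merge(segs, on='Chr', how='left', indicator=True)

print(merged['_merge'].astype(str).tolist())  # ['both', 'left_only']
print(merged['start'].isna().tolist())        # [False, True]
```

Testing `_merge` for `left_only` is therefore equivalent here to testing `start` or `end` for missing values, which is what the outer-merge variants do.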