Python: conditionally loop over chromosome and position in one data frame to match chromosome and interval in another data frame
I want to conditionally loop over 'Chr' and 'position' in df1 and match them against 'Chr' and the interval in df2 (where the position in df1 falls between 'start' and 'end'), then add the 'logr' and 'seg' columns to df1. The input data frames and my expected output are:
import pandas as pd

df1 = pd.DataFrame({'Chr': ['1', '1', '2', '2', '3', '3', '4'],
                    'position': [50, 500, 1030, 2005, 3575, 50, 250]})
df2 = pd.DataFrame({'Chr': ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2',
                            '3', '3', '3', '3', '3'],
                    'start': [0, 100, 1000, 2000, 3000, 0, 100, 1000, 2000, 3000,
                              0, 100, 1000, 2000, 3000],
                    'end': [100, 1000, 2000, 3000, 4000, 100, 1000, 2000, 3000, 4000,
                            100, 1000, 2000, 3000, 4000],
                    'logr': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18],
                    'seg': [0.2, 0.5, 0.2, 0.1, 0.5, 0.5, 0.2, 0.2, 0.1, 0.2,
                            0.1, 0.5, 0.5, 0.9, 0.3]})
# Expected output
df3 = pd.DataFrame({'Chr': ['1', '1', '2', '2', '3', '3', '4'],
                    'position': [50, 500, 1030, 2005, 3575, 50, 250],
                    'logr': [3, 4, 10, 11, 18, 13, "NA"],
                    'seg': [0.2, 0.5, 0.2, 0.1, 0.3, 0.1, "NA"]})

Thanks in advance.

(What have you tried so far?)

Try merge and query: do a left merge with indicator=True, then use query to keep the rows where position lies between start and end, or where _merge is left_only (rows of df1 with no match in df2). Finally, drop the columns that are no longer needed. The code for this approach is the last snippet on this page.

Alternatively, use merge with an outer join on 'Chr' to get all combinations, extract the 'start' and 'end' columns with pop, and filter with between. Rows of df1 that have no match in df2 end up with missing 'start' and 'end', and the isna checks keep them:
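# Outer join on 'Chr' pairs each position with every interval on that chromosome
df3 = df1.merge(df2, on='Chr', how='outer')
# pop removes the column from df3 and returns it as a Series
s = df3.pop('start')
e = df3.pop('end')
# between is inclusive on both ends by default (>=, <=);
# rows with no matching chromosome have missing start/end and are kept
df3 = df3[df3['position'].between(s, e) | s.isna() | e.isna()]
# if the intervals should instead be open on the left and closed on the right:
#df3 = df3[(df3['position'] > s) & (df3['position'] <= e) | s.isna() | e.isna()]
print(df3)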
Chr position logr seg
0 1 50 3.0 0.2
6 1 500 4.0 0.5
12 2 1030 10.0 0.2
18 2 2005 11.0 0.1
24 3 3575 18.0 0.3
25 3 50 13.0 0.1
30 4 250 NaN NaN
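
One thing worth checking with this data: the intervals in df2 share their endpoints (one ends at 100, the next starts at 100), so an inclusive filter can match a boundary position twice. The following is a minimal sketch, not part of the original answers; the tiny `points` and `segments` frames are made up for illustration, and it assumes half-open intervals [start, end) are acceptable for your data.

```python
import pandas as pd

# Hypothetical miniature data: one position sitting exactly on a shared boundary
points = pd.DataFrame({'Chr': ['1'], 'position': [100]})
segments = pd.DataFrame({'Chr': ['1', '1'],
                         'start': [0, 100],
                         'end': [100, 1000],
                         'logr': [3, 4]})

merged = points.merge(segments, on='Chr', how='outer')

# Closed intervals [start, end]: the boundary position matches both segments
closed = merged[merged['position'].between(merged['start'], merged['end'])]

# Half-open intervals [start, end): the boundary position matches only one
half_open = merged[(merged['position'] >= merged['start'])
                   & (merged['position'] < merged['end'])]

print(len(closed), len(half_open))
```

If each position must map to exactly one segment, choosing one open side like this avoids duplicated rows in the result.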
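import pandas as pd
import numpy as np

# Pair every df1 row with every df2 interval on the same chromosome
res_df = pd.merge(df1, df2, on=['Chr'], how='outer')
# Flag rows whose position lies inside [start, end]
res_df['check_between'] = np.where((res_df['position'] >= res_df['start']) & (res_df['position'] <= res_df['end']), True, False)
# Keep matching rows, plus rows that have no interval at all
df3 = res_df[(res_df['check_between'] == True) |
             (res_df['start'].isnull()) |
             (res_df['end'].isnull())]
# Drop the helper columns; reassigning avoids an in-place change on a filtered slice
df3 = df3.drop(['check_between', 'start', 'end'], axis=1)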
Chr position logr seg
0 1 50 3.0 0.2
6 1 500 4.0 0.5
12 2 1030 10.0 0.2
18 2 2005 11.0 0.1
24 3 3575 18.0 0.3
25 3 50 13.0 0.1
30 4 250 NaN NaN
# Left merge on the shared 'Chr' column; _merge marks df1 rows without a match
df1.merge(df2, how='left', indicator=True) \
   .query('(start <= position <= end) | _merge.eq("left_only")') \
   .drop(columns=['start', 'end', '_merge'])
Out[364]:
Chr position logr seg
0 1 50 3.0 0.2
6 1 500 4.0 0.5
12 2 1030 10.0 0.2
18 2 2005 11.0 0.1
24 3 3575 18.0 0.3
25 3 50 13.0 0.1
30 4 250 NaN NaN
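
All of the approaches above first pair each position with every interval on the same chromosome, which can get large when df2 has many segments. As a possible alternative (not from the original answers), here is a sketch using `pd.merge_asof`. The helper name `annotate_positions` and the `_order` column are made up for illustration, and the sketch assumes the intervals within a chromosome do not overlap; a position that sits exactly on a shared boundary is assigned to the interval that starts there.

```python
import pandas as pd

def annotate_positions(points, segments):
    """Hypothetical helper: attach logr/seg from the interval containing each position."""
    # merge_asof requires both frames to be sorted by their join keys
    left = points.assign(_order=range(len(points))).sort_values('position')
    right = segments.sort_values('start')
    # For each position, take the last interval on the same Chr with start <= position
    merged = pd.merge_asof(left, right, left_on='position', right_on='start',
                           by='Chr', direction='backward')
    # Blank out values when the position lies beyond that interval's end
    inside = merged['position'] <= merged['end']
    for col in ['logr', 'seg']:
        merged[col] = merged[col].where(inside)
    # Restore the original row order and drop the helper columns
    return (merged.sort_values('_order')
                  .drop(columns=['start', 'end', '_order'])
                  .reset_index(drop=True))

df1 = pd.DataFrame({'Chr': ['1', '1', '2', '2', '3', '3', '4'],
                    'position': [50, 500, 1030, 2005, 3575, 50, 250]})
df2 = pd.DataFrame({'Chr': ['1'] * 5 + ['2'] * 5 + ['3'] * 5,
                    'start': [0, 100, 1000, 2000, 3000] * 3,
                    'end': [100, 1000, 2000, 3000, 4000] * 3,
                    'logr': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18],
                    'seg': [0.2, 0.5, 0.2, 0.1, 0.5, 0.5, 0.2, 0.2, 0.1, 0.2,
                            0.1, 0.5, 0.5, 0.9, 0.3]})

result = annotate_positions(df1, df2)
print(result)
```

Unlike the merge-then-filter versions, the result here has a fresh 0..n-1 index rather than the index of the merged frame.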