Python: add new columns in pandas based on the values of other columns


Hello, I need help adding two new columns to a dataframe. For example:

Name           start1  end1
OK0100087.1_0  0      375
OK0100087.1_1  376    750
OK0100087.1_2  751    1000
OK0100088.1    0      87766  
OK0100089.1    0      66778
OK0100090.1_0  0      47519
OK0100090.1_1  47520  73733
The idea is to add start2 and end2, for example:

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   624    250
OK0100087.1_2  751    1000  249    0
OK0100088.1    0      87766 87766  0      
OK0100089.1    0      66778 66778  0
OK0100090.1_0  0      47519 73733  26214
OK0100090.1_1  47520  73733 26213  0
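So, to find the new values of start2 and end2, you first need to find the highest number within each Name group (the Name without its _0, _1, ... suffix).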

So, for example, for OK0100087.1:

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375    
OK0100087.1_1  376    750   
OK0100087.1_2  751    1000 
The highest value = 1000
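A minimal sketch of this first step, assuming the group is everything in Name before the first "_" (a Name without a suffix forms its own group). It builds the sample frame from the question and takes the highest end1 per group:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['OK0100087.1_0', 'OK0100087.1_1', 'OK0100087.1_2',
             'OK0100088.1', 'OK0100089.1',
             'OK0100090.1_0', 'OK0100090.1_1'],
    'start1': [0, 376, 751, 0, 0, 0, 47520],
    'end1': [375, 750, 1000, 87766, 66778, 47519, 73733],
})

# Assumption: the group key is the part of Name before the first "_"
base = df['Name'].str.split('_').str[0]

# Highest end1 in each group, e.g. 1000 for OK0100087.1
group_max = df.groupby(base)['end1'].max()
print(group_max)
```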

Then the first start2 is 1000.

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   
OK0100087.1_1  376    750   
OK0100087.1_2  751    1000  
Then the first end2 will be start2 - (end1 - start1), so 1000 - (375 - 0) = 625.

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   
OK0100087.1_2  751    1000  
Then the second start2 will be end2 - 1, i.e. 625 - 1 = 624.

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   624   
Then again, end2 = start2 - (end1 - start1), so 624 - (750 - 376) = 250.

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   624    250 
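The rule walked through above can be sketched as a plain Python function over one group's (start1, end1) pairs, assuming the pairs are already in row order. The name mirror_segments is hypothetical and does not come from the question:

```python
def mirror_segments(segments):
    """Apply the step-by-step rule to one group's (start1, end1) pairs.

    Hypothetical helper: returns a list of (start2, end2) pairs.
    """
    start2 = max(end1 for _, end1 in segments)  # the first start2 is the group's highest value
    result = []
    for start1, end1 in segments:
        end2 = start2 - (end1 - start1)         # end2 = start2 - (end1 - start1)
        result.append((start2, end2))
        start2 = end2 - 1                       # the next start2 is the previous end2 - 1
    return result


print(mirror_segments([(0, 375), (376, 750), (751, 1000)]))
# → [(1000, 625), (624, 250), (249, 0)]
```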

Finally, we should get:

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   624    250
OK0100087.1_2  751    1000  249    0
OK0100088.1    0      87766 87766  0      
OK0100089.1    0      66778 66778  0
OK0100090.1_0  0      47519 73733  26214
OK0100090.1_1  47520  73733 26213  0
Does anyone have an idea how to do this? Thanks a lot for your help.

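import pandas as pd

df = pd.DataFrame({'Name': ['OK0100087.1_0',
  'OK0100087.1_1',
  'OK0100087.1_2',
  'OK0100088.1',
  'OK0100089.1',
  'OK0100090.1_0',
  'OK0100090.1_1'],
 'start1': [0, 376, 751, 0, 0, 0, 47520],
 'end1': [375, 750, 1000, 87766, 66778, 47519, 73733]})

# Group key: the Name without its "_N" suffix, e.g. OK0100087.1
df['base'] = df['Name'].str.split('_').str[0]
# Every row starts with start2 = the largest end1 in its group
df['start2'] = df.groupby('base')['end1'].transform('max')

pieces = []
for _, group in df.groupby('base'):
    t = group.copy()
    # One pass per row lets the values propagate down the group:
    # end2 = start2 - (end1 - start1), and each later start2 = previous end2 - 1
    for _ in range(len(group)):
        t['end2'] = t['start2'] - (t['end1'] - t['start1'])
        # shift(1) leaves NaN in the first row, so keep its original start2 there
        t['start2'] = (t['end2'] - 1).shift(1).fillna(t['start2'])
    pieces.append(t)

# DataFrame.append was removed in pandas 2.0, so concatenate the groups instead
output = pd.concat(pieces).drop(columns='base')

# shift() introduced NaN, which made the columns float; cast back to int
output[['start2', 'end2']] = output[['start2', 'end2']].astype(int)
print(output)

Output: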

             Name   start1  end1    start2    end2
0   OK0100087.1_0        0  375       1000     625
1   OK0100087.1_1      376  750        624     250
2   OK0100087.1_2      751  1000       249       0
3   OK0100088.1          0  87766    87766       0
4   OK0100089.1          0  66778    66778       0
5   OK0100090.1_0        0  47519    73733   26214
6   OK0100090.1_1    47520  73733    26213       0
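The inner loop above recomputes the whole group once per row. Unrolling the recurrence instead: for the i-th row of a group (counting from 0), end2 is the group maximum minus the lengths end1 - start1 of rows 0..i, minus i (one for each "- 1" step), and start2 is end2 plus that row's own length. A loop-free sketch of this idea, assuming rows are already in order within each group and using the same "_" split for the group key:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['OK0100087.1_0', 'OK0100087.1_1', 'OK0100087.1_2',
             'OK0100088.1', 'OK0100089.1',
             'OK0100090.1_0', 'OK0100090.1_1'],
    'start1': [0, 376, 751, 0, 0, 0, 47520],
    'end1': [375, 750, 1000, 87766, 66778, 47519, 73733],
})

key = df['Name'].str.split('_').str[0]          # group key (assumed: text before "_")
length = df['end1'] - df['start1']              # end1 - start1 for each row
top = df.groupby(key)['end1'].transform('max')  # first start2 of each group
steps = df.groupby(key).cumcount()              # 0, 1, 2, ... within each group

# Subtract the running total of lengths, plus 1 for every earlier row in the group
end2 = top - length.groupby(key).cumsum() - steps
df['start2'] = end2 + length
df['end2'] = end2
print(df)
```

Unlike the shortcut below, this version does not rely on consecutive rows being contiguous (start1 = previous end1 + 1); it only follows the recurrence from the question.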
This is just a groupby().transform(), since you can extract the base name from Name:

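# Group key: everything before the first "_" (e.g. OK0100087.1); raw string avoids an invalid-escape warning
total = df.groupby(df.Name.str.extract(r'^([^_]+)')[0])['end1'].transform('max')

df['start2'] = total - df['start1']

df['end2'] = total - df['end1']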
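Why the subtraction reproduces the step-by-step rule from the question: a short induction, assuming (as holds in the sample data) that each group's first start1 is 0 and that every later start1 equals the previous end1 plus 1. Here M is the group's maximum end1 and i is the row index within the group:

```latex
\begin{aligned}
\text{Base: } & \text{start2}_0 = M = M - \text{start1}_0 \quad (\text{since } \text{start1}_0 = 0) \\
\text{Step: } & \text{if } \text{start2}_i = M - \text{start1}_i, \text{ then} \\
& \text{end2}_i = \text{start2}_i - (\text{end1}_i - \text{start1}_i) = M - \text{end1}_i \\
& \text{start2}_{i+1} = \text{end2}_i - 1 = M - (\text{end1}_i + 1) = M - \text{start1}_{i+1}
\end{aligned}
```

If a group has a gap between segments, or does not start at 0, the last equality fails and the shortcut no longer matches the step-by-step rule.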
Output:

            Name  start1   end1  start2   end2
0  OK0100087.1_0       0    375    1000    625
1  OK0100087.1_1     376    750     624    250
2  OK0100087.1_2     751   1000     249      0
3    OK0100088.1       0  87766   87766      0
4    OK0100089.1       0  66778   66778      0
5  OK0100090.1_0       0  47519   73733  26214
6  OK0100090.1_1   47520  73733   26213      0