Python: adding new columns in pandas based on the values of other columns
Hello, I need help adding two new columns to a DataFrame, for example:
Name start1 end1
OK0100087.1_0 0 375
OK0100087.1_1 376 750
OK0100087.1_2 751 1000
OK0100088.1 0 87766
OK0100089.1 0 66778
OK0100090.1_0 0 47519
OK0100090.1_1 47520 73733
The idea is to add start2 and end2, for example:
Name start1 end1 start2 end2
OK0100087.1_0 0 375 1000 625
OK0100087.1_1 376 750 624 250
OK0100087.1_2 751 1000 249 0
OK0100088.1 0 87766 87766 0
OK0100089.1 0 66778 66778 0
OK0100090.1_0 0 47519 73733 26214
OK0100090.1_1 47520 73733 26213 0
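So, to find the new values of start2 and end2, you need to work within each ID shared by the names (the part of Name before the _0, _1, ... suffix). For example, for OK0100087.1: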
Name start1 end1 start2 end2
OK0100087.1_0 0 375
OK0100087.1_1 376 750
OK0100087.1_2 751 1000
Take the highest end1 value = 1000.
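As a rough sketch of this step in pandas, assuming the shared ID is everything before the _0, _1, ... suffix (the group_max column name is made up for illustration): groupby().transform('max') broadcasts each group's highest end1 back to every row of that group. Only a few of the sample rows are used here.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["OK0100087.1_0", "OK0100087.1_1", "OK0100087.1_2", "OK0100088.1"],
    "start1": [0, 376, 751, 0],
    "end1": [375, 750, 1000, 87766],
})

# Assumption: the shared ID is the part of Name before the "_<n>" suffix
base = df["Name"].str.split("_").str[0]

# Highest end1 within each group, repeated on every row of that group
df["group_max"] = df.groupby(base)["end1"].transform("max")
print(df)
```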
Then the first start2 will be 1000.
Name start1 end1 start2 end2
OK0100087.1_0 0 375 1000
OK0100087.1_1 376 750
OK0100087.1_2 751 1000
Then the first end2 will be start2 - (end1 - start1), so 1000 - (375 - 0) = 625.
Name start1 end1 start2 end2
OK0100087.1_0 0 375 1000 625
OK0100087.1_1 376 750
OK0100087.1_2 751 1000
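The end2 rule above is plain arithmetic; a tiny helper makes it easy to check against the numbers in the tables (end2_from is a hypothetical name, not something from the question or from pandas):

```python
def end2_from(start2: int, start1: int, end1: int) -> int:
    """end2 = start2 - (end1 - start1), the rule stated in the question."""
    return start2 - (end1 - start1)


# First row of OK0100087.1: 1000 - (375 - 0)
print(end2_from(1000, 0, 375))  # 625
```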
Then the second start2 will be end2 - 1, so 625 - 1 = 624.
Name start1 end1 start2 end2
OK0100087.1_0 0 375 1000 625
OK0100087.1_1 376 750 624
Then end2 again will be start2 - (end1 - start1), so 624 - (750 - 376) = 250.
Name start1 end1 start2 end2
OK0100087.1_0 0 375 1000 625
OK0100087.1_1 376 750 624 250
And so on.
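Put together, the rules amount to a simple loop over one group's rows. The sketch below is plain Python, assuming the rows are already in suffix order (_0, _1, ...) and given as (start1, end1) pairs; apply_rules is a hypothetical helper name:

```python
def apply_rules(rows):
    """Compute (start2, end2) for one group of (start1, end1) pairs."""
    start2 = max(end1 for _, end1 in rows)  # first start2 = highest end1
    result = []
    for start1, end1 in rows:
        end2 = start2 - (end1 - start1)  # end2 = start2 - (end1 - start1)
        result.append((start2, end2))
        start2 = end2 - 1  # next row's start2 = this row's end2 - 1
    return result


print(apply_rules([(0, 375), (376, 750), (751, 1000)]))
# [(1000, 625), (624, 250), (249, 0)]
```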
In the end, we should get:
Name start1 end1 start2 end2
OK0100087.1_0 0 375 1000 625
OK0100087.1_1 376 750 624 250
OK0100087.1_2 751 1000 249 0
OK0100088.1 0 87766 87766 0
OK0100089.1 0 66778 66778 0
OK0100090.1_0 0 47519 73733 26214
OK0100090.1_1 47520 73733 26213 0
Does anyone have an idea how to do this? Thanks a lot for your help.
import pandas as pd

df = pd.DataFrame({'Name': ['OK0100087.1_0',
                            'OK0100087.1_1',
                            'OK0100087.1_2',
                            'OK0100088.1',
                            'OK0100089.1',
                            'OK0100090.1_0',
                            'OK0100090.1_1'],
                   'start1': [0, 376, 751, 0, 0, 0, 47520],
                   'end1': [375, 750, 1000, 87766, 66778, 47519, 73733]})
# Shared ID: the part of Name before the "_<n>" suffix
df['base'] = df['Name'].str.split('_').str[0]
# The first start2 of each group is the group's highest end1
df['start2'] = df.groupby('base')['end1'].transform('max')

parts = []
for _, group in df.groupby('base'):
    t = group.copy()
    # One pass per row, so each start2 can pick up the end2 of the row before it
    for _ in range(len(t)):
        t['end2'] = t['start2'] - (t['end1'] - t['start1'])
        # Rows after the first: start2 = previous row's end2 - 1
        t['start2'] = (t['end2'] - 1).shift(1).fillna(t['start2']).astype(int)
    parts.append(t)

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
output = pd.concat(parts).drop(columns='base')
print(output)
Output:
Name start1 end1 start2 end2
0 OK0100087.1_0 0 375 1000 625
1 OK0100087.1_1 376 750 624 250
2 OK0100087.1_2 751 1000 249 0
3 OK0100088.1 0 87766 87766 0
4 OK0100089.1 0 66778 66778 0
5 OK0100090.1_0 0 47519 73733 26214
6 OK0100090.1_1 47520 73733 26213 0
This is just groupby().transform(), since you can extract the shared name prefix:
# Group key: everything before the first '.', e.g. 'OK0100087'
total = df.groupby(df['Name'].str.extract(r'^([^.]+)')[0])['end1'].transform('max')
df['start2'] = total - df['start1']
df['end2'] = total - df['end1']
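Why the closed form agrees with the step-by-step rules, as a hedged check: write M for the group's highest end1. In the sample data the first segment of each group starts at 0 and every later start1 is the previous end1 + 1. Under that assumption the rules telescope: if start2 = M - start1 for a row, then end2 = start2 - (end1 - start1) = M - end1, and the next start2 = end2 - 1 = M - (end1 + 1), which is M minus the next row's start1. The sketch below (closed_form and is_contiguous are illustrative names) checks both the assumption and the result against the expected table from the question.

```python
# (start1, end1) per group, and the expected (start2, end2) from the question
groups = {
    "OK0100087.1": [(0, 375), (376, 750), (751, 1000)],
    "OK0100088.1": [(0, 87766)],
    "OK0100089.1": [(0, 66778)],
    "OK0100090.1": [(0, 47519), (47520, 73733)],
}
expected = {
    "OK0100087.1": [(1000, 625), (624, 250), (249, 0)],
    "OK0100088.1": [(87766, 0)],
    "OK0100089.1": [(66778, 0)],
    "OK0100090.1": [(73733, 26214), (26213, 0)],
}


def closed_form(rows):
    """start2 = M - start1, end2 = M - end1, with M the group's highest end1."""
    m = max(end1 for _, end1 in rows)
    return [(m - start1, m - end1) for start1, end1 in rows]


def is_contiguous(rows):
    """True if the first segment starts at 0 and each start1 is the previous end1 + 1."""
    first_ok = rows[0][0] == 0
    return first_ok and all(nxt[0] == prev[1] + 1 for prev, nxt in zip(rows, rows[1:]))


for name, rows in groups.items():
    print(name, is_contiguous(rows), closed_form(rows) == expected[name])
```

If the segments had gaps or overlaps, the closed form and the row-by-row rules would generally give different numbers, so the shortcut relies on the data being laid out like the sample.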
Output:
Name start1 end1 start2 end2
0 OK0100087.1_0 0 375 1000 625
1 OK0100087.1_1 376 750 624 250
2 OK0100087.1_2 751 1000 249 0
3 OK0100088.1 0 87766 87766 0
4 OK0100089.1 0 66778 66778 0
5 OK0100090.1_0 0 47519 73733 26214
6 OK0100090.1_1 47520 73733 26213 0