Pandas: how to merge complementary data within the same DataFrame?


I have a complicated case of merging partially missing data. I have a DataFrame that contains all of the data points:

import pandas as pd
import numpy as np
df = pd.DataFrame(columns = ['Name', 'Time', 'C1', 'C2', 'Target1', 'Target2'], 
              data = [['Sample1', 0, 0, 0, np.nan, 1.5],
                      ['Sample1', 24, 0, 0, np.nan, 1.6],
                      ['Sample1', 48, 0, 0, np.nan, 1.7],
                      ['Sample1', 0, 1, 0, np.nan, 2.5],
                      ['Sample1', 24, 1, 0, np.nan, 2.6],
                      ['Sample1', 48, 1, 0, np.nan, 2.7],
                      ['Sample1', 0, 0, 0, 10, np.nan],
                      ['Sample1', 24, 0, 0, 20, np.nan],
                      ['Sample1', 48, 0, 0, 30, np.nan],
                      ['Sample1', 0, 0, 0, np.nan, 1.8]
                      ])

      Name  Time  C1  C2  Target1  Target2
0  Sample1     0   0   0      NaN      1.5
1  Sample1    24   0   0      NaN      1.6
2  Sample1    48   0   0      NaN      1.7
3  Sample1     0   1   0      NaN      2.5
4  Sample1    24   1   0      NaN      2.6
5  Sample1    48   1   0      NaN      2.7
6  Sample1     0   0   0     10.0      NaN
7  Sample1    24   0   0     20.0      NaN
8  Sample1    48   0   0     30.0      NaN
9  Sample1     0   0   0      NaN      1.8
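Here, rows 0, 1 and 2 have the same features (Name, Time, C1, C2) as rows 6, 7 and 8 respectively, so they need to be merged. Row 9 has the same features as row 0, but it fills the same target column (Target2), so in this case I would like to create another column. In the end, I want to produce: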

      Name  Time  C1  C2  Target1  Target2  Target2_x
0  Sample1     0   0   0     10.0      1.5      1.8
1  Sample1    24   0   0     20.0      1.6      NaN
2  Sample1    48   0   0     30.0      1.7      NaN
3  Sample1     0   1   0      NaN      2.5      NaN
4  Sample1    24   1   0      NaN      2.6      NaN
5  Sample1    48   1   0      NaN      2.7      NaN

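It should also work when the same sample has two or more duplicates. I could not work out the right combination of merge, join, groupby, etc. Thanks in advance for your help.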

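First create a counter with groupby(...).cumcount(), then reshape with set_index and unstack, and drop the columns that contain only NaNs with dropna. To get sequentially numbered names for the new columns, convert the columns to a Series with to_series, apply another cumcount, and set the new column names in a list comprehension with f-strings: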

c = ['Name', 'Time', 'C1', 'C2']
df = df.set_index([*c, df.groupby(c).cumcount()]).unstack().dropna(how='all', axis=1)

s = df.columns.to_series().groupby(level=0, sort=False).cumcount()
df.columns = [f'{k}_{v}' for (k, k1), v in s.items()]

df = df.reset_index()
print (df)
      Name  Time  C1  C2  Target1_0  Target2_0  Target2_1
0  Sample1     0   0   0       10.0        1.5        1.8
1  Sample1     0   1   0        NaN        2.5        NaN
2  Sample1    24   0   0       20.0        1.6        NaN
3  Sample1    24   1   0        NaN        2.6        NaN
4  Sample1    48   0   0       30.0        1.7        NaN
5  Sample1    48   1   0        NaN        2.7        NaN
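
To make the steps above easier to reuse, here is a minimal sketch that wraps them in a function. The names merge_complementary and keys are my own (hypothetical, not part of pandas), and the data is the frame from the question. One caveat, assuming I read the reshape correctly: the column suffixes come from the positions of the duplicates within each key group, so values are not compacted per row when different groups have their duplicates at different positions.

```python
import numpy as np
import pandas as pd


def merge_complementary(df, keys):
    """Hypothetical helper: collapse rows that share `keys` into one row.

    Repeated values for the same target are spread over numbered columns
    (Target2_0, Target2_1, ...).
    """
    # Number repeated key combinations 0, 1, 2, ... in order of appearance.
    occurrence = df.groupby(keys).cumcount()
    # Move the occurrence number into the columns: (target, occurrence).
    wide = df.set_index([*keys, occurrence]).unstack()
    # Drop (target, occurrence) columns that never hold a value.
    wide = wide.dropna(how='all', axis=1)
    # Renumber the surviving columns per target so suffixes run 0, 1, 2, ...
    counter = wide.columns.to_series().groupby(level=0, sort=False).cumcount()
    wide.columns = [f'{name}_{n}' for (name, _), n in counter.items()]
    return wide.reset_index()


# Data from the question.
df = pd.DataFrame(
    columns=['Name', 'Time', 'C1', 'C2', 'Target1', 'Target2'],
    data=[['Sample1', 0, 0, 0, np.nan, 1.5],
          ['Sample1', 24, 0, 0, np.nan, 1.6],
          ['Sample1', 48, 0, 0, np.nan, 1.7],
          ['Sample1', 0, 1, 0, np.nan, 2.5],
          ['Sample1', 24, 1, 0, np.nan, 2.6],
          ['Sample1', 48, 1, 0, np.nan, 2.7],
          ['Sample1', 0, 0, 0, 10, np.nan],
          ['Sample1', 24, 0, 0, 20, np.nan],
          ['Sample1', 48, 0, 0, 30, np.nan],
          ['Sample1', 0, 0, 0, np.nan, 1.8]])

keys = ['Name', 'Time', 'C1', 'C2']
result = merge_complementary(df, keys)
print(result)
```

Because the result is rebuilt from the index, the rows come back sorted by the key columns rather than in their original order of appearance. If the same sample has a third value for Target2, it should simply show up as an additional Target2_2 column.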
