Pandas: how to merge complementary data within the same DataFrame?
I have a tricky case of merging partially missing data. I have a DataFrame that contains all the data points:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    columns=['Name', 'Time', 'C1', 'C2', 'Target1', 'Target2'],
    data=[
        ['Sample1', 0, 0, 0, np.nan, 1.5],
        ['Sample1', 24, 0, 0, np.nan, 1.6],
        ['Sample1', 48, 0, 0, np.nan, 1.7],
        ['Sample1', 0, 1, 0, np.nan, 2.5],
        ['Sample1', 24, 1, 0, np.nan, 2.6],
        ['Sample1', 48, 1, 0, np.nan, 2.7],
        ['Sample1', 0, 0, 0, 10, np.nan],
        ['Sample1', 24, 0, 0, 20, np.nan],
        ['Sample1', 48, 0, 0, 30, np.nan],
        ['Sample1', 0, 0, 0, np.nan, 1.8],
    ],
)
      Name  Time  C1  C2  Target1  Target2
0  Sample1     0   0   0      NaN      1.5
1  Sample1    24   0   0      NaN      1.6
2  Sample1    48   0   0      NaN      1.7
3  Sample1     0   1   0      NaN      2.5
4  Sample1    24   1   0      NaN      2.6
5  Sample1    48   1   0      NaN      2.7
6  Sample1     0   0   0     10.0      NaN
7  Sample1    24   0   0     20.0      NaN
8  Sample1    48   0   0     30.0      NaN
9  Sample1     0   0   0      NaN      1.8
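Here, rows 0, 1 and 2 have the same key values (Name, Time, C1, C2) as rows 6, 7 and 8 respectively, so they need to be merged. Row 9 has the same keys as row 0 but fills the same target column (Target2), so in that case I want to create an additional column. In the end, I would like to produce: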
      Name  Time  C1  C2  Target1  Target2  Target2_x
0  Sample1     0   0   0     10.0      1.5        1.8
1  Sample1    24   0   0     20.0      1.6        NaN
2  Sample1    48   0   0     30.0      1.7        NaN
3  Sample1     0   1   0      NaN      2.5        NaN
4  Sample1    24   1   0      NaN      2.6        NaN
5  Sample1    48   1   0      NaN      2.7        NaN
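It should also work when the same sample has two or more repeats. I could not work out the right combination of merge, join, groupby and so on. Thanks in advance for your help.

First create a counter column with GroupBy.cumcount and move it into the index together with the key columns using set_index, reshape with unstack, and drop only the columns that are entirely NaN with dropna(how='all', axis=1). To get the final column names, convert the columns to a Series with Index.to_series, number the surviving columns of each target with a second cumcount, and build the new names with f-strings in a list comprehension: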
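c = ['Name', 'Time', 'C1', 'C2']
# Number the repeats of each key combination, then pivot that counter into the columns
df = df.set_index([*c, df.groupby(c).cumcount()]).unstack().dropna(how='all', axis=1)
# Renumber the columns that survived dropna within each target: 0, 1, ...
s = df.columns.to_series().groupby(level=0, sort=False).cumcount()
df.columns = [f'{k}_{v}' for (k, k1), v in s.items()]
df = df.reset_index()
print(df)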
      Name  Time  C1  C2  Target1_0  Target2_0  Target2_1
0  Sample1     0   0   0       10.0        1.5        1.8
1  Sample1     0   1   0        NaN        2.5        NaN
2  Sample1    24   0   0       20.0        1.6        NaN
3  Sample1    24   1   0        NaN        2.6        NaN
4  Sample1    48   0   0       30.0        1.7        NaN
5  Sample1    48   1   0        NaN        2.7        NaN
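
As a possible alternative, and not part of the answer above, the same result can be sketched by going through long format: melt the target columns, drop the missing values, count the remaining values per key and target, and pivot back. The names `keys`, `long`, `wide`, `target`, `value` and `n` are made up for this illustration, and passing lists to `pivot` assumes pandas 1.1 or later. Because the counter is computed after the NaN values are removed, each value lands in the first free slot for its key regardless of where its row sat in the original frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=['Name', 'Time', 'C1', 'C2', 'Target1', 'Target2'],
    data=[
        ['Sample1', 0, 0, 0, np.nan, 1.5],
        ['Sample1', 24, 0, 0, np.nan, 1.6],
        ['Sample1', 48, 0, 0, np.nan, 1.7],
        ['Sample1', 0, 1, 0, np.nan, 2.5],
        ['Sample1', 24, 1, 0, np.nan, 2.6],
        ['Sample1', 48, 1, 0, np.nan, 2.7],
        ['Sample1', 0, 0, 0, 10, np.nan],
        ['Sample1', 24, 0, 0, 20, np.nan],
        ['Sample1', 48, 0, 0, 30, np.nan],
        ['Sample1', 0, 0, 0, np.nan, 1.8],
    ],
)

keys = ['Name', 'Time', 'C1', 'C2']  # columns that identify a measurement

# Long format: one row per (key, target) value, missing values removed.
long = (
    df.melt(id_vars=keys, var_name='target', value_name='value')
      .dropna(subset=['value'])
)

# Number the non-missing values of each target within each key combination.
long['n'] = long.groupby(keys + ['target']).cumcount()

# Back to wide format: one column per (target, n) pair.
wide = long.pivot(index=keys, columns=['target', 'n'], values='value')
wide = wide.sort_index(axis=1)
wide.columns = [f'{t}_{n}' for t, n in wide.columns]
wide = wide.reset_index()

print(wide)
```

On the sample data this should give the columns Target1_0, Target2_0 and Target2_1, with 1.8 in Target2_1 for the Time 0, C1 0 row. Whether this or the unstack version is preferable probably depends on how sparse the repeats are across samples.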