Python: merging DataFrames when a key has multiple values
I have the following DataFrames:
import pandas as pd

_data_orig = [
    [1, 3.2],
    [3, 3.9],
    [4, 1.2],
    [5, 2.2]
]
_columns1 = ["ID", "GPA"]
_data_new = [
    [1, "Bob"],
    [2, "Sam"],
    [3, "Jane"],
    [3, "Sanoj"]
]
_columns2 = ["ID", "Name"]
df_orig = pd.DataFrame(data=_data_orig, columns=_columns1)
df_new = pd.DataFrame(data=_data_new, columns=_columns2)
When I do this:
df_merge = pd.merge(df_orig, df_new, how='left')
I get:
   ID  GPA   Name
0   1  3.2    Bob
1   3  3.9   Jane
2   3  3.9  Sanoj
3   4  1.2    NaN
4   5  2.2    NaN
You can see that ID 3 is repeated. I would like the result in the following format instead, so that ID 3 from df_orig is not duplicated:
   ID  GPA  Name Name_1
0   1  3.2   Bob
1   3  3.9  Jane  Sanoj
2   4  1.2   NaN
3   5  2.2   NaN
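The repetition comes from ID 3 appearing twice in df_new, which makes the left merge one-to-many. A minimal sketch that confirms this, assuming the same data as above (the names `dup_ids` and `raised` are only for illustration):

```python
import pandas as pd

df_orig = pd.DataFrame({"ID": [1, 3, 4, 5], "GPA": [3.2, 3.9, 1.2, 2.2]})
df_new = pd.DataFrame({"ID": [1, 2, 3, 3], "Name": ["Bob", "Sam", "Jane", "Sanoj"]})

# IDs that occur more than once on the right-hand side of the merge
dup_ids = df_new.loc[df_new["ID"].duplicated(keep=False), "ID"].unique().tolist()
print(dup_ids)

# Asking merge to enforce a one-to-one relationship makes the problem explicit.
# The error raised is a MergeError, which is a subclass of ValueError.
try:
    pd.merge(df_orig, df_new, on="ID", how="left", validate="one_to_one")
    raised = False
except ValueError:
    raised = True
print(raised)
```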
Try this:
Let's create the following helper DataFrame:
In [279]: x = (df_new.groupby('ID')['Name']
     ...:            .apply(';'.join)
     ...:            .str.split(';', expand=True)
     ...:            .add_prefix('Name_')
     ...:            .reset_index())
     ...:

In [280]: x
Out[280]:
   ID Name_0 Name_1
0   1    Bob   None
1   2    Sam   None
2   3   Jane  Sanoj
Now we can simply merge it with the df_orig DataFrame:
In [281]: pd.merge(df_orig, x, how='left').fillna('')
     ...:
Out[281]:
   ID  GPA Name_0 Name_1
0   1  3.2    Bob
1   3  3.9   Jane  Sanoj
2   4  1.2
3   5  2.2
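Note that the join/split round trip assumes no name contains a ';'. A possible variant, sketched here as a self-contained script under the same sample data, collects the names into lists instead, so no delimiter is needed (the variable names `names` and `result` are illustrative):

```python
import pandas as pd

df_orig = pd.DataFrame({"ID": [1, 3, 4, 5], "GPA": [3.2, 3.9, 1.2, 2.2]})
df_new = pd.DataFrame({"ID": [1, 2, 3, 3], "Name": ["Bob", "Sam", "Jane", "Sanoj"]})

# Collect the names for each ID into a list (no delimiter involved)
names = df_new.groupby("ID")["Name"].agg(list)

# Spread each list across columns; shorter lists are padded with missing values
x = (pd.DataFrame(names.tolist(), index=names.index)
       .add_prefix("Name_")
       .reset_index())

# Left merge keeps one row per ID in df_orig; blank out the missing names
result = pd.merge(df_orig, x, on="ID", how="left").fillna("")
print(result)
```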
Consider a pivot based on groupby().cumcount, followed by a merge:
df_new['IDcount'] = "Name_" + (df_new.groupby("ID").cumcount() + 1).astype(str)
df_wide = df_new.pivot(index="ID", columns="IDcount", values="Name").reset_index()
df_merge = pd.merge(df_orig, df_wide, on='ID', how='left')
#    ID  GPA Name_1 Name_2
# 0   1  3.2    Bob   None
# 1   3  3.9   Jane  Sanoj
# 2   4  1.2    NaN    NaN
# 3   5  2.2    NaN    NaN
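The snippet above adds an IDcount column to df_new as a side effect. A rough sketch of the same pivot-then-merge idea wrapped in a function that leaves its inputs untouched might look like this (`merge_wide` and the temporary `_col` column are hypothetical names, not part of pandas):

```python
import pandas as pd

def merge_wide(left, right, key="ID", value="Name"):
    """Hypothetical helper: left-merge `left` with a widened copy of `right`."""
    # Number repeated keys 1, 2, ... within each group, e.g. Name_1, Name_2
    labels = value + "_" + (right.groupby(key).cumcount() + 1).astype(str)
    # assign() returns a new frame, so `right` itself is not modified
    wide = (right.assign(_col=labels)
                 .pivot(index=key, columns="_col", values=value)
                 .reset_index())
    wide.columns.name = None  # drop the leftover "_col" axis label
    return pd.merge(left, wide, on=key, how="left")

df_orig = pd.DataFrame({"ID": [1, 3, 4, 5], "GPA": [3.2, 3.9, 1.2, 2.2]})
df_new = pd.DataFrame({"ID": [1, 2, 3, 3], "Name": ["Bob", "Sam", "Jane", "Sanoj"]})

merged = merge_wide(df_orig, df_new)
print(merged)
```

One caveat: pivot sorts the new column labels as strings, so with ten or more names per ID the columns would come out as Name_1, Name_10, Name_2, and so on unless they are reordered afterwards.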
Thanks, that worked very smoothly.
Sanoj, glad it could help.