Python: merging datasets based on df1's index
I have a dataset (df1) that I want to populate with data from a second dataset (df2). Only one column overlaps between the two dataframes, and I have set that column as the index of both df1 and df2 so that I can merge on the index.
import pandas as pd

df1 = pd.read_excel('Data.xlsx', sheet_name='Dataset1')
df2 = pd.read_excel('Data.xlsx', sheet_name='Dataset2')
df1.set_index("ORG_ID", inplace=True)
df2.set_index("ORG_ID", inplace=True)
# Merge in only the df2 columns that df1 lacks; how="outer" keeps ORG_IDs from both frames
df3 = df1.merge(df2[df2.columns.difference(df1.columns)], left_index=True, right_index=True, how="outer")
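I want the output to be a new dataset (df3) that lists all the data from df1, including the index (ORG_ID), plus all the new columns from df2, populated only for the ORG_IDs listed in df1.
What Python seems to do instead is give me a new dataframe (df3) that contains df1's data and then appends every ORG_ID from the second dataset (df2) below the ORG_IDs from df1, which is not what I want.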
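The behaviour described above is what an outer join is expected to do: it keeps the union of the keys from both frames. A minimal sketch of the difference between how="outer" and how="left" when merging on the index, using a few rows from Dataset1 and an assumed df2 (ORG_ID 11 and its values are made up for illustration and do not come from the original data):

```python
import pandas as pd

# A few rows taken from Dataset1, indexed by ORG_ID.
df1 = pd.DataFrame(
    {
        "ORG_ID": [1, 2, 3, 4],
        "COUNTRY": ["Spain", "Greece", "U.K", "Italy"],
        "PRICE": [100, 200, 300, 500],
    }
).set_index("ORG_ID")

# Assumed stand-in for Dataset2; ORG_ID 11 is hypothetical and exists only here.
df2 = pd.DataFrame(
    {
        "ORG_ID": [1, 4, 11],
        "CUSTOMER": ["T", "Y", "Z"],
        "REGION": ["Europe", "Europe", "Europe"],
    }
).set_index("ORG_ID")

# Outer join: union of both indexes, so ORG_ID 11 is added to the result.
outer = df1.merge(df2, left_index=True, right_index=True, how="outer")

# Left join: only the ORG_IDs already present in df1 are kept.
left = df1.merge(df2, left_index=True, right_index=True, how="left")

print(outer.index.tolist())
print(left.index.tolist())
```

With the left join, ORG_IDs 2 and 3 stay in the result and their CUSTOMER and REGION values are NaN, because df2 has no row for them.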
I have also tried combine_first, but it seems to produce a similar result.
df3 = df1.combine_first(df2)
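combine_first also aligns the two frames on the union of their indexes, which would explain the similar result. The sketch below illustrates that on two tiny frames and shows one possible way to trim the result back to df1's rows with reindex. The frames are assumptions for illustration, and ORG_ID 11 is a made-up key that exists only in df2:

```python
import pandas as pd

# Two tiny frames indexed by ORG_ID; 11 is a hypothetical key found only in df2.
df1 = pd.DataFrame(
    {"COUNTRY": ["Spain", "Greece"]},
    index=pd.Index([1, 2], name="ORG_ID"),
)
df2 = pd.DataFrame(
    {"REGION": ["Europe", "Europe"]},
    index=pd.Index([1, 11], name="ORG_ID"),
)

# combine_first keeps every index label from either frame.
combined = df1.combine_first(df2)

# Restrict the rows to the ORG_IDs that df1 actually contains.
trimmed = combined.reindex(df1.index)

print(combined)
print(trimmed)
```

A left merge or df1.join(df2, how="left") reaches the same rows more directly; the reindex step is only shown to make the row difference visible.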
Dataset1 (df1)
ORG_ID COUNTRY TOWN STORE PRODUCT PRICE
1 Spain Madrid Pink Garment 100
2 Greece Chania White Toy 200
3 U.K Manchester Red Garment 300
4 Italy Rome Red Accessory 500
5 Spain Marbella Blue Accessory 20
6 Greece Chania Green Garment 25
7 U.K Manchester Pink Toy 36
8 Italy Siena Red Accessory 150
9 Spain Barcelona White Toy 200
10 Greece Corfu Blue Accessory 500
Dataset2 (df2)
Dataset3 (df3) - what I want
ORG_ID COUNTRY TOWN STORE PRODUCT PRICE CUSTOMER TYPE PARENT REGION
1 Spain Madrid Pink Garment 100 T Fig Tulip Europe
2 Greece Chania White Toy 200 NaN NaN NaN NaN
3 U.K Manchester Red Garment 300 NaN NaN NaN NaN
4 Italy Rome Red Accessory 500 Y Pop Rose Europe
5 Spain Marbella Blue Accessory 20 A Pop Rose Europe
6 Greece Chania Green Garment 25 R Fig Lily Europe
7 U.K Manchester Pink Toy 36 H Pop Tulip Europe
8 Italy Siena Red Accessory 150 S Fig Rose Europe
9 Spain Barcelona White Toy 200 NaN NaN NaN NaN
10 Greece Corfu Blue Accessory 500 A Cry Tulip Europe
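You don't have to set the index on the DataFrames. You can use merge with the on parameter and how='left'. A left merge keeps every row of df1 and fills the df2 columns with NaN wherever an ORG_ID has no match in df2.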
import pandas as pd

df1 = pd.read_excel('Data.xlsx', sheet_name='Dataset1')
df2 = pd.read_excel('Data.xlsx', sheet_name='Dataset2')
# Keep every row of df1; df2 columns are NaN where ORG_ID has no match
df3 = df1.merge(df2, how='left', on='ORG_ID')
Output:
ORG_ID COUNTRY TOWN STORE PRODUCT PRICE CUSTOMER TYPE PARENT \
0 1 Spain Madrid Pink Garment 100 T Fig Tulip
1 2 Greece Chania White Toy 200 NaN NaN NaN
2 3 U.K Manchester Red Garment 300 NaN NaN NaN
3 4 Italy Rome Red Accessory 500 Y Pop Rose
4 5 Spain Marbella Blue Accessory 20 A Pop Rose
5 6 Greece Chania Green Garment 25 R Fig Lily
6 7 U.K Manchester Pink Toy 36 H Pop Tulip
7 8 Italy Siena Red Accessory 150 S Fig Rose
8 9 Spain Barcelona White Toy 200 NaN NaN NaN
9 10 Greece Corfu Blue Accessory 500 A Cry Tulip
REGION
0 Europe
1 NaN
2 NaN
3 Europe
4 Europe
5 Europe
6 Europe
7 Europe
8 NaN
9 Europe
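Since Data.xlsx is not available here, the following is a self-contained sketch of the same left merge on inline data. It also uses two optional merge parameters, validate and indicator, to check that ORG_ID is unique on both sides and to list which df1 rows found no match. The df2 row for ORG_ID 11 and its values are assumptions added for illustration:

```python
import pandas as pd

# A few rows from Dataset1, with ORG_ID kept as an ordinary column.
df1 = pd.DataFrame(
    {
        "ORG_ID": [1, 2, 3, 4],
        "COUNTRY": ["Spain", "Greece", "U.K", "Italy"],
        "TOWN": ["Madrid", "Chania", "Manchester", "Rome"],
    }
)

# Assumed stand-in for Dataset2; ORG_ID 11 is hypothetical and absent from df1.
df2 = pd.DataFrame(
    {
        "ORG_ID": [1, 4, 11],
        "CUSTOMER": ["T", "Y", "Z"],
        "TYPE": ["Fig", "Pop", "Pop"],
    }
)

df3 = df1.merge(
    df2,
    how="left",             # keep every df1 row, drop df2-only keys
    on="ORG_ID",
    validate="one_to_one",  # raise if ORG_ID is duplicated in either frame
    indicator=True,         # add a _merge column: "both" or "left_only"
)

# ORG_IDs from df1 that had no matching row in df2.
unmatched = df3.loc[df3["_merge"] == "left_only", "ORG_ID"].tolist()

print(df3)
print(unmatched)
```

Dropping the helper column afterwards with df3.drop(columns="_merge") gives a frame in the shape asked for in the question.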