Pandas: add the values of two tables based on a combination of columns

I have two tables:

df1 = pd.DataFrame({
    "c_id": [2000,3000,3000], 
    "cloud":["GCP","GCP","Azure"], 
    "invoice":[100,100,300]
})

c_id    cloud   invoice
2000    GCP     100
3000    GCP     100
3000    Azure   300

df2 = pd.DataFrame({
    "c_id": [1000,2000,2000,3000,3000], 
    "cloud":["Azure","GCP","Azure","AWS","Azure"], 
    "invoice":[200,200,300,100,100]
})

c_id    cloud   invoice
1000    Azure   200
2000    GCP     200
2000    Azure   300
3000    AWS     100
3000    Azure   100

I want to add these two tables together based on the combination of the columns c_id and cloud. The result I want is:

c_id    cloud   invoice
1000    Azure   200
2000    Azure   300
2000    GCP     300
3000    AWS     100
3000    Azure   400
3000    GCP     100

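In my example I only show the invoice column. My actual dataset has more than 40 columns with additional constraints: some columns only have values when cloud is Azure, while others only have values when cloud is Azure or GCP.

Is there a clean way to add df1 and df2?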
Combine the two DataFrames, then group and aggregate with a sum:
df1 = pd.DataFrame({
    "c_id": [2000,3000,3000], 
    "cloud":["GCP","GCP","Azure"], 
    "invoice":[100,100,300]
})
print (df1)
   c_id  cloud  invoice
0  2000    GCP      100
1  3000    GCP      100
2  3000  Azure      300


df2 = pd.DataFrame({
    "c_id": [1000,2000,2000,3000,3000], 
    "cloud":["Azure","GCP","Azure","AWS","Azure"], 
    "invoice":[200,200,300,100,100]
})
print (df2)
   c_id  cloud  invoice
0  1000  Azure      200
1  2000    GCP      200
2  2000  Azure      300
3  3000    AWS      100
4  3000  Azure      100
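
The frames above can be combined in one expression. The following is a minimal sketch, assuming the intended approach is pd.concat followed by groupby(...).sum(); the variable name combined is chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "c_id": [2000, 3000, 3000],
    "cloud": ["GCP", "GCP", "Azure"],
    "invoice": [100, 100, 300],
})
df2 = pd.DataFrame({
    "c_id": [1000, 2000, 2000, 3000, 3000],
    "cloud": ["Azure", "GCP", "Azure", "AWS", "Azure"],
    "invoice": [200, 200, 300, 100, 100],
})

# Stack the rows of both frames, then sum every remaining column
# within each (c_id, cloud) pair. as_index=False keeps the keys as columns.
combined = (
    pd.concat([df1, df2], ignore_index=True)
      .groupby(["c_id", "cloud"], as_index=False)
      .sum()
)
print(combined)
```

Because the grouping keys are the only columns named explicitly, any additional value columns are summed as well, and the integer dtype of invoice is preserved.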


You can also use:
Output:
    c_id  cloud  invoice
0  1000  Azure   200.00
1  2000  Azure   300.00
2  2000    GCP   300.00
3  3000    AWS   100.00
4  3000  Azure   400.00
5  3000    GCP   100.00
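
The float values in the output above are consistent with index-aligned addition, where keys missing from one side are filled in before adding. The following is a sketch of one such approach, assuming set_index plus DataFrame.add with fill_value=0; it is not confirmed to be the snippet that produced that output, and it assumes each (c_id, cloud) pair appears at most once per frame.

```python
import pandas as pd

df1 = pd.DataFrame({
    "c_id": [2000, 3000, 3000],
    "cloud": ["GCP", "GCP", "Azure"],
    "invoice": [100, 100, 300],
})
df2 = pd.DataFrame({
    "c_id": [1000, 2000, 2000, 3000, 3000],
    "cloud": ["Azure", "GCP", "Azure", "AWS", "Azure"],
    "invoice": [200, 200, 300, 100, 100],
})

keys = ["c_id", "cloud"]

# Align both frames on the key pair; a key present in only one frame
# is treated as 0 in the other, so its value passes through unchanged.
result = (
    df1.set_index(keys)
       .add(df2.set_index(keys), fill_value=0)
       .reset_index()
)
print(result)
```

Alignment introduces missing values internally, which is why invoice comes back as float rather than int.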

I am not the one who downvoted, but this answer doesn't work for me. I have many columns, not just invoice; will it also handle the sums of the other columns?
Please let me know the specific problem.
I got it working right after following the other answer. Still, thank you for your time.
@sagungrp I know the other answer is good. Please check my updated answer as well.
I checked your code; it works well and meets my requirements.
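
On the question of many columns: a grouped sum applies to every non-key column, so extra columns need no special handling. The sketch below assumes a hypothetical column azure_credit (not in the original data) that is only populated for Azure rows, and uses min_count=1 so that groups with no values at all stay NaN instead of becoming 0.

```python
import numpy as np
import pandas as pd

# azure_credit is a made-up column for illustration; it is NaN for non-Azure rows.
df1 = pd.DataFrame({
    "c_id": [2000, 3000, 3000],
    "cloud": ["GCP", "GCP", "Azure"],
    "invoice": [100, 100, 300],
    "azure_credit": [np.nan, np.nan, 30.0],
})
df2 = pd.DataFrame({
    "c_id": [1000, 2000, 2000, 3000, 3000],
    "cloud": ["Azure", "GCP", "Azure", "AWS", "Azure"],
    "invoice": [200, 200, 300, 100, 100],
    "azure_credit": [20.0, np.nan, 5.0, np.nan, 10.0],
})

# min_count=1 requires at least one non-missing value per group;
# otherwise the group's sum is NaN rather than 0.
totals = (
    pd.concat([df1, df2], ignore_index=True)
      .groupby(["c_id", "cloud"], as_index=False)
      .sum(min_count=1)
)
print(totals)
```

Without min_count=1, the default sum would report 0 for azure_credit on GCP and AWS rows, which may hide the fact that the column does not apply there.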