Python: remove a random number of rows based on matching columns of another dataframe
I have two dataframes: orders and returns.

Orders:
Index | OrderID | TransactionID | ProductID | BuyerID | Date      | TotalOrder | ProductPrice
------------------------------------------------------------------------------------------
0     | A       | A-1           | 05        | 1       | dd-mm-yyy | 140        | 50
1     | A       | A-2           | 45        | 1       | dd-mm-yyy | 140        | 90
2     | B       | B-1           | 33        | 1       | dd-mm-yyy | 15         | 10
3     | B       | B-2           | 01        | 1       | dd-mm-yyy | 15         | 5
4     | C       | C-1           | 45        | 1       | dd-mm-yyy | 90         | 90
5     | D       | D-1           | 45        | 1       | dd-mm-yyy | 90         | 90
6     | E       | E-1           | 45        | 1       | dd-mm-yyy | 90         | 90
7     | F       | F-1           | 45        | 2       | dd-mm-yyy | 90         | 90

Returns:
ProductID | BuyerID | ProductPrice | Amount
-------------------------------------------
33        | 1       | 10           | 1
45        | 1       | 90           | 2
01        | 1       | 5            | 1
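
For each row in returns, the rows in orders with a matching ProductID, BuyerID and ProductPrice should be removed n times (n = returns['Amount']). So I would end up with only the rows with index 0 and 7, plus two of the rows with index 1, 4, 5 or 6:
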
Index | OrderID | TransactionID | ProductID | BuyerID | Date      | TotalOrder | ProductPrice
------------------------------------------------------------------------------------------
0     | A       | A-1           | 05        | 1       | dd-mm-yyy | 140        | 50
7     | F       | F-1           | 45        | 2       | dd-mm-yyy | 90         | 90
------------------------------------------------------------------------------------------
1     | A       | A-2           | 45        | 1       | dd-mm-yyy | 140        | 90           |
4     | C       | C-1           | 45        | 1       | dd-mm-yyy | 90         | 90           | + 2 out
5     | D       | D-1           | 45        | 1       | dd-mm-yyy | 90         | 90           | of these
6     | E       | E-1           | 45        | 1       | dd-mm-yyy | 90         | 90           |
------------------------------------------------------------------------------------------

Is there a way to do this?

This should work:

import pandas as pd

orders = pd.DataFrame(
    {
        'orderId': ['a', 'a', 'b', 'b', 'c', 'd', 'e', 'f'],
        'pid': [5, 45, 33, 1, 45, 45, 45, 45],
        'bid': [1, 1, 1, 1, 1, 1, 1, 2],
        'torder': [140, 140, 15, 15, 90, 90, 90, 90],
        'px': [50, 90, 10, 5, 90, 90, 90, 90]
    }
)

returns = pd.DataFrame(
    {
        'pid': [33, 45, 1],
        'bid': [1, 1, 1],
        'px': [10, 90, 5],
        'amount': [1, 2, 1]
    }
)

# Number the rows within each (pid, bid, px) group: 1, 2, 3, ...
orders['temp'] = 1
orders['rid'] = orders.groupby(['pid', 'bid', 'px'])['temp'].transform(pd.Series.cumsum)

# Attach the returned amount to every matching order row; no match means amount 0.
orders = orders.merge(returns, on=['pid', 'bid', 'px'], how='outer').fillna(0)

# Keep only the rows numbered above the returned amount,
# i.e. drop the first `amount` rows of each group.
left_orders = orders[orders.rid > orders.amount].drop(columns=['temp', 'rid', 'amount'])
print(left_orders)

Output (the index labels come from the merged frame, so they may differ between pandas versions):

  orderId  pid  bid  torder  px
0       a    5    1     140  50
3       d   45    1      90  90
4       e   45    1      90  90
7       f   45    2      90  90
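
Note that the approach above always drops the first n matching rows of each group, while the question asks for n of them to be picked at random. A minimal sketch of one way to do that is to shuffle the rows before numbering them within each group. The helper name `drop_returned` and its `keys` and `seed` parameters are made up for illustration, and the sketch assumes that `returns` has at most one row per (pid, bid, px) combination and that `orders` has no column called `amount`.

```python
import pandas as pd

orders = pd.DataFrame({
    'orderId': ['a', 'a', 'b', 'b', 'c', 'd', 'e', 'f'],
    'pid': [5, 45, 33, 1, 45, 45, 45, 45],
    'bid': [1, 1, 1, 1, 1, 1, 1, 2],
    'torder': [140, 140, 15, 15, 90, 90, 90, 90],
    'px': [50, 90, 10, 5, 90, 90, 90, 90],
})
returns = pd.DataFrame({
    'pid': [33, 45, 1],
    'bid': [1, 1, 1],
    'px': [10, 90, 5],
    'amount': [1, 2, 1],
})


def drop_returned(orders, returns, keys=('pid', 'bid', 'px'), seed=None):
    """Hypothetical helper: drop `amount` randomly chosen rows per matching group."""
    keys = list(keys)
    # Left merge keeps one output row per order row (assumes unique keys in returns).
    merged = orders.merge(returns, on=keys, how='left')
    merged.index = orders.index                    # restore the original row labels
    merged['amount'] = merged['amount'].fillna(0)  # no return -> nothing to drop
    # Shuffle so that the rows dropped within each group are picked at random.
    shuffled = merged.sample(frac=1, random_state=seed)
    # Position of each row within its group after shuffling: 0, 1, 2, ...
    rank = shuffled.groupby(keys).cumcount()
    # Drop the first `amount` shuffled rows of each group, keep the rest.
    kept = shuffled[rank >= shuffled['amount']]
    return kept.sort_index().drop(columns='amount')


remaining = drop_returned(orders, returns, seed=0)
print(remaining)
```

Which two of the rows 1, 4, 5 and 6 survive depends on `seed`; passing `seed=None` gives a different pick on each run. Rows 0 and 7 are always kept because no return matches them.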

Welcome to SO! It really helps users if you add an explanation to your answer.