Python: fill np.nan using values based on other columns
I am trying to match each offer_id with its corresponding transactions. Here is the dataset:
time event offer_id amount
2077 0 offer received f19421c1d4aa40978ebb69ca19b0e20d NaN
15973 6 offer viewed f19421c1d4aa40978ebb69ca19b0e20d NaN
15974 6 transaction NaN 3.43
18470 12 transaction NaN 6.01
18471 12 offer completed f19421c1d4aa40978ebb69ca19b0e20d NaN
43417 108 transaction NaN 11.00
44532 114 transaction NaN 1.69
50587 150 transaction NaN 3.23
55277 168 offer received 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN
96598 258 transaction NaN 2.18
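The rule is: once an offer has been viewed, the transactions that follow belong to that offer_id. If an offer has been received but not viewed, the transactions do not belong to that offer_id. I hope the time variable makes this clear. This is the desired result: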
time event offer_id amount
2077 0 offer received f19421c1d4aa40978ebb69ca19b0e20d NaN
15973 6 offer viewed f19421c1d4aa40978ebb69ca19b0e20d NaN
15974 6 transaction f19421c1d4aa40978ebb69ca19b0e20d 3.43
18470 12 transaction f19421c1d4aa40978ebb69ca19b0e20d 6.01
18471 12 offer completed f19421c1d4aa40978ebb69ca19b0e20d NaN
43417 108 transaction NaN 11.00
44532 114 transaction NaN 1.69
50587 150 transaction NaN 3.23
55277 168 offer received 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN
96598 258 transaction NaN 2.18
Sample code:
import pandas as pd
import numpy as np

d = {'time': [0, 6, 6, 12, 12, 108, 144, 150, 168, 258],
     'event': ["offer received", "offer viewed", "transaction", "transaction", "offer completed", "transaction", "transaction", "transaction", "offer received", "transaction"],
     'offer_id': ["f19421c1d4aa40978ebb69ca19b0e20d", "f19421c1d4aa40978ebb69ca19b0e20d", np.nan, np.nan, "f19421c1d4aa40978ebb69ca19b0e20d", np.nan, np.nan, np.nan, "9b98b8c7a33c4b65b9aebfe6a799e6d9", np.nan]}
df = pd.DataFrame(d)
print("Original data:\n{}\n".format(df))

# State: is there an offer that has been viewed but not yet completed?
is_offer_viewed = False
now_offer_id = np.nan
for index, row in df.iterrows():
    if row['event'] == "offer viewed":
        # A viewed offer becomes the active one
        is_offer_viewed = True
        now_offer_id = row['offer_id']
    elif row['event'] == "transaction" and is_offer_viewed:
        # Transactions made while an offer is active belong to that offer
        df.at[index, 'offer_id'] = now_offer_id
    elif row['event'] == "offer completed":
        # Completion ends the active offer; later transactions stay NaN
        is_offer_viewed = False
        now_offer_id = np.nan
print("Processed data:\n{}\n".format(df))
Output:
Original data:
time event offer_id
0 0 offer received f19421c1d4aa40978ebb69ca19b0e20d
1 6 offer viewed f19421c1d4aa40978ebb69ca19b0e20d
2 6 transaction NaN
3 12 transaction NaN
4 12 offer completed f19421c1d4aa40978ebb69ca19b0e20d
5 108 transaction NaN
6 144 transaction NaN
7 150 transaction NaN
8 168 offer received 9b98b8c7a33c4b65b9aebfe6a799e6d9
9 258 transaction NaN
Processed data:
time event offer_id
0 0 offer received f19421c1d4aa40978ebb69ca19b0e20d
1 6 offer viewed f19421c1d4aa40978ebb69ca19b0e20d
2 6 transaction f19421c1d4aa40978ebb69ca19b0e20d
3 12 transaction f19421c1d4aa40978ebb69ca19b0e20d
4 12 offer completed f19421c1d4aa40978ebb69ca19b0e20d
5 108 transaction NaN
6 144 transaction NaN
7 150 transaction NaN
8 168 offer received 9b98b8c7a33c4b65b9aebfe6a799e6d9
9 258 transaction NaN
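On a larger frame, the row-by-row loop above could probably be replaced by a vectorized version. The following is only a minimal sketch, assuming the same three event names as in the question and rows already sorted in event order. The `CLOSED` sentinel and the names `state`, `active`, and `result` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": [0, 6, 6, 12, 12, 108, 144, 150, 168, 258],
    "event": ["offer received", "offer viewed", "transaction", "transaction",
              "offer completed", "transaction", "transaction", "transaction",
              "offer received", "transaction"],
    "offer_id": ["f19421c1d4aa40978ebb69ca19b0e20d", "f19421c1d4aa40978ebb69ca19b0e20d",
                 np.nan, np.nan, "f19421c1d4aa40978ebb69ca19b0e20d",
                 np.nan, np.nan, np.nan,
                 "9b98b8c7a33c4b65b9aebfe6a799e6d9", np.nan],
})

CLOSED = "__closed__"  # hypothetical sentinel; must not collide with a real offer_id

viewed = df["event"].eq("offer viewed")
completed = df["event"].eq("offer completed")
is_txn = df["event"].eq("transaction")

# Mark state changes only: the offer id on "offer viewed" rows,
# the sentinel on "offer completed" rows, NaN everywhere else.
state = df["offer_id"].where(viewed).mask(completed, CLOSED)

# Carry the most recent state change forward to every later row.
state = state.ffill()

# Rows whose latest state is "completed" have no active offer.
active = state.mask(state.eq(CLOSED))

# Only transaction rows take the active offer; other rows keep their own id.
result = df.assign(offer_id=df["offer_id"].where(~is_txn, active))
print(result)
```

Like the loop, this sketch ignores "offer received" rows when tracking the active offer, so a transaction is filled only if the most recent viewed or completed event was an "offer viewed".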
Until the offer is completed, right?
Yes, exactly. After that, a new offer has to be received and viewed.
@DataMastery Glad to help :)