Python pandas: fill each row by first and last occurrence

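My data consists of invoices and customers. A customer can have several invoices. An invoice always belongs to exactly one customer. The invoices are updated daily (Report Date).

My goal is to calculate each customer's age in days (see the "Age in Days" column). To do this, I take the first occurrence of the customer's Report Date and compute the difference to the last occurrence of the Report Date.

E.g. customer 1 occurs from 2018-08-14 to 2018-08-15, so he/she is 1 day old.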

Report Date  Invoice No   Customer No  Amount  Age in Days
2018-08-14   A            1            50$     1
2018-08-14   B            1            100$    1
2018-08-14   C            2            75$     2

2018-08-15   A            1            20$     1
2018-08-15   B            1            45$     1
2018-08-15   C            2            70$     2

2018-08-16   C            2            40$     2
2018-08-16   D            3            100$    0
2018-08-16   E            3            60$     0

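I solved this problem, but very inefficiently, and it takes far too long. My data contains 26 million rows. Below, I calculate the age for only one customer: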

# List every customer number
customerNo = df["Customer No"].unique()
customer_age = []

# Test with one specific customer, sorted by report date
testCustomer = df.loc[df["Customer No"] == customerNo[0]]
testCustomer = testCustomer.sort_values(by="Report Date", ascending=True)

# Age = days between the first and last report date
# (assumes "Report Date" is a datetime column)
first_occur = testCustomer.iloc[0]['Report Date']
last_occur = testCustomer.iloc[-1]['Report Date']
age = (last_occur - first_occur).days

# Repeat the age once for every row of this customer
customer_age.extend([age] * len(testCustomer))
testCustomer.loc[:, 'Customer Age'] = customer_age

Is there a better way to solve this?

If you need one value per customer indicating their age, you can use a group by (a very common pattern).
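A minimal sketch of that group-by idea, not necessarily the code the answer had in mind. It assumes "Report Date" is already a datetime column and rebuilds the question's sample rows; the names `span` and `customer_age` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "Report Date": pd.to_datetime([
        "2018-08-14", "2018-08-14", "2018-08-14",
        "2018-08-15", "2018-08-15", "2018-08-15",
        "2018-08-16", "2018-08-16", "2018-08-16",
    ]),
    "Invoice No": ["A", "B", "C", "A", "B", "C", "C", "D", "E"],
    "Customer No": [1, 1, 2, 1, 1, 2, 2, 3, 3],
})

# One row per customer: earliest and latest report date.
span = df.groupby("Customer No")["Report Date"].agg(["min", "max"])

# Age in whole days between the earliest and latest report date.
customer_age = (span["max"] - span["min"]).dt.days
print(customer_age)
```

This yields one value per customer rather than one per invoice row. If the value is needed on every row, `df["Customer No"].map(customer_age)` would broadcast it back.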

Use groupby with transform and the 'first' and 'last' aggregations:

grps = df.groupby('Customer No')['Report Date']    
df['Age in Days'] = (grps.transform('last') - grps.transform('first')).dt.days

Output:

  Report Date Invoice No  Customer No Amount  Age in Days
0  2018-08-14          A            1    50$            1
1  2018-08-14          B            1   100$            1
2  2018-08-14          C            2    75$            2
3  2018-08-15          A            1    20$            1
4  2018-08-15          B            1    45$            1
5  2018-08-15          C            2    70$            2
6  2018-08-16          C            2    40$            2
7  2018-08-16          D            3   100$            0
8  2018-08-16          E            3    60$            0

Thank you so much! This works. It computed 26 million rows in 10 seconds @阿兹里翁·尼斯 :)
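One caveat worth sketching: 'first' and 'last' pick values in row order, so the transform answer relies on the frame being sorted by Report Date, as the sample is. If that cannot be guaranteed, a variant with 'min' and 'max' should give the same result regardless of order. The shuffle below is only there to illustrate this; `shuffled`, `age` and `result` are illustrative names.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "Report Date": pd.to_datetime([
        "2018-08-14", "2018-08-14", "2018-08-14",
        "2018-08-15", "2018-08-15", "2018-08-15",
        "2018-08-16", "2018-08-16", "2018-08-16",
    ]),
    "Invoice No": ["A", "B", "C", "A", "B", "C", "C", "D", "E"],
    "Customer No": [1, 1, 2, 1, 1, 2, 2, 3, 3],
})

# Shuffle the rows to show that the result does not depend on row order.
shuffled = df.sample(frac=1, random_state=0)

# Broadcast each customer's latest and earliest date to every row.
dates = shuffled.groupby("Customer No")["Report Date"]
age = (dates.transform("max") - dates.transform("min")).dt.days

# Attach the column and restore the original row order for display.
result = shuffled.assign(**{"Age in Days": age}).sort_index()
print(result)
```

If "Report Date" is read from a file as strings, it would need to be converted with `pd.to_datetime` first, since `.dt.days` only works on a timedelta result.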