Python: creating a new column that flags behavior across a series of columns

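I have the following data frame (see the link below for the Excel file):

I want the result to look like this (with the new column renewal_type):

Each account has one or more contracts, identified by contract_id. Each contract also has its own term in months (term_months).

renewal_type should be either "Regular" or "Early". A contract is considered "Early" if the previous contract has not yet expired, i.e. its term has not ended (payments start on the contract's activation date and then continue monthly for the number of months in the term), and the previous contract still has payments within the last four months (the monthly payments are shown in the columns that have dates as headers). A contract is considered "Regular" if it is the first contract, if the previous contract had no payments in the last four months, or if the previous contract has already completed its term.

I tried to do this with the code below, but it has some problems: it classifies some "Early" contracts as "Regular" (for renewal_type; note that this code also contains a loop for another column, contract_type):


I can't include the dictionary because it is too long; here is the link to the Excel file:

This can be done with a dictionary that remembers the history. Alternatively, sort the data by account and start date, then use shift(1) to achieve something similar.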

from dateutil.relativedelta import relativedelta

# Most recent contract row seen so far for each account.
# This relies on rows being processed in chronological order within each account.
d = {}

def renewal_type(row):
    try:
        acct = row['account_id']
        result = 'Early'
        if acct not in d:  # first contract for this account
            result = 'Regular'
        else:
            prev = d[acct]
            dt = row['date_activated'].replace(day=1)
            # No payment on the previous contract in the past 4 months.
            # I don't quite get where your cut-off is; this can be range(1, 5).
            if sum(abs(prev[dt + relativedelta(months=-n)]) for n in range(4)) == 0:
                result = 'Regular'
            # Previous contract expired before this one was activated.
            elif prev['date_activated'].replace(day=1) + relativedelta(months=prev['term_months'] - 1) < dt:
                result = 'Regular'
        d[acct] = row.copy()
    except Exception:
        # Print the row that triggered the exception
        print('ERROR', row)
        result = 'ERROR'
    return result

df['rt1'] = df.apply(renewal_type, axis=1)
df['rt1']
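
The alternative mentioned above (sorting by account and start date, then using shift(1)) could look roughly like the sketch below. It is not the code from the answer: the function name `add_renewal_type` and the `ID_COLS` list are made up for illustration, it assumes every column other than those four is a monthly payment column whose header parses as a date, and it keeps the same four-month window as `range(4)` in the dictionary version (the activation month plus the three months before it).

```python
import numpy as np
import pandas as pd

# Assumed non-payment columns; every other column is treated as a monthly payment.
ID_COLS = ["account_id", "contract_id", "date_activated", "term_months"]

def add_renewal_type(df):
    out = df.sort_values(["account_id", "date_activated"]).reset_index(drop=True)

    # Month number (year * 12 + month) turns month arithmetic into integer math.
    activated = pd.to_datetime(out["date_activated"])
    month_no = activated.dt.year * 12 + activated.dt.month

    # Previous contract of the same account; NaN for an account's first contract.
    by_account = out["account_id"]
    prev_month_no = month_no.groupby(by_account).shift(1)
    prev_term = out["term_months"].groupby(by_account).shift(1)

    # Comparisons against NaN are False, so a first contract never counts as active.
    prev_active = (prev_month_no + prev_term - 1) >= month_no

    # Payments of the row above, restricted to the activation month and the 3 before it.
    pay = out.drop(columns=ID_COLS)
    col_dates = pd.to_datetime(pay.columns)
    col_no = (col_dates.year * 12 + col_dates.month).to_numpy()
    prev_pay = pay.shift(1).fillna(0).abs().to_numpy(dtype=float)
    cur = month_no.to_numpy()[:, None]
    in_window = (col_no[None, :] <= cur) & (col_no[None, :] > cur - 4)
    recent_paid = (prev_pay * in_window).sum(axis=1) > 0

    out["renewal_type"] = np.where(prev_active.to_numpy() & recent_paid, "Early", "Regular")
    return out
```

It would be called as `df = add_renewal_type(df)`. Months that have no payment column simply fall outside the window and count as no payment, and the row above is only trusted when `prev_active` is true, which is never the case for an account's first contract.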
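Can you provide a simplified version of your file? Just 5-10 rows with 3-4 columns.

Hi, I don't think that is possible, because I also need to take into account contracts that had no payments within 4 months.

Hi! Could you also help me with this one? Why do I get a "KeyError: Timestamp('2019-06-01 00:00:00')"?

On the data you posted, it works. I have updated the code to print the scenario that triggers the exception.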
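
On the KeyError raised in the comments: one possible cause, and this is only a guess without seeing the full file, is that the four-month lookup asks for a month label that is not in the row (for example a month outside the range of the payment columns, or headers stored as strings rather than Timestamps). A minimal sketch of a more tolerant lookup follows; the helper name `paid_in_window` is hypothetical, and it uses `Series.get` with a default of 0 instead of indexing with `prev[...]`.

```python
import pandas as pd

def paid_in_window(prev, dt, months=4):
    """Sum absolute payments stored on row `prev` for the month of `dt`
    and the `months - 1` months before it; missing months count as 0."""
    total = 0.0
    for n in range(months):
        # First day of the month that lies n months before dt's month.
        key = (dt.to_period("M") - n).to_timestamp()
        # Series.get returns the default instead of raising KeyError.
        total += abs(prev.get(key, 0))
    return total
```

Inside `renewal_type`, the condition could then read `paid_in_window(prev, dt) == 0`. Treating a missing month as zero is a design choice: it hides the error, so it is worth checking first that the headers really are Timestamps on the first day of each month.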
account_id  contract_id date_activated  term_months renewal_type    2009-01-01 00:00:00 2009-02-01 00:00:00 2009-03-01 00:00:00 2009-04-01 00:00:00 2009-05-01 00:00:00 ... 2020-06-01 00:00:00 2020-07-01 00:00:00 2020-08-01 00:00:00 2020-09-01 00:00:00 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00
0   1234    A   2009-07-01  24  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
1   1212    B   2019-06-25  24  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
2   1111    C   2014-03-13  24  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
3   11112   FF  2017-02-09  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
4   5454    FAS 2015-08-04  36  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
5   48468   DFAF    2010-06-10  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
6   89795   SDFDF   2017-09-19  24  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
7   12454545    FADS    2017-06-26  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
8   12454545    FDAGDG  2018-06-01  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
9   12454545    ADGADGFAD   2019-01-28  12  Early   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
10  12454545    ADGADGASDGADSG  2020-01-24  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
11  12454545    ADD 2020-03-02  11  Early   0   0   0   0   0   ... 620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  620.984848  0.00    0.00
12  12454545    ADFGG   2021-02-24  12  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    687.94  687.94
13  1646468 ASDADGAD    2019-10-14  36  Regular 0   0   0   0   0   ... 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.70 1504.70
14  5454555 ADGA    2018-04-02  30  Regular 0   0   0   0   0   ... 528.000000  528.000000  528.000000  528.000000  0.000000    0.000000    0.000000    0.000000    0.00    0.00
15  48654   GHDG    2018-10-18  36  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
16  4546486 DGHDG   2009-01-01  12  Regular 323 323 323 323 323 ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
17  4546486 DFGHGDHDGH  2009-05-07  12  Early   0   0   0   0   399 ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
18  4546486 DFGAA   2009-09-10  12  Early   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
19  4546486 SGFHJJ  2010-09-08  36  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
20  4546486 SFGHJR  2013-09-06  36  Regular 0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
21  4546486 HTUIJR  2015-10-27  36  Early   0   0   0   0   0   ... 0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.00    0.00
import pandas as pd

def get_types(monthly_payments):
    def f(s):
        check = monthly_payments.loc[
            (s.date_activated.year == monthly_payments.index.year) &
            (s.date_activated.month == monthly_payments.index.month)
            ].iloc[0]

        if check.wb == 0:
            # If rolling sum of 4 months prior is 0
            s['contract_type'] = 'Winback'
        elif check.og_upg == 0:
            # If Prior Month is 0
            s['contract_type'] = 'Original'

        elif check.max_pmt > check.og_upg:
            # If Prior Month is not missing and current month is more
            s['contract_type'] = 'Upgrade'
        else:
            s['contract_type'] = 'Renewal'

        if check.early:
            # If Early
            s['renewal_type'] = 'Early'
        else:
            s['renewal_type'] = 'Regular'
        return s

    return f

def apply_types(g):
    # Get Non Payment Info
    account_info = g[g.columns[:4]]
    # Transpose Monthly Payments To Rows
    monthly_payments = g.loc[:, g.columns[4:]].T
    # Make Sure Index is DT
    monthly_payments.index = pd.to_datetime(monthly_payments.index)
    # Get Check for is early based on number of payments
    monthly_payments['early'] = monthly_payments.astype(bool).sum(axis=1) > 1
    # Max Payment In Month
    monthly_payments['max_pmt'] = monthly_payments.max(axis=1)
    # 1 Month Prior
    monthly_payments['og_upg'] = monthly_payments.max_pmt.shift().fillna(0)
    # Rolling Sum of 4 Months Prior
    monthly_payments['wb'] = monthly_payments.max_pmt \
        .rolling(min_periods=0, window=4).sum().shift()
    # Concat New Columns With Original Payment Information
    return pd.concat((
        account_info.apply(get_types(monthly_payments), axis=1),
        g[g.columns[4:]]
    ), axis=1)

df = df.groupby('account_id', as_index=False).apply(apply_types).reset_index(drop=True)