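Python: create a new column that classifies behavior across a series of columns

Tags: python, pandas, dataframe, numpy

I have the following DataFrame (see the link below for the Excel file):

I would like the result to look like this (with a new renewal type column):

Each account has one or more contracts, identified by a contract ID. Each contract also has its own term in months (term_months).

The renewal type should be either "Regular" or "Early". A contract is considered "Early" if its previous contract has not yet expired, i.e. the previous term has not ended (payments start on the activation date and then continue monthly for the number of months in the term), and the previous contract still had payments within the last four months (the columns headed by dates show the monthly payments). A contract is considered "Regular" if it is the first contract, if the previous contract had no payments in the last four months, or if the previous contract has already completed its term.

I tried to do this with the following code, but there is a problem: it classifies some "Early" contracts as "Regular" (note that this code also contains a loop for another column, contract type):

I can't include the dictionary because it is too long, so here is the link to the Excel file:

This can be done with a dictionary that remembers the history. Alternatively, sort the data by account and start date, then use shift(1) to achieve something similar.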
from dateutil.relativedelta import relativedelta

# Last contract row seen per account.
# Assumes df is already sorted by account and activation date.
d = {}

def renewal_type(row):
    try:
        acct = row['account_id']
        result = 'Early'
        if acct not in d:  # first contract
            result = 'Regular'
        else:
            prev = d[acct]
            dt = row['date_activated'].replace(day=1)
            # No pay on the previous contract in the past 4 months.
            # I don't quite get where your cut-off is. This can be range(1, 5).
            if sum(abs(prev[dt + relativedelta(months=-n)]) for n in range(4)) == 0:
                result = 'Regular'
            # Previous contract expired.
            elif prev['date_activated'].replace(day=1) + relativedelta(months=prev['term_months'] - 1) < dt:
                result = 'Regular'
        d[acct] = row.copy()
    except Exception:
        # Print the row that triggered the exception so it can be inspected.
        print('ERROR', row)
        result = 'ERROR'
    return result

df['rt1'] = df.apply(renewal_type, axis=1)
df['rt1']
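The shift(1) alternative mentioned above is not spelled out in the answer, so here is a minimal sketch of how it might look. It applies the same rule as the dictionary-based function. The helper names `classify` and `add_renewal_type`, the `lookback` parameter, and the toy data are all hypothetical. It also assumes that the payment columns are labeled with first-of-month Timestamps, and it treats months without a payment column as 0 rather than raising.

```python
import pandas as pd


def classify(cur, prv, lookback=4):
    """Label one contract given its own row and the previous contract's row."""
    if pd.isna(prv["date_activated"]):
        return "Regular"  # first contract on the account
    month = cur["date_activated"].replace(day=1)
    # Last month covered by the previous contract's term.
    prev_end = prv["date_activated"].replace(day=1) + pd.DateOffset(
        months=int(prv["term_months"]) - 1
    )
    if prev_end < month:
        return "Regular"  # previous contract already expired
    # Payments on the previous contract inside the look-back window.
    # Assumption: a month with no payment column counts as 0.
    recent = [month - pd.DateOffset(months=n) for n in range(lookback)]
    paid = sum(abs(prv.get(m, 0)) for m in recent)
    return "Early" if paid > 0 else "Regular"


def add_renewal_type(df, month_cols, lookback=4):
    df = df.sort_values(["account_id", "date_activated"]).reset_index(drop=True)
    keep = ["date_activated", "term_months"] + list(month_cols)
    # shift(1) within each account lines every contract up with its predecessor.
    prev = df.groupby("account_id")[keep].shift(1)
    df["renewal_type"] = [
        classify(df.iloc[i], prev.iloc[i], lookback) for i in range(len(df))
    ]
    return df


# Toy data (made up for illustration, not taken from the Excel file).
months = pd.date_range("2019-01-01", periods=6, freq="MS")
contracts = pd.DataFrame({
    "account_id": [1, 1, 2, 2, 3, 3],
    "contract_id": ["A", "B", "C", "D", "E", "F"],
    "date_activated": pd.to_datetime([
        "2018-03-15", "2019-02-10",  # B starts while A is still running and paid
        "2018-01-05", "2019-04-20",  # D starts after C has expired
        "2018-06-01", "2019-03-05",  # E is still running but has no payments
    ]),
    "term_months": [12, 12, 12, 12, 24, 12],
})
payments = pd.DataFrame(0.0, index=contracts.index, columns=months)
payments.loc[0, months[:2]] = 100.0  # contract A paid in Jan and Feb 2019
example = pd.concat([contracts, payments], axis=1)

labeled = add_renewal_type(example, months)
print(labeled[["account_id", "contract_id", "renewal_type"]])
```

Here only contract B should come out as "Early". The row-by-row list comprehension keeps the sketch readable, and a fully vectorized version would need the month columns reshaped into long form first.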
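Can you provide a simplified version of your file? Just 5-10 rows with 3-4 columns.

Hi, I don't think that's possible, because I also need to account for contracts that had no payments within 4 months.

Hi! Could you also help me with this? Why am I getting a KeyError: Timestamp('2019-06-01 00:00:00')?

It works on the data you posted. I've updated the code to print the scenario that triggers the exception.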
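The thread does not confirm the cause of that KeyError, so the following is only a sketch of two plausible explanations: the look-back reaches a month that has no payment column, or the column labels are strings rather than Timestamps. The row contents below are made up for illustration.

```python
import pandas as pd

# A row shaped like the ones passed to renewal_type: a few metadata fields
# followed by monthly payment columns labeled with Timestamps.
months = pd.date_range("2019-07-01", periods=3, freq="MS")
row = pd.Series({"account_id": 1212, months[0]: 10.0, months[1]: 20.0, months[2]: 30.0})

missing = pd.Timestamp("2019-06-01")  # one month before the first payment column

# Case 1: the month is simply not among the columns, so indexing raises.
try:
    row[missing]
    raised = False
except KeyError:
    raised = True

# Series.get returns a default instead of raising.
safe_value = row.get(missing, 0)

# Case 2: the labels are strings, so a Timestamp key never matches
# until the labels are converted.
str_row = pd.Series([5.0], index=["2019-06-01 00:00:00"])
before_fix = str_row.get(missing, 0)
fixed = str_row.copy()
fixed.index = pd.to_datetime(fixed.index)
after_fix = fixed[missing]

print(raised, safe_value, before_fix, after_fix)
```

Whether a missing month should count as "no payment" is a judgment call. Replacing the lookup with `.get(..., 0)` silences the error, while the try/except in the answer keeps it visible.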
account_id contract_id date_activated term_months renewal_type 2009-01-01 00:00:00 2009-02-01 00:00:00 2009-03-01 00:00:00 2009-04-01 00:00:00 2009-05-01 00:00:00 ... 2020-06-01 00:00:00 2020-07-01 00:00:00 2020-08-01 00:00:00 2020-09-01 00:00:00 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00 2021-01-01 00:00:00 2021-02-01 00:00:00 2021-03-01 00:00:00
0 1234 A 2009-07-01 24 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
1 1212 B 2019-06-25 24 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
2 1111 C 2014-03-13 24 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
3 11112 FF 2017-02-09 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
4 5454 FAS 2015-08-04 36 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
5 48468 DFAF 2010-06-10 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
6 89795 SDFDF 2017-09-19 24 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
7 12454545 FADS 2017-06-26 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
8 12454545 FDAGDG 2018-06-01 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
9 12454545 ADGADGFAD 2019-01-28 12 Early 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
10 12454545 ADGADGASDGADSG 2020-01-24 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
11 12454545 ADD 2020-03-02 11 Early 0 0 0 0 0 ... 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 620.984848 0.00 0.00
12 12454545 ADFGG 2021-02-24 12 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 687.94 687.94
13 1646468 ASDADGAD 2019-10-14 36 Regular 0 0 0 0 0 ... 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.700000 1504.70 1504.70
14 5454555 ADGA 2018-04-02 30 Regular 0 0 0 0 0 ... 528.000000 528.000000 528.000000 528.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
15 48654 GHDG 2018-10-18 36 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
16 4546486 DGHDG 2009-01-01 12 Regular 323 323 323 323 323 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
17 4546486 DFGHGDHDGH 2009-05-07 12 Early 0 0 0 0 399 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
18 4546486 DFGAA 2009-09-10 12 Early 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
19 4546486 SGFHJJ 2010-09-08 36 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
20 4546486 SFGHJR 2013-09-06 36 Regular 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
21 4546486 HTUIJR 2015-10-27 36 Early 0 0 0 0 0 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00 0.00
import pandas as pd


def get_types(monthly_payments):
    def f(s):
        # Row of monthly_payments for the month in which this contract was activated
        check = monthly_payments.loc[
            (s.date_activated.year == monthly_payments.index.year) &
            (s.date_activated.month == monthly_payments.index.month)
        ].iloc[0]
        if check.wb == 0:
            # If rolling sum of 4 months prior is 0
            s['contract_type'] = 'Winback'
        elif check.og_upg == 0:
            # If Prior Month is 0
            s['contract_type'] = 'Original'
        elif check.max_pmt > check.og_upg:
            # If Prior Month is not missing and current month is more
            s['contract_type'] = 'Upgrade'
        else:
            s['contract_type'] = 'Renewal'
        if check.early:
            # If Early
            s['renewal_type'] = 'Early'
        else:
            s['renewal_type'] = 'Regular'
        return s
    return f


def apply_types(g):
    # Get Non Payment Info
    account_info = g[g.columns[:4]]
    # Transpose Monthly Payments To Rows
    monthly_payments = g.loc[:, g.columns[4:]].T
    # Make Sure Index is DT
    monthly_payments.index = pd.to_datetime(monthly_payments.index)
    # Get Check for is early based on number of payments
    monthly_payments['early'] = monthly_payments.astype(bool).sum(axis=1) > 1
    # Max Payment In Month
    monthly_payments['max_pmt'] = monthly_payments.max(axis=1)
    # 1 Month Prior
    monthly_payments['og_upg'] = monthly_payments.max_pmt.shift().fillna(0)
    # Rolling Sum of 4 Months Prior
    monthly_payments['wb'] = monthly_payments.max_pmt \
        .rolling(min_periods=0, window=4).sum().shift()
    # Concat New Columns With Original Payment Information
    return pd.concat((
        account_info.apply(get_types(monthly_payments), axis=1),
        g[g.columns[4:]]
    ), axis=1)


df = df.groupby('account_id', as_index=False).apply(apply_types).reset_index(drop=True)