R: efficient loan repayment calculation
I have a table of customer loan disbursements and repayments, which I have already preprocessed like this:
customerID | balanceChange | trxDate | TYPE
242105 | 500 | 20170605 | loan
242105 | 1500 | 20170605 | loan
242105 | -1000 | 20170607 | payment
242111 | 500 | 20170605 | loan
242111 | -500 | 20170606 | payment
242111 | 500 | 20170607 | loan
242111 | -500 | 20170609 | payment
242151 | 500 | 20170605 | loan
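What I want to do is (1) for each day, count how many of the loans disbursed that day have been fully repaid, and (2) work out how many days it took the customers to repay those loans.
The repayment rule is of course first in, first out (FIFO), so the oldest loan is repaid first.
For the example above, the solution is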
trxDate | nRepayments | timeGap(days)
20170605 | 2 | 1.5
20170606 | 0 | 0
20170607 | 1 | 2
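To explain why this is the solution: on 20170605, 4 loans were disbursed (2 to customer 242105, and one each to 242111 and 242151), but only 2 of them were fully repaid (the 500 loan of 242105 and the 500 loan of 242111). The time gap is the average number of days the customers took to repay (242105 repaid on 20170607, i.e. 2 days; 242111 repaid on 20170606, i.e. 1 day), hence (2+1)/2 = 1.5.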
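As a quick check of the arithmetic above, the 1.5-day figure can be reproduced in a few lines of base R. This is only a sketch; the variable names (`loan_date`, `repaid_on`, `days_to_repay`) are made up for illustration and do not come from the script below.

```r
# Loans disbursed on 2017-06-05 that were fully repaid, and their repayment dates
loan_date <- as.Date("20170605", format = "%Y%m%d")
repaid_on <- as.Date(c("20170607", "20170606"), format = "%Y%m%d")  # 242105, 242111

# Days each customer needed, then the average over the repaid loans
days_to_repay <- as.numeric(repaid_on - loan_date)
time_gap <- mean(days_to_repay)
print(time_gap)
```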
I tried to compute nRepayments with the R script below (I figure that once I have that, the time gap should be a piece of cake).
# Recoveries
library(dplyr)
library(data.table)

data_loans_rec <- data_loans %>% arrange(customerID, trxDate) %>% as.data.table()
data_loans_rec[is.na(data_loans_rec)] <- 0
# Drop a payment that comes before any loan, then re-index the rows per customer
data_loans_rec <- data_loans_rec[, index := seq_len(.N), by = customerID][!(index == 1 & TYPE == "payment")][, index := seq_len(.N), by = customerID]
# Loans disbursed per day; n_loans_rec will count the fully repaid ones
n_loans_given <- data_loans_rec[TYPE == "loan", .(nloans = .N), by = .(trxDate)][order(trxDate)]
n_loans_rec <- copy(n_loans_given)
n_loans_rec[, nloans := 0]
unique_cust <- unique(data_loans_rec$customerID)
# Check repayment for every customer ================
for (i in seq_along(unique_cust)) {
  cur_cust <- unique_cust[i]
  list_loan <- data_loans_rec[customerID == cur_cust & TYPE == "loan", balanceChange]
  list_loan_time <- data_loans_rec[customerID == cur_cust & TYPE == "loan", trxDate]
  list_pay <- data_loans_rec[customerID == cur_cust & TYPE == "payment", balanceChange]
  # sum() of an empty vector is 0, so customers without payments need no special case
  sum_paid <- sum(abs(list_pay))
  for (i_loantime in seq_along(list_loan)) {
    loan_curr <- list_loan[i_loantime]
    if (loan_curr - sum_paid <= 0) {
      # FIFO: the oldest open loan is fully covered by what is left of the payments
      loan_date <- list_loan_time[i_loantime]
      n_loans_rec[trxDate == loan_date, nloans := nloans + 1]
      sum_paid <- sum_paid - loan_curr
    } else {
      break
    }
  }
}
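Your logic is fairly complex, and I do not intend to reproduce it completely in this answer; my aim is only to give you some ideas on how to optimize it.
Also, as mentioned in the comments, you could try parallelizing, or use another programming language.
In any case, since your setup already uses data.table, you can try to use whole-table (grouped, vectorized) operations as much as possible, which are usually faster than big loops. For example:
I first compute, for each customer ID, the balance and the total of the payments made: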
data_loans_rec <- data_loans_rec[, balance := sum(balanceChange), by = customerID]
data_loans_rec <- data_loans_rec[, sumPayments := sum(balanceChange[TYPE == "payment"]), by = customerID]
This gives you the desired result, at least for your example:
   customerID balanceChange  trxDate    TYPE index balance sumPayments repaid
1:     242105           500 20170605    loan     1    1000       -1000   TRUE
2:     242105          1500 20170605    loan     2    1000       -1000  FALSE
3:     242105         -1000 20170607 payment     3    1000       -1000     NA
4:     242111           500 20170605    loan     1       0       -1000   TRUE
5:     242111          -500 20170606 payment     2       0       -1000     NA
6:     242111           500 20170607    loan     3       0       -1000   TRUE
7:     242111          -500 20170609 payment     4       0       -1000     NA
8:     242151           500 20170605    loan     1     500           0  FALSE
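The two lines of code above do not show how the `repaid` column in this output is filled in. One possible way to get it without a loop, as a minimal sketch and not necessarily what was originally used, is to compare each customer's running total of loans against their total payments. The helper column `cumLoan` and the result name `n_repaid` are assumptions introduced here, and, like the loop in the question, the sketch ignores whether a payment happened before or after a given loan.

```r
library(data.table)

# Example data from the question
loans <- data.table(
  customerID    = c(242105, 242105, 242105, 242111, 242111, 242111, 242111, 242151),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605, 20170605, 20170607, 20170605,
                    20170606, 20170607, 20170609, 20170605),
  TYPE          = c("loan", "loan", "payment", "loan",
                    "payment", "loan", "payment", "loan")
)
setorder(loans, customerID, trxDate)

# Total paid per customer (payments are stored as negative amounts)
loans[, sumPayments := sum(balanceChange[TYPE == "payment"]), by = customerID]

# FIFO: a loan is fully repaid when the running total of loans up to and
# including it is covered by the customer's total payments
loans[TYPE == "loan", cumLoan := cumsum(balanceChange), by = customerID]
loans[TYPE == "loan", repaid := cumLoan <= -sumPayments]

# Number of fully repaid loans per disbursement day
n_repaid <- loans[TYPE == "loan", .(nRepayments = sum(repaid)), keyby = trxDate]
print(n_repaid)
```

Days on which no loan was disbursed (20170606 in the example) simply do not appear in `n_repaid`; they would have to be added separately if a zero row is wanted.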
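The advantages are: the final loop runs over fewer customers, you have precomputed some things, and you rely on data.table to actually replace your loop. Hopefully this approach gives you an improvement. I think it is worth a try.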
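For the second part of the question (the time gap), one loop-free option is a rolling join: for every loan, look up the first payment at which the customer's cumulative payments reach the cumulative loan amount. The following is a sketch under assumptions, not a tested solution for real data: the names `lo`, `pa`, `matched` and `res` are made up, payments are assumed to be strictly negative amounts, and a payment dated before the loan it covers is not treated specially.

```r
library(data.table)

# Example data from the question
loans <- data.table(
  customerID    = c(242105, 242105, 242105, 242111, 242111, 242111, 242111, 242151),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605, 20170605, 20170607, 20170605,
                    20170606, 20170607, 20170609, 20170605),
  TYPE          = c("loan", "loan", "payment", "loan",
                    "payment", "loan", "payment", "loan")
)
loans[, date := as.Date(as.character(trxDate), format = "%Y%m%d")]
setorder(loans, customerID, date)

# Running totals per customer, kept in two separate tables
lo <- loans[TYPE == "loan",
            .(loanDate = date, cumLoan = cumsum(balanceChange)), by = customerID]
pa <- loans[TYPE == "payment",
            .(payDate = date, cumPaid = cumsum(-balanceChange)), by = customerID]

# roll = -Inf picks, for each loan, the first payment row whose cumulative
# amount is >= the cumulative loan amount; payDate is NA if there is none
matched <- pa[lo, on = .(customerID, cumPaid = cumLoan), roll = -Inf]

# Per disbursement day: number of repaid loans and average days to repay
res <- matched[!is.na(payDate),
               .(nRepayments = .N,
                 timeGap = mean(as.numeric(difftime(payDate, loanDate, units = "days")))),
               keyby = loanDate]
print(res)
```

On the example data this yields 2 repaid loans with an average gap of 1.5 days for 2017-06-05, and 1 repaid loan with a gap of 2 days for 2017-06-07, which matches the expected result in the question.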
Not an answer, but is this a case that could be parallelized?