Efficient loan repayment calculation in R


I have a table of customer loan disbursements and repayments, which I have preprocessed into this form:

```
customerID | balanceChange | trxDate        | TYPE
242105     | 500           | 20170605       | loan
242105     | 1500          | 20170605       | loan
242105     | -1000         | 20170607       | payment
242111     | 500           | 20170605       | loan
242111     | -500          | 20170606       | payment
242111     | 500           | 20170607       | loan
242111     | -500          | 20170609       | payment
242151     | 500           | 20170605       | loan
```

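What I want to do is compute, for each day: (1) how many of the loans issued on that day have been fully repaid, and (2) how many days the customers took to repay those loans.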

The repayment rule is, of course, first-in-first-out (FIFO): the oldest loan is repaid first.

For the example above, the solution is:

```
trxDate      | nRepayments   | timeGap(days)
20170605     | 2             | 1.5
20170606     | 0             | 0
20170607     | 1             | 2
```

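The reasoning behind this solution: on 20170605, 4 loans were issued (2 to customerID 242105, and one each to 242111 and 242151), but only 2 of them were repaid (the 500 loan of 242105 and the 500 loan of 242111). The time gap is the average number of days each customer took to repay (242105 repaid on 20170607, i.e. after 2 days; 242111 repaid on 20170606, i.e. after 1 day), so (2 + 1) / 2 = 1.5. No loans were issued on 20170606, so that row is 0. On 20170607 one loan was issued (500 to 242111), and it was repaid on 20170609, i.e. after 2 days.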
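The FIFO rule and the worked example above can be expressed without an inner loop over loans. The base-R sketch below is one possible approach, not the asker's method: for each customer it compares the running total of loans with the running total of payments, and treats a loan as fully repaid on the date of the first payment whose running total covers everything lent up to and including that loan. Its assumptions: `trxDate` is a yyyymmdd number, payments are negative, same-day rows keep their input order, and no customer ever pays more than they currently owe. `fifo_repaid_dates()` and `to_date()` are hypothetical helpers. The summary covers every calendar day from the first to the last loan date, which is how 20170606 ends up with a 0 row. `timeGap` is averaged over repaid loans rather than over customers.

```r
# Example data from the question (trxDate as yyyymmdd numbers, payments negative)
data_loans <- data.frame(
  customerID    = c(242105, 242105, 242105, 242111, 242111, 242111, 242111, 242151),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605, 20170605, 20170607, 20170605, 20170606, 20170607, 20170609, 20170605),
  TYPE          = c("loan", "loan", "payment", "loan", "payment", "loan", "payment", "loan"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: yyyymmdd number -> Date
to_date <- function(x) as.Date(as.character(x), format = "%Y%m%d")

# Hypothetical helper: for one customer, the date each loan was fully repaid under FIFO
# (NA if not yet fully repaid). Assumes payments never exceed the amount owed.
fifo_repaid_dates <- function(d) {
  d     <- d[order(d$trxDate), ]            # order() is stable: same-day rows keep input order
  loans <- d[d$TYPE == "loan", ]
  pays  <- d[d$TYPE == "payment", ]
  owed  <- cumsum(loans$balanceChange)      # total lent up to and including each loan
  paid  <- cumsum(-pays$balanceChange)      # total repaid after each payment
  repaid_on <- rep(NA_real_, nrow(loans))
  if (nrow(pays) > 0) {
    # count of payments whose running total is still below `owed`, plus one,
    # is the index of the first payment that covers the loan
    j  <- findInterval(owed, paid, left.open = TRUE) + 1
    ok <- j <= length(paid)                 # beyond the last payment: not repaid yet
    repaid_on[ok] <- pays$trxDate[j[ok]]
  }
  data.frame(loanDate = loans$trxDate, repaidDate = repaid_on)
}

# One row per loan, with the number of days it took to repay (NA if unpaid)
per_loan <- do.call(rbind, lapply(split(data_loans, data_loans$customerID), fifo_repaid_dates))
per_loan$gap <- as.numeric(to_date(per_loan$repaidDate) - to_date(per_loan$loanDate))
repaid <- per_loan[!is.na(per_loan$gap), ]

# One row per calendar day between the first and last loan date
days   <- seq(min(to_date(per_loan$loanDate)), max(to_date(per_loan$loanDate)), by = "day")
result <- data.frame(trxDate = as.integer(format(days, "%Y%m%d")))
result$nRepayments <- vapply(result$trxDate,
                             function(d) sum(repaid$loanDate == d), integer(1))
result$timeGap <- vapply(result$trxDate, function(d) {
  g <- repaid$gap[repaid$loanDate == d]
  if (length(g) == 0) 0 else mean(g)        # 0 when nothing issued that day was repaid
}, numeric(1))
print(result)
```

On the example data this gives nRepayments 2, 0, 1 and timeGap 1.5, 0, 2 for 20170605 to 20170607, matching the expected table. `findInterval()` handles all of a customer's loans in one call, so the only loop left is the `lapply()` over customers.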

I tried to compute nRepayments with the R script below (I figured that once this works, the time gap should be a piece of cake):

```r
# Recoveries
library(dplyr)
library(data.table)

data_loans_rec <- data_loans %>% arrange(customerID, trxDate) %>% as.data.table()
data_loans_rec[is.na(data_loans_rec)] <- 0
# Drop a customer's first row if it is a payment, then renumber the rows per customer
data_loans_rec <- data_loans_rec[, index := seq_len(.N), by = customerID][!(index == 1 & TYPE == "payment")][, index := seq_len(.N), by = customerID]
# Loans issued per day (grouped by trxDate), plus a zeroed copy to count repaid loans per day
n_loans_given <- data_loans_rec[TYPE == "loan", .(nloans = .N), by = trxDate][order(trxDate)]
n_loans_rec <- copy(n_loans_given)
n_loans_rec[, nloans := 0L]

unique_cust <- unique(data_loans_rec$customerID)

# Check repayment for every customer ================
for (i in seq_along(unique_cust)) {
  cur_cust <- unique_cust[i]
  # DT[, col] returns a plain vector; DT[, .(col)] would return a one-column data.table
  list_loan      <- data_loans_rec[customerID == cur_cust & TYPE == "loan", balanceChange]
  list_loan_time <- data_loans_rec[customerID == cur_cust & TYPE == "loan", trxDate]
  list_pay       <- data_loans_rec[customerID == cur_cust & TYPE == "payment", balanceChange]

  if (length(list_pay) == 0) { # if there are no payments
    list_pay <- 0
  }

  sum_paid <- sum(abs(list_pay))

  # FIFO: walk the loans oldest first, consuming the total amount paid
  for (i_loantime in seq_along(list_loan_time)) {
    loan_curr <- list_loan[i_loantime]
    if (loan_curr - sum_paid <= 0) {
      # Loan fully covered: count it under the day it was issued
      n_loans_rec[trxDate == list_loan_time[i_loantime], nloans := nloans + 1L]
      sum_paid <- sum_paid - loan_curr
    } else {
      break # this loan, and every later one, is not fully repaid yet
    }
  }
}
```

Your logic is fairly complex, and I won't try to replicate it fully in this answer; my goal is just to give you some ideas on how to optimize it.

Also, as mentioned in the comments, you could try parallelizing, or using another programming language.

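In any case, since your setup already uses `data.table`, you can try to rely as much as possible on grouped operations over the whole table, which are usually faster than large loops. For example: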

I first compute, for each customerID, the balance and the total amount of payments made:

```r
data_loans_rec <- data_loans_rec[, balance := sum(balanceChange), by = customerID]
data_loans_rec <- data_loans_rec[, sumPayments := sum(balanceChange[TYPE == "payment"]), by = customerID]
```
This gives you the desired result, at least for your example:

```
   customerID balanceChange  trxDate    TYPE index balance sumPayments repaid
1:     242105           500 20170605    loan     1    1000       -1000   TRUE
2:     242105          1500 20170605    loan     2    1000       -1000  FALSE
3:     242105         -1000 20170607 payment     3    1000       -1000     NA
4:     242111           500 20170605    loan     1       0       -1000   TRUE
5:     242111          -500 20170606 payment     2       0       -1000     NA
6:     242111           500 20170607    loan     3       0       -1000   TRUE
7:     242111          -500 20170609 payment     4       0       -1000     NA
8:     242151           500 20170605    loan     1     500           0  FALSE
```
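The two `balance`/`sumPayments` lines do not produce the `repaid` column in the output above. One way to derive it, assumed here, is to flag a loan as repaid when the customer's running total of loans (in date order) is no larger than the total amount paid. The base-R sketch below does this with `ave()` on the example rows, leaving out the `index` column. In `data.table` each step would be a `:=` assignment with `by = customerID`, as in the two lines above.

```r
# Example rows from the question, ordered by customerID and trxDate
data_loans_rec <- data.frame(
  customerID    = c(242105, 242105, 242105, 242111, 242111, 242111, 242111, 242151),
  balanceChange = c(500, 1500, -1000, 500, -500, 500, -500, 500),
  trxDate       = c(20170605, 20170605, 20170607, 20170605, 20170606, 20170607, 20170609, 20170605),
  TYPE          = c("loan", "loan", "payment", "loan", "payment", "loan", "payment", "loan"),
  stringsAsFactors = FALSE
)
data_loans_rec <- data_loans_rec[order(data_loans_rec$customerID, data_loans_rec$trxDate), ]
is_loan <- data_loans_rec$TYPE == "loan"

# Per-customer totals, mirroring the data.table lines above
data_loans_rec$balance     <- ave(data_loans_rec$balanceChange, data_loans_rec$customerID, FUN = sum)
data_loans_rec$sumPayments <- ave(ifelse(is_loan, 0, data_loans_rec$balanceChange),
                                  data_loans_rec$customerID, FUN = sum)

# Running total of loans per customer; under FIFO a loan is repaid if the total paid covers it
cum_loans <- ave(ifelse(is_loan, data_loans_rec$balanceChange, 0),
                 data_loans_rec$customerID, FUN = cumsum)
data_loans_rec$repaid <- ifelse(is_loan, cum_loans <= -data_loans_rec$sumPayments, NA)
print(data_loans_rec)
```

This reproduces the `balance`, `sumPayments` and `repaid` values shown above. Note that it only says *whether* a loan is covered by the customer's total payments; to get the repayment date (and so the time gap) you still need to know which payment crossed the threshold.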

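The advantages: the final loop has to run over fewer customers, some quantities are precomputed, and you rely on `data.table` to actually replace your loops. I hope this approach moves you forward; consider it a first attempt.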

Not an answer, but is this a case that could be parallelized?