Financial data - R data.table - group by condition


Given the following `data.table` with financial data:

userId  systemBankId    accountId   valueDate   quantity    description
871     0065            6422        2013-02-28  -52400      AMORTIZACION PRESTAMO       
871     0065            6422        2013-03-28  -52400  AMORTIZACION PRESTAMO   
871     0065            6422        2013-04-01  -3000000    AMORTIZACION PRESTAMO
871     0065            6422        2013-04-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2013-05-31  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2013-06-28  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2013-07-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2013-08-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2013-09-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2013-10-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2013-11-29  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2013-12-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-01-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-02-28  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-03-31  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-04-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-05-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-06-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-07-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-08-29  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-09-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-10-30  -52349  AMORTIZACION PRESTAMO   
871     0065            6422        2014-11-28  -52349  AMORTIZACION PRESTAMO
I want to group by `userId`, `systemBankId`, `accountId` and `quantity`:

dt[userId==871L,.N,by=.(userId,systemBankId,accountId,quantity)]
The result is:

   userId systemBankId accountId quantity  N
   871         0065      6422   -52400     3
   871         0065      6422 -3000000     1
   871         0065      6422   -52349    20
However, the first and third rows are the same kind of transaction, a mortgage payment, while the second is a loan payment.

I would like to group them like this instead:

userId systemBankId accountId quantity N
   871         0065      6422   -XXXXX 23
   871         0065      6422 -3000000  1
So you can see that, over a 24-month period, this user has 23 mortgage payment transactions and 1 loan payment transaction.

The question is: is there a simple way to do this? That is, to treat payments that are within [-20%, 20%] of each other as equal.

Thanks in advance.

Best regards


To get a data frame with the data above:

structure(list(userId = c(871L, 871L, 871L, 871L, 871L, 871L, 
871L, 871L, 871L, 871L, 871L, 871L, 871L, 871L, 871L, 871L, 871L, 
871L, 871L, 871L, 871L, 871L, 871L), systemBankId = c(65L, 65L, 
65L, 65L, 65L, 65L, 65L, 65L, 65L, 65L, 65L, 65L, 65L, 65L, 65L, 
65L, 65L, 65L, 65L, 65L, 65L, 65L, 65L), accountId = c(6422L, 
6422L, 6422L, 6422L, 6422L, 6422L, 6422L, 6422L, 6422L, 6422L, 
6422L, 6422L, 6422L, 6422L, 6422L, 6422L, 6422L, 6422L, 6422L, 
6422L, 6422L, 6422L, 6422L), valueDate = structure(c(2L, 4L, 
1L, 10L, 23L, 5L, 14L, 16L, 17L, 19L, 8L, 21L, 9L, 3L, 22L, 11L, 
12L, 13L, 15L, 7L, 18L, 20L, 6L), .Label = c("01/04/2013", "28/02/2013", 
"28/02/2014", "28/03/2013", "28/06/2013", "28/11/2014", "29/08/2014", 
"29/11/2013", "30/01/2014", "30/04/2013", "30/04/2014", "30/05/2014", 
"30/06/2014", "30/07/2013", "30/07/2014", "30/08/2013", "30/09/2013", 
"30/09/2014", "30/10/2013", "30/10/2014", "30/12/2013", "31/03/2014", 
"31/05/2013"), class = "factor"), quantity = c(-52400L, -52400L, 
-3000000L, -52349L, -52349L, -52349L, -52349L, -52349L, -52349L, 
-52349L, -52349L, -52349L, -52349L, -52349L, -52349L, -52349L, 
-52349L, -52349L, -52349L, -52349L, -52349L, -52349L, -52349L
), description = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "AMORTIZACION PRESTAMO", class = "factor")), .Names = c("userId", 
"systemBankId", "accountId", "valueDate", "quantity", "description"
), class = "data.frame", row.names = c(NA, -23L))
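The `structure(...)` output above is a plain data frame whose `valueDate` column is a factor in day/month/year form. A minimal sketch of turning such a frame into a `data.table` with real dates, assuming the `data.table` package is installed; the three-row `df` is only a stand-in for the full output:

```r
library(data.table)

# Stand-in for the dput() output above (first three rows only).
df <- data.frame(
  userId       = c(871L, 871L, 871L),
  systemBankId = c(65L, 65L, 65L),
  accountId    = c(6422L, 6422L, 6422L),
  valueDate    = factor(c("28/02/2013", "28/03/2013", "01/04/2013")),
  quantity     = c(-52400L, -52400L, -3000000L)
)

dt <- as.data.table(df)

# Factor -> character -> Date, using the day/month/year layout of the labels.
dt[, valueDate := as.Date(as.character(valueDate), format = "%d/%m/%Y")]

# Sort chronologically so the rows match the table in the question.
setorder(dt, valueDate)
```

Converting through `as.character()` matters here: calling `as.Date()` on the factor's integer codes would not give the intended dates.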
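UPDATE

The last step is to flag, in the original data set, which transactions are mortgage payments and which are loan payments.

Based on the answers:

a) Criterion: within a 24-month period, if there are 20 or more recurring transactions for the same `userId`, `systemBankId`, `accountId` and `quantity` (within -20%, 20%), they are mortgage payments: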

tmp <- dt[userId==871L,.N,by=.(userId,systemBankId,accountId,round(quantity * 5, -floor(log10(abs(quantity))))/5)][N>20,list(userId,systemBankId,accountId,round,N)]

userId systemBankId accountId  round  N
871         0065      6422    -52000 23
So I know there are 23 mortgage transactions.

b) I need to identify those 23 transactions:

tmp2 <- dt[userId==871L,list(userId,systemBankId,accountId,round=round(quantity * 5, -floor(log10(abs(quantity))))/5)]

merge(tmp,tmp2,by=c('userId','systemBankId','accountId','round'))

   userId systemBankId accountId  round  N
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
   871         0065      6422 -52000 23
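The merge above returns one row per matching transaction but drops the columns that identify it. One possible alternative is to write the group size back onto every original row with `:=` and derive a label from it. This is only a sketch: the column names `rounded` and `type`, and the rule that everything below the threshold is a "loan", are assumptions made for illustration.

```r
library(data.table)

# Quantities as in the dput() output of the question (23 rows).
dt <- data.table(
  userId = 871L, systemBankId = 65L, accountId = 6422L,
  quantity = c(-52400L, -52400L, -3000000L, rep(-52349L, 20))
)

# Rounded amount used as the grouping key (same expression as in step a).
dt[, rounded := round(quantity * 5, -floor(log10(abs(quantity)))) / 5]

# Size of each (account, rounded amount) group, written onto every row.
dt[, N := .N, by = .(userId, systemBankId, accountId, rounded)]

# Hypothetical labels: 20 or more recurring payments -> "mortgage".
dt[, type := ifelse(N >= 20L, "mortgage", "loan")]
```

Because the rows are never collapsed, `valueDate`, `description` and any other columns of the original table stay attached to each flagged transaction.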

Here is a quick hack using `dplyr`:

library(dplyr)
setDF(dt) %>% mutate(quantity =  round(quantity/10000, 0)) %>%
  group_by(userId, systemBankId, accountId, quantity) %>% tally()
which gives:

#Source: local data frame [2 x 5]
#Groups: userId, systemBankId, accountId
#
#  userId systemBankId accountId quantity  n
#1    871           65      6422     -300  1
#2    871           65      6422       -5 22
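Edit

As David mentioned in the comments, this answer is an oversimplification. A more consistent approach is something like Roland's suggestion: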

library(dplyr)
setDF(dt) %>% 
  mutate(quantity = round(quantity * 5, -floor(log10(abs(quantity))))/5) %>%
  group_by(userId, systemBankId, accountId, quantity) %>% tally()
Or using `data.table`:

dt[userId == 871L, .N, by = .(userId, systemBankId, accountId, quantity = round(quantity * 5, -floor(log10(abs(quantity))))/5)]
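To see what the rounding expression does, it can help to run it on the three distinct amounts. The helper name `bucket` is made up for this sketch. Multiplying by 5, rounding to the leading power of ten and dividing by 5 again rounds each amount to the nearest 0.2 x 10^k, where k is the number of digits minus one (so steps of 2000 for five-digit amounts).

```r
# Hypothetical helper wrapping the expression used in the answer.
bucket <- function(q) round(q * 5, -floor(log10(abs(q)))) / 5

q <- c(-52400, -52349, -3000000)
buckets <- bucket(q)
# -52400   * 5 = -262000   -> rounded to -260000   -> / 5 = -52000
# -52349   * 5 = -261745   -> rounded to -260000   -> / 5 = -52000
# -3000000 * 5 = -15000000 -> rounded to -15000000 -> / 5 = -3000000

# Caveat: the buckets have fixed edges, so two nearby amounts can still
# land on different sides of a boundary.
edge <- bucket(c(-52900, -53100))   # -52000 and -54000
```

So the expression approximates "roughly equal" amounts rather than implementing an exact [-20%, 20%] tolerance.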

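Here is a clever function created by @dnlbrky that returns previous rows, `rowShift` (its definition appears in the code listing further down).

Maybe something along these lines: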

dt[, round.qty := quantity[1] * round(quantity/quantity[1]), by = .(userId, systemBankId, accountId)]

dt[, .N, by = .(userId, systemBankId, accountId, round.qty)]
#   userId systemBankId accountId round.qty  N
#1:    871           65      6422    -52400 22
#2:    871           65      6422  -2986800  1
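The snippet above uses the first quantity of each account as the reference and rounds every other amount to a whole multiple of it, which is why the loan shows up as 57 x -52400 = -2986800. If an explicit 20% tolerance is wanted, one possible sketch is a small greedy grouping: scan the amounts in order of magnitude and start a new group whenever a value is more than 20% away from the first value of the current group. The function `tol_group` and the column `grp` are hypothetical names, not part of `data.table`.

```r
library(data.table)

# Assign group ids so that every value is within `tol` (relative) of the
# first value of its group, scanning in order of increasing magnitude.
tol_group <- function(x, tol = 0.2) {
  grp <- integer(length(x))
  anchor <- NA_real_
  g <- 0L
  for (i in order(abs(x))) {
    if (is.na(anchor) || abs(x[i] - anchor) > tol * abs(anchor)) {
      g <- g + 1L
      anchor <- x[i]
    }
    grp[i] <- g
  }
  grp
}

dt <- data.table(
  userId = 871L, systemBankId = 65L, accountId = 6422L,
  quantity = c(-52400L, -52400L, -3000000L, rep(-52349L, 20))
)

dt[, grp := tol_group(quantity), by = .(userId, systemBankId, accountId)]
res <- dt[, .(quantity = quantity[1], N = .N),
          by = .(userId, systemBankId, accountId, grp)]
```

Unlike the first-row approach, this does not depend on which transaction happens to come first, at the cost of an explicit loop per account.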

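In your example you only have 2 `-52400` quantities, not 3. Something like `DT[userId == 871L, .N, by = .(userId, systemBankId, accountId, quantity = round(quantity * 5, -floor(log10(abs(quantity))))/5)]`?

@Roland you should post that as an answer.

Not sure why you can't just slightly modify the OP's code to `DT[userId == 871L, .N, by = .(userId, systemBankId, accountId, quantity = round(quantity/10000, 0))]`; why `dplyr`?

If `[-20%, 20%]` is considered equal, then this answer is also conceptually wrong, because you assume the mortgage payments are around 10000. Try `dt[, quantity := quantity * 100]`, then run the code again and see what happens.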
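The point of the last comment can be checked directly on the three distinct amounts. In this sketch, `fixed` and `relative` are made-up names for the two rounding rules discussed in the thread:

```r
q <- c(-52400, -52349, -3000000)

# Rounding to a fixed unit of 10000 (the quick hack).
fixed <- function(q) round(q / 10000, 0)

# Rounding relative to each amount's own magnitude (Roland's suggestion).
relative <- function(q) round(q * 5, -floor(log10(abs(q)))) / 5

fixed(q)          # -5 -5 -300: the two mortgage amounts share a bucket
fixed(q * 100)    # -524 -523 -30000: the same two amounts are now split
relative(q * 100) # -5200000 -5200000 -300000000: still one bucket
```

The fixed unit only works while the amounts happen to be of the order of 10000, whereas the relative rule adapts to the scale of each amount.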
# Create a function to return previous rows
rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)
  r[r < 1] <- NA
  return(x[r])
}

# 80% and 120% of the previous row's quantity
dt$prev_qty_low  <- rowShift(dt$quantity, -1) * .8
dt$prev_qty_high <- rowShift(dt$quantity, -1) * 1.2
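The listing above stops after computing the two bounds. As a sketch of how a comparison with the previous row might be finished, the version below uses `data.table::shift()` (which, with its default `type = "lag"`, returns the previous value much like `rowShift(x, -1)`) and compares absolute differences, so the sign of the amounts does not matter. The column names `prev` and `same_as_prev` are assumptions.

```r
library(data.table)

# A few quantities in the chronological order of the question.
dt <- data.table(quantity = c(-52400, -52400, -3000000, -52349, -52349))

# Previous row's quantity (NA for the first row).
dt[, prev := shift(quantity, 1L)]

# TRUE when the amount is within 20% of the previous row's amount.
dt[, same_as_prev := abs(quantity - prev) <= 0.2 * abs(prev)]
```

Here `same_as_prev` comes out as `NA, TRUE, FALSE, FALSE, TRUE`: the one-off loan payment breaks the run, so the mortgage payments before and after it would not end up in a single group. Comparing only neighbouring rows therefore needs an extra step, such as grouping by rounded amount, before counting.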