R: Update a data.table based on the last change in another data.table
I currently have two data tables, as shown below:
library(data.table)
dt1 <- data.table(urn=c("1","1","1","1","1","1","1","2","2","2","2","3","3","3","3","3","3","3","2"),
date=as.Date(c("2014-01-15","2014-02-15","2014-03-15","2014-04-15","2014-05-15","2014-06-15","2014-07-15",
"2014-04-15","2014-05-15","2014-06-15","2014-07-15",
"2014-04-15","2014-03-15","2014-05-15","2014-02-15","2014-06-14","2014-08-15","2014-07-15","2014-09-16")),
amount=c(20,20,15,15,15,20,25,
15,15,20,20,
30,30,30,30,25,25,25,20))
#dt1
# urn date amount
# 1: 1 2014-01-15 20
# 2: 1 2014-02-15 20
# 3: 1 2014-03-15 15
# 4: 1 2014-04-15 15
# 5: 1 2014-05-15 15
# 6: 1 2014-06-15 20
# 7: 1 2014-07-15 25
# 8: 2 2014-04-15 15
# 9: 2 2014-05-15 15
#10: 2 2014-06-15 20
#11: 2 2014-07-15 20
#12: 3 2014-04-15 30
#13: 3 2014-03-15 30
#14: 3 2014-05-15 30
#15: 3 2014-02-15 30
#16: 3 2014-06-14 25
#17: 3 2014-08-15 25
#18: 3 2014-07-15 25
#19: 2 2014-09-16 20
dt2 <- data.table(urn=c("1","2","3"), lastamount=c(25,20,25),lastchangedate=as.Date(c(NA,NA,NA)))
#dt2
# urn lastamount lastchangedate
# 1: 1 25 <NA>
# 2: 2 20 <NA>
# 3: 3 25 <NA>
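This is obviously a sample data set. To give an idea of scale, my real dt1 has 3.5 million records and dt2 has 250K records.
Thanks
Update:
Since my real dt2 has more columns than shown in the example, I need to be able to keep them in the final output. I don't want to overwrite the current instance of dt2, only update lastchangedate on its own. Below is the code I used: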
setkey(dt1, urn) # this is after I had used setkey(dt1, urn, date) to order dt1 properly
setkey(dt2, urn)
dt2[dt1[,list(lastchangedate=max(date[which(diff(amount)!=0)+1])),
by=urn],lastchangedate:=i.lastchangedate]
This should do it:
setkey(dt1, urn, date) ## sort table
dt2 <- dt1[, list(lastamount=amount[.N],
lastchangedate=max(date[which(diff(amount)!=0)+1])),
by=urn]
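Note that the snippet above rebuilds dt2 from scratch, so any extra columns in dt2 are lost. The following is a minimal sketch, not a definitive implementation, of how the same per-urn summary could instead be joined back onto dt2 by reference, as the question's update asks. The tiny data set, the `other` column, and the `last_change` name are assumptions made up for illustration. It also assumes you want NA for a urn whose amount never changes, a case that would otherwise trigger a warning from max() on an empty vector.

```r
library(data.table)

# Illustrative stand-in for the question's tables (values are made up)
dt1 <- data.table(
  urn    = c("1", "1", "1", "2", "2", "3"),
  date   = as.Date(c("2014-01-15", "2014-02-15", "2014-03-15",
                     "2014-04-15", "2014-05-15", "2014-02-15")),
  amount = c(20, 15, 15, 15, 20, 30)
)
dt2 <- data.table(
  urn            = c("1", "2", "3"),
  lastamount     = c(15, 20, 30),
  other          = c("a", "b", "c"),   # hypothetical extra column to preserve
  lastchangedate = as.Date(NA)
)

# Sort by urn, then date, so diff() compares consecutive dates within a urn
setkey(dt1, urn, date)

# Per urn: date of the last row whose amount differs from the previous row
last_change <- dt1[, {
  idx <- which(diff(amount) != 0) + 1L
  list(lastchangedate = if (length(idx)) date[max(idx)] else as.Date(NA))
}, by = urn]

# Update join: only lastchangedate is modified, dt2 keeps its other columns
dt2[last_change, on = "urn", lastchangedate := i.lastchangedate]
```

Because `:=` updates dt2 in place, no copy of the 250K-row table is made and the remaining columns are left untouched.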
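I'm new to Stack Overflow and am trying to learn how to ask useful questions. I would really appreciate feedback on why this question was not considered helpful, so that I can improve in the future.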