r data.table: conditional sum with a reference/lookup from another, separate data.table
I am a beginner with R and data.table, but I have read enough to be convinced of its speed and efficiency on large datasets. I have searched everywhere and cannot find an answer to what I assumed was an easy problem. The problem: given two data tables, DT1 and DT2:
library(data.table)
DT1 <- data.table(AA=c("A","B","C","A","B","C","A","B","C","A","B","C"),
BB=c(35,45,25,25,85,15,55,55,95,35,25,75)
)
DT2 <- data.table(CC=c("A","B","C","A","B","C"),
DD=c(10,20,30,40,50,60),
EE=c(5,5,10,10,15,20)
)
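... copied down into every cell of NewCol. I don't want to clutter this post with two Excel tables, and Excel is clearly a poor fit here for many reasons, but if this is so simple in Excel, it must be fairly simple with R data.table too, right? In the latest ...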
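Hopefully the naming of the resulting join columns will become more sensible in the future; for now, just use setnames.
That did it!!! Thank you, eddi!!! For anyone struggling to install 1.9.7 (I am on Windows): I found that I had to select the "edit path" option when installing Rtools for it to work. I then rebooted.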
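As a rough sketch of the setnames step mentioned above: after the non-equi join, the join columns come back under DT2's names (CC, DD) even though they hold DT1's values, so they can be renamed to match DT1. The variable name res is made up for this illustration, and the code assumes a data.table version that supports non-equi joins (the 1.9.7 development version discussed in this thread, or later).

```r
library(data.table)

DT1 <- data.table(AA = c("A","B","C","A","B","C","A","B","C","A","B","C"),
                  BB = c(35,45,25,25,85,15,55,55,95,35,25,75))
DT2 <- data.table(CC = c("A","B","C","A","B","C"),
                  DD = c(10,20,30,40,50,60),
                  EE = c(5,5,10,10,15,20))

# `res` is a hypothetical name for the join result.
res <- DT2[DT1, on = .(CC = AA, DD >= BB),
           .(NewCol = sum(EE, na.rm = TRUE)), by = .EACHI]

# The result columns are named CC and DD but carry DT1's AA and BB values,
# so rename them by reference.
setnames(res, c("CC", "DD"), c("AA", "BB"))
```

setnames modifies res in place, so no reassignment is needed.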
DT1_DesiredOutput <- data.table(AA=c("A","B","C","A","B","C","A","B","C","A","B","C"),
BB=c(35,45,25,25,85,15,55,55,95,35,25,75),
NewCol=c(10,15,30,10,0,30,0,0,0,10,15,0)
)
{=SUM(IF( ($F$7:$F$12=B7)*($G$7:$G$12>C7), $H$7:$H$12))}
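For readers who don't use Excel array formulas, here is a plain row-by-row R sketch of what that formula appears to compute. It assumes that F, G and H hold DT2's CC, DD and EE, and that B and C hold DT1's AA and BB; that mapping is inferred from the data, not stated in the post. The name new_col is invented for illustration. Note that the formula uses a strict ">".

```r
library(data.table)

DT1 <- data.table(AA = c("A","B","C","A","B","C","A","B","C","A","B","C"),
                  BB = c(35,45,25,25,85,15,55,55,95,35,25,75))
DT2 <- data.table(CC = c("A","B","C","A","B","C"),
                  DD = c(10,20,30,40,50,60),
                  EE = c(5,5,10,10,15,20))

# For each row i of DT1: sum EE over the DT2 rows whose CC equals AA[i]
# and whose DD is strictly greater than BB[i] (as in the Excel formula).
new_col <- vapply(seq_len(nrow(DT1)), function(i) {
  sum(DT2$EE[DT2$CC == DT1$AA[i] & DT2$DD > DT1$BB[i]])
}, numeric(1))
```

This loop is only meant to spell out the logic; it rescans DT2 for every row of DT1, so it will not scale the way a join does.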
# Non-equi join: for each row of DT1, sum EE over the DT2 rows where CC == AA and DD >= BB
DT2[DT1, on = .(CC = AA, DD >= BB), .(NewCol = sum(EE, na.rm = TRUE)), by = .EACHI]
#     CC DD NewCol
#  1:  A 35     10
#  2:  B 45     15
#  3:  C 25     30
#  4:  A 25     10
#  5:  B 85      0
#  6:  C 15     30
#  7:  A 55      0
#  8:  B 55      0
#  9:  C 95      0
# 10:  A 35     10
# 11:  B 25     15
# 12:  C 75      0
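Since the question asks for NewCol as a column of DT1 itself, one possible way to get there (a sketch, not necessarily how the answerer would do it) is to assign the join result back by reference. It relies on by = .EACHI returning one row per row of DT1, in DT1's order, and on data.table naming an unnamed j result V1.

```r
library(data.table)

DT1 <- data.table(AA = c("A","B","C","A","B","C","A","B","C","A","B","C"),
                  BB = c(35,45,25,25,85,15,55,55,95,35,25,75))
DT2 <- data.table(CC = c("A","B","C","A","B","C"),
                  DD = c(10,20,30,40,50,60),
                  EE = c(5,5,10,10,15,20))

# The inner join yields one sum per DT1 row (column V1);
# := then adds it to DT1 in place as NewCol.
DT1[, NewCol := DT2[DT1, on = .(CC = AA, DD >= BB),
                    sum(EE, na.rm = TRUE), by = .EACHI]$V1]
```

After this, DT1 should have the same columns as DT1_DesiredOutput from the question, with no renaming needed.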