How to make a nested for loop more efficient and use it with apply


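I am trying to convert a working nested for loop to use apply instead, hoping that will make it faster (from what I have read it should, although that is not always true). The main data frame has about 150K rows to loop over, which is very time-consuming.

I wrote a for loop in R that checks whether date.time in df1 falls between two date.times in df2. If it does, and the codes in df1 and df2 match, the Location from df2 is pasted into df1.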
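
The loop itself is not shown in the question, so here is only a minimal sketch of what the description above seems to mean. The names df1_demo and df2_demo are hypothetical stand-ins, and the third row (a timestamp before t_in) is an added assumption to show what happens when nothing matches.

```r
# Hypothetical reconstruction of the nested loop described above.
# df1_demo / df2_demo are illustrative stand-ins, not the OP's objects.
df1_demo <- data.frame(
  date.time = as.POSIXct(c(1455922438, 1455922445, 1455699000),
                         origin = "1970-01-01", tz = "UTC"),
  code = c(32221, 32222, 32221)
)
df2_demo <- data.frame(
  Location = 11:12,
  Code     = 32221:32222,
  t_in     = as.POSIXct(c(1455699600, 1455699600),
                        origin = "1970-01-01", tz = "UTC"),
  t_out    = as.POSIXct(c(1456401600, 1456401600),
                        origin = "1970-01-01", tz = "UTC")
)

# Start with no location; rows that match no window stay NA.
df1_demo$Location <- NA_integer_

for (i in seq_len(nrow(df1_demo))) {
  for (j in seq_len(nrow(df2_demo))) {
    # Codes must match and the timestamp must lie strictly inside the window.
    if (df1_demo$code[i] == df2_demo$Code[j] &&
        df1_demo$date.time[i] > df2_demo$t_in[j] &&
        df1_demo$date.time[i] < df2_demo$t_out[j]) {
      df1_demo$Location[i] <- df2_demo$Location[j]
    }
  }
}

print(df1_demo)
```

Every row of the first frame is compared against every row of the second, so the work grows with nrow(df1) * nrow(df2), which is why this gets slow at 150K rows.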

Below is a subset of sample data:

df1<-structure(list(date.time = structure(c(1455922438, 1455922445, 
1455922449, 1455922457, 1455922459, 1455922461), class = c("POSIXct", 
"POSIXt"), tzone = ""), code = c(32221, 32222, 32221, 32222, 
32222, 32221)), .Names = c("date.time", "code"), row.names = 50000:50005, class = "data.frame")

df2<-structure(list(Location = 11:12, Code = 32221:32222, t_in = structure(c(1455699600, 
1455699600), class = c("POSIXct", "POSIXt"), tzone = ""), t_out = structure(c(1456401600, 
1456401600), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("Location", 
"Code", "t_in", "t_out"), class = "data.frame", row.names = 11:12)

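The data.table package has overlapping range joins that can do this very quickly. The function you are looking for is foverlaps. Here is an example, with a little cleanup before calling foverlaps: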
require(data.table)

dt1 <- data.table(df1)
dt2 <- data.table(df2)

## need to create a range in dt 1 to find overlaps on
dt1[,start:=date.time]
dt1[,end:=date.time]

## clean up names to match each other
setnames(dt2,c("Location","Code","start","end"))
setnames(dt1,c("code"),c("Code"))

setkey(dt1,Code,start,end)
setkey(dt2,Code,start,end)

## use foverlaps with the additional matching variable Code
out <- foverlaps(dt1,dt2,type="any",
                 by.x=c("Code","start","end"),
                 by.y=c("Code","start","end"))

## more renaming and selection of the same subset of columns
setnames(out,"i.start","date.time")
out <- out[,.(date.time,Code,Location)]

I tried to build a "loop-free" version that does not rely on for or apply. See whether it is faster:
trans <- which( outer(X=df1$code, Y=df2$Code,'==') & 
                outer(df1$date.time , df2$t_in, ">") & 
                outer(df1$date.time, df2$t_out , "<")  , arr.ind=TRUE)
df1$Location [ trans[,1] ] <- df2$Location [ trans[,2] ]
df1
#------
                date.time  code Location
50000 2016-02-19 14:53:58 32221       11
50001 2016-02-19 14:54:05 32222       12
50002 2016-02-19 14:54:09 32221       11
50003 2016-02-19 14:54:17 32222       12
50004 2016-02-19 14:54:19 32222       12
50005 2016-02-19 14:54:21 32221       11

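foverlaps is probably faster than my triple-outer method, but I hope the OP will report system.time results for both methods compared with his double for-loop method.

If you could provide a slightly better test case, one where some i,j combinations fail the time condition, that would be ideal. In any case, please report the system.time results.

You could also try the sqldf package and see whether converting the data frames to a local database and querying it helps with speed.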

Output of the foverlaps example:
> out
             date.time  Code Location
1: 2016-02-19 14:53:58 32221       11
2: 2016-02-19 14:54:09 32221       11
3: 2016-02-19 14:54:21 32221       11
4: 2016-02-19 14:54:05 32222       12
5: 2016-02-19 14:54:17 32222       12
6: 2016-02-19 14:54:19 32222       12
