How to make a nested for loop more efficient and use it with apply

Tags: r, for-loop, lapply

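I am trying to convert a working nested for loop so that it uses apply instead, in the hope that this will make it faster (from what I have read it should, although that is not always true). The main data frame has about 150K rows to loop over, which is very time-consuming.

I wrote a for loop in R that checks whether date.time in df1 falls between the two date-times in df2; if it does, and the codes in df1 and df2 match, the Location from df2 is copied into df1.

Here is a subset of the data as an example: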

df1<-structure(list(date.time = structure(c(1455922438, 1455922445, 
1455922449, 1455922457, 1455922459, 1455922461), class = c("POSIXct", 
"POSIXt"), tzone = ""), code = c(32221, 32222, 32221, 32222, 
32222, 32221)), .Names = c("date.time", "code"), row.names = 50000:50005, class = "data.frame")

df2<-structure(list(Location = 11:12, Code = 32221:32222, t_in = structure(c(1455699600, 
1455699600), class = c("POSIXct", "POSIXt"), tzone = ""), t_out = structure(c(1456401600, 
1456401600), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("Location", 
"Code", "t_in", "t_out"), class = "data.frame", row.names = 11:12)
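The loop described in the question is not shown here, so the following is only a sketch of what such a nested loop might look like, not the original poster's code. The names obs and win are hypothetical stand-ins for df1 and df2, holding a few of the sample values, and the strict comparisons are an assumption.

```r
# Hypothetical baseline, NOT the original poster's code.
to_time <- function(x) as.POSIXct(x, origin = "1970-01-01", tz = "UTC")

obs <- data.frame(                       # stand-in for df1
  date.time = to_time(c(1455922438, 1455922445, 1455922449)),
  code = c(32221, 32222, 32221)
)
win <- data.frame(                       # stand-in for df2
  Location = 11:12,
  Code = 32221:32222,
  t_in = to_time(c(1455699600, 1455699600)),
  t_out = to_time(c(1456401600, 1456401600))
)

obs$Location <- NA_integer_
for (i in seq_len(nrow(obs))) {
  for (j in seq_len(nrow(win))) {
    # Copy the location when the codes match and the timestamp is in the window
    if (obs$code[i] == win$Code[j] &&
        obs$date.time[i] > win$t_in[j] &&
        obs$date.time[i] < win$t_out[j]) {
      obs$Location[i] <- win$Location[j]
    }
  }
}
print(obs)
```

The work grows with nrow(obs) * nrow(win), which is why a loop of this shape becomes slow at around 150K rows.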

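The data.table package has overlapping range joins that can do this very quickly. The function you are looking for is foverlaps. Here is an example, with a little cleanup before calling foverlaps: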

require(data.table)

dt1 <- data.table(df1)
dt2 <- data.table(df2)

## need to create a range in dt 1 to find overlaps on
dt1[,start:=date.time]
dt1[,end:=date.time]

## clean up names to match each other
setnames(dt2,c("Location","Code","start","end"))
setnames(dt1,c("code"),c("Code"))

setkey(dt1,Code,start,end)
setkey(dt2,Code,start,end)

## use foverlaps with the additional matching variable Code
out <- foverlaps(dt1,dt2,type="any",
                 by.x=c("Code","start","end"),
                 by.y=c("Code","start","end"))

## more renaming and selection of the same subset of columns
setnames(out,"i.start","date.time")
out <- out[,.(date.time,Code,Location)]
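As a possible alternative to foverlaps (a different technique, not part of the answer above), data.table also supports non-equi joins in the on argument. The sketch below assumes a data.table version that supports them (1.9.8 or later) and uses hypothetical tables obs and win, where the third timestamp deliberately falls outside every window.

```r
library(data.table)

# Hypothetical sample data; the third timestamp lies before every window.
to_time <- function(x) as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
obs <- data.table(
  date.time = to_time(c(1455922438, 1455922445, 1455000000)),
  code = c(32221L, 32222L, 32221L)
)
win <- data.table(
  Location = 11:12,
  Code = 32221:32222,
  t_in = to_time(c(1455699600, 1455699600)),
  t_out = to_time(c(1456401600, 1456401600))
)

# Update join: for each row of win, find the rows of obs with the same code
# and a timestamp inside [t_in, t_out], then write that Location into obs.
obs[win, on = .(code == Code, date.time >= t_in, date.time <= t_out),
    Location := i.Location]
print(obs)
```

Rows of obs that match no window keep NA in Location. The bounds here are inclusive, whereas a version written with > and < would exclude timestamps that fall exactly on t_in or t_out.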
I tried to construct a "loop-free" version that relies on neither for nor apply. See whether it is faster:

trans <- which( outer(X=df1$code, Y=df2$Code,'==') & 
                outer(df1$date.time , df2$t_in, ">") & 
                outer(df1$date.time, df2$t_out , "<")  , arr.ind=TRUE)
df1$Location [ trans[,1] ] <- df2$Location [ trans[,2] ]
df1
#------
                date.time  code Location
50000 2016-02-19 14:53:58 32221       11
50001 2016-02-19 14:54:05 32222       12
50002 2016-02-19 14:54:09 32221       11
50003 2016-02-19 14:54:17 32222       12
50004 2016-02-19 14:54:19 32222       12
50005 2016-02-19 14:54:21 32221       11
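To make the mechanics of the outer approach easier to follow, here is a reduced sketch on made-up vectors (code_x, code_y, loc_y are hypothetical names). outer builds a logical matrix with one row per element of the first vector and one column per element of the second, and which(..., arr.ind=TRUE) returns the row and column index of every TRUE cell.

```r
code_x <- c(1, 2, 1)      # plays the role of df1$code
code_y <- c(1, 2)         # plays the role of df2$Code
loc_y  <- c(11L, 12L)     # plays the role of df2$Location

# 3 x 2 logical matrix: m[i, j] is TRUE when code_x[i] equals code_y[j]
m <- outer(code_x, code_y, "==")

# One row per TRUE cell, with columns "row" and "col"
idx <- which(m, arr.ind = TRUE)
print(idx)

# Use the row index to pick the target and the column index to pick the source
loc_x <- rep(NA_integer_, length(code_x))
loc_x[idx[, "row"]] <- loc_y[idx[, "col"]]
print(loc_x)
```

Each outer call allocates a full length(X) by length(Y) matrix, so memory use grows with the product of the two row counts. That is worth keeping in mind for a 150K-row data frame.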

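The other approach is probably faster than my triple outer approach, but I hope the OP will report system.time results for both methods compared with his double for-loop approach.

It would be ideal if you could provide a slightly better test case, one in which some i,j combinations do not satisfy the time condition. In any case, please report the system.time results.

You could try the sqldf package to see whether converting the df into a local database and then running a query against it helps with speed.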
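A timing comparison of the kind requested could be sketched as follows. Everything here is an assumption for illustration: the data are randomly generated (obs and win are hypothetical names), the timestamps are drawn from a wider range than the windows so that many rows fail the time condition, and loop_match is a guess at the double loop rather than the OP's code. No timings are quoted because they depend on the machine.

```r
set.seed(1)
to_time <- function(x) as.POSIXct(x, origin = "1970-01-01", tz = "UTC")

# Hypothetical test data: 50 windows, 2000 observations, many without a match
win <- data.frame(
  Location = 1:50,
  Code = 1001:1050,
  t_in = to_time(rep(1455699600, 50)),
  t_out = to_time(rep(1456401600, 50))
)
n_obs <- 2000
obs <- data.frame(
  date.time = to_time(runif(n_obs, 1455000000, 1457000000)),
  code = sample(win$Code, n_obs, replace = TRUE)
)

# Double for loop (assumed shape of the original approach)
loop_match <- function(obs, win) {
  loc <- rep(NA_integer_, nrow(obs))
  for (i in seq_len(nrow(obs))) {
    for (j in seq_len(nrow(win))) {
      if (obs$code[i] == win$Code[j] &&
          obs$date.time[i] > win$t_in[j] &&
          obs$date.time[i] < win$t_out[j]) {
        loc[i] <- win$Location[j]
      }
    }
  }
  loc
}

# Triple outer approach, same strict comparisons
outer_match <- function(obs, win) {
  hit <- which(outer(obs$code, win$Code, "==") &
               outer(obs$date.time, win$t_in, ">") &
               outer(obs$date.time, win$t_out, "<"), arr.ind = TRUE)
  loc <- rep(NA_integer_, nrow(obs))
  loc[hit[, 1]] <- win$Location[hit[, 2]]
  loc
}

t_loop  <- system.time(loc_loop  <- loop_match(obs, win))
t_outer <- system.time(loc_outer <- outer_match(obs, win))

# Elapsed seconds for each method, plus a check that the results agree
print(c(loop = t_loop[["elapsed"]], outer = t_outer[["elapsed"]]))
print(identical(loc_loop, loc_outer))
```

Checking that both methods return the same vector before comparing timings guards against benchmarking two computations that do not actually agree.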
Output of the foverlaps example above:

> out
             date.time  Code Location
1: 2016-02-19 14:53:58 32221       11
2: 2016-02-19 14:54:09 32221       11
3: 2016-02-19 14:54:21 32221       11
4: 2016-02-19 14:54:05 32222       12
5: 2016-02-19 14:54:17 32222       12
6: 2016-02-19 14:54:19 32222       12