Efficient search in R

I have written the following code:

a <- matrix(0, nrow = nrow(x), ncol = 1)
for (i in 1:nrow(x)) {
  for (j in 1:nrow(y)) {
    if ((y[j, 3] > x[i, 2]) & (y[j, 2] == x[i, 1])) {
      a[i, ] <- y[j, 4]
      i <- i + 1
    }
  }
}

y looks like:

 y1 y2   y3    y4
  1 401 10  22.152
  2 401 40  167.986
  3 401 70  393.198
  4 401 100 923
  5 401 120 923
  6 401 140 686.712
  7 401 160 865.774...
I want to get:

where nrow(y) > nrow(x) holds. Is it possible to make this more efficient?
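
For reference, here is a minimal sketch of what the loop seems to be aiming for. It assumes the goal is, for each row of x, the y4 value of the first row of y (in row order) with y2 == x1 and y3 > x2. The loop in the question advances i by hand after a match; this sketch uses break instead, which is an interpretation and not the asker's code. The sample x and y are copied from the data posted with the answers.

```r
# Sample data copied from the thread.
x <- data.frame(x1 = c(401, 401, 401), x2 = c(4, 38, 142))
y <- data.frame(
  y1 = 1:7,
  y2 = rep(401, 7),
  y3 = c(10, 40, 70, 100, 120, 140, 160),
  y4 = c(22.152, 167.986, 393.198, 923, 923, 686.712, 865.774)
)

# One result slot per row of x; NA means no matching row was found.
a <- rep(NA_real_, nrow(x))
for (i in seq_len(nrow(x))) {
  for (j in seq_len(nrow(y))) {
    if (y[j, 3] > x[i, 2] && y[j, 2] == x[i, 1]) {
      a[i] <- y[j, 4]
      break  # stop scanning y at the first match for this row of x
    }
  }
}
a
# expected for the sample data: 22.152 167.986 865.774
```

This is still O(nrow(x) * nrow(y)) in the worst case, so it only pins down the intended result; it does not address the speed problem.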

Using data.table, we join 'x' with 'y' on x1 = y2, filter with y3 > x2 in i, group by 'x1' and 'x2', and take the first row of each group:

library(data.table)
setDT(x)[y, on = .(x1 = y2), allow.cartesian = TRUE][y3 > x2, head(.SD, 1) , .(x1, x2)]
#    x1  x2 y1  y3      y4
#1: 401   4  1  10  22.152
#2: 401  38  2  40 167.986
#3: 401 142  7 160 865.774
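
The join above builds every x-y pair that shares a key before filtering, which can get large. As an alternative sketch (not part of the answer), base R's findInterval() can locate the smallest y3 greater than x2 by binary search within each key. The helper name first_above is hypothetical, and the sketch assumes "first match" means the smallest qualifying y3, which coincides with row order in the sample data.

```r
# Sample data copied from the thread.
x <- data.frame(x1 = c(401, 401, 401), x2 = c(4, 38, 142))
y <- data.frame(
  y1 = 1:7,
  y2 = rep(401, 7),
  y3 = c(10, 40, 70, 100, 120, 140, 160),
  y4 = c(22.152, 167.986, 393.198, 923, 923, 686.712, 865.774)
)

# Hypothetical helper: for each row of x, return y4 from the row of y
# with the same key (y2 == x1) and the smallest y3 strictly above x2.
first_above <- function(x, y) {
  out <- rep(NA_real_, nrow(x))
  for (key in unique(x$x1)) {
    xi <- which(x$x1 == key)
    yk <- y[y$y2 == key, ]
    yk <- yk[order(yk$y3), ]          # findInterval() needs sorted input
    # findInterval() counts the y3 values <= x2, so +1 is the first y3 > x2.
    idx <- findInterval(x$x2[xi], yk$y3) + 1
    out[xi] <- yk$y4[idx]             # indexing past the end yields NA
  }
  out
}

res <- first_above(x, y)
res
# expected for the sample data: 22.152 167.986 865.774
```

Rows of x with no qualifying row in y come back as NA instead of being dropped, which differs from the join-based result.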

We can merge with dplyr and then filter:

library(dplyr)

left_join(x, y, by = c("x1" = "y2")) %>% 
  filter(y3 > x2) %>% 
  arrange(y3) %>% 
  group_by(x1, x2) %>% 
  slice(1) 

# Source: local data frame [3 x 5]
# Groups: x1, x2 [3]
# 
#      x1    x2    y1    y3      y4
#   <int> <int> <int> <int>   <dbl>
# 1   401     4     1    10  22.152
# 2   401    38     2    40 167.986
# 3   401   142     7   160 865.774
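
The same join, filter, take-first idea can be sketched in base R, which may help when reading the pipeline above. This is an illustration under the thread's sample data, not part of the original answer; the intermediate names m and res are arbitrary.

```r
# Sample data copied from the thread.
x <- data.frame(x1 = c(401, 401, 401), x2 = c(4, 38, 142))
y <- data.frame(
  y1 = 1:7,
  y2 = rep(401, 7),
  y3 = c(10, 40, 70, 100, 120, 140, 160),
  y4 = c(22.152, 167.986, 393.198, 923, 923, 686.712, 865.774)
)

m <- merge(x, y, by.x = "x1", by.y = "y2")   # all x-y pairs sharing the key
m <- m[m$y3 > m$x2, ]                        # keep pairs with y3 > x2
m <- m[order(m$x1, m$x2, m$y3), ]            # smallest y3 first in each group
res <- m[!duplicated(m[c("x1", "x2")]), ]    # first row per (x1, x2)
res$y4
# expected for the sample data: 22.152 167.986 865.774
```

Like the dplyr version, this materialises the full merge before filtering, so memory use grows with the number of matching pairs per key.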
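
Comments:

Can you give some small examples of what the data in x and y look like, so that it shows what the code actually does?

@Marius Is it clear now?

How does the for loop work when nrow(y) > nrow(x)?

I stopped the code and checked the values: the zeros are being filled with the numbers I want, but it takes too much time.

I am getting the following error in filter_impl(.data, dots): incorrect length (37286), expecting: 2997729

@Jamil With the example data provided, there is no error. Please provide data that reproduces the error.

Would using read.csv instead make a difference? Also, I am not reading the data in as text.

@Jamil You should already have x and y in your workspace. I am using read.table(text = ...) so that I have an x and y to test with.

Inconsistent column names were the problem. Your code works perfectly! Thanks.
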
Data (sample x and y used in both answers)

x <- read.table(text = "
x1  x2
401 4
401 38
401 142", header = TRUE)

y <- read.table(text = "
y1 y2   y3    y4
1 401 10  22.152
2 401 40  167.986
3 401 70  393.198
4 401 100 923
5 401 120 923
6 401 140 686.712
7 401 160 865.774", header = TRUE)