R: Is there a way to optimize this code? It takes several hours to run


I have two data frames, and I want to classify each element of a column according to whether it is present in the second data frame.

My current code is:

for (i in 1:99653)
{
  for(j in 1:3226)
    {
    if (grepl(cdata$LegDigitsDialed[i],sdata$SavedPhone[j]) == TRUE)

        {
          cdata$category[i] = "Supplier"
          cdata$su_name[i] = sdata$sushortname[j]
        }

      else
        {
          cdata$category[i] = "Customer"
          cdata$su_name[i] = "Null"      
        }

    }
}
My data looks like this:

>cdata
LegDigitsDialed
"a"
"b"
"c"


>sdata
SavedPhone
"aa"
"c"
What I want is:

LegDigitsDialed     category
"a"                 "Supplier"
"b"                 "Customer"
"c"                 "Supplier"

As noted in the comments, your code may have a problem, but to answer the "how to speed it up" part of the question:

You can get rid of the for loops (this could be about a thousand times faster, once the issue with the if is sorted out).

For reference, the pseudocode from the question:

for (i=1,i<100000,i++)
  for(j=1,j<3500,j++)
      {
        if (j contains i) //partial string matching
            populate i(different column) with some value
        else
            populate i(different column) with some other value
      }
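
The pseudocode can be written without the inner loop. The following is only a minimal sketch on the toy data from the question: the `sushortname` values "S1" and "S2" are made up for illustration, and `fixed = TRUE` assumes the dialed digits should be matched literally rather than as a regular expression.

```r
# Toy data from the question; the supplier names are hypothetical
cdata <- data.frame(LegDigitsDialed = c("a", "b", "c"), stringsAsFactors = FALSE)
sdata <- data.frame(SavedPhone = c("aa", "c"),
                    sushortname = c("S1", "S2"),
                    stringsAsFactors = FALSE)

# For each dialed number, the index of the first SavedPhone that contains it
# (NA when no SavedPhone contains it)
first_match <- vapply(cdata$LegDigitsDialed, function(d) {
  hits <- which(grepl(d, sdata$SavedPhone, fixed = TRUE))
  if (length(hits) > 0) hits[1] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

# Fill both columns in one vectorized step
cdata$category <- ifelse(is.na(first_match), "Customer", "Supplier")
cdata$su_name  <- ifelse(is.na(first_match), "Null", sdata$sushortname[first_match])

print(cdata)
```

Each dialed number is still compared against every saved phone, but the inner comparison is one vectorized `grepl` call and each result is written only once.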

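If you want to search for a string match across a different column and different rows, and additionally store the result of that match at the original row index, the following may help: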

         a     b
[1,]  TRUE  TRUE
[2,]  TRUE FALSE
[3,] FALSE  TRUE
[4,] FALSE FALSE
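
A logical matrix of this shape is what `sapply` returns when `grepl` is applied to several patterns in turn. The call that produced the output above is not shown, so the input vectors below are an assumption chosen to reproduce it.

```r
# Assumed inputs: two patterns and four strings to search in
patterns <- c("a", "b")
phones   <- c("ab", "a", "b", "c")

# One column per pattern, one row per string; TRUE where the string contains the pattern
m <- sapply(patterns, grepl, x = phones, fixed = TRUE)
print(m)

# Collapse to one value per pattern: does it occur in any string?
any_match <- apply(m, 2, any)
print(any_match)
```

For 99653 patterns and 3226 strings the full matrix has roughly 320 million cells, so in practice it may be preferable to reduce each column inside the function instead of storing the whole matrix.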
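Another option: first, make a copy of the sdata data frame so that an extra column can be added (see the lapply/pmatch code at the end).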

library(dplyr)
# generate example data
cdata <- data.frame(SavedPhone = c("a_a", "a_b", "a_a", "x_y"),
                LegDigitsDialed = c("a", "b", "c", "a"),
                sushortname = c("Max", "Moritz", "Something", "Max"),
                stringsAsFactors = FALSE)

# run one loop within `dplyr`
cdata %>% 
  do({

    # initialize no match values
    category <- rep("Customer", nrow(.))
    su_name <- rep("NULL", nrow(.))

    # loop through `LegDigitsDialed` column
    for (idx in seq_len(nrow(.))) {

      # rows whose `SavedPhone` contains the dialed digits (literal match, not a regex)
      search_idx <- which(grepl(.$LegDigitsDialed[idx], .$SavedPhone, fixed = TRUE))

      # overwrite the defaults at the original row index, using the first match
      if (length(search_idx) > 0) {
        category[idx] <- "Supplier"
        su_name[idx] <- .$sushortname[search_idx[1]]
      }
    }

    # return data frame
    data.frame(category = category, su_name = su_name, 
               LegDigitsDialed = .$LegDigitsDialed, SavedPhone = .$SavedPhone,
               stringsAsFactors = FALSE)
  })
lapply is used to iterate over all elements, and pmatch is used for partial matching.

Please let me know the result.

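Can you make this a reproducible example so that we can help you? (It helps if we can run your code and see the problem.) You can use the dput() function.

So, if I understand correctly, your code may not be doing what you want it to do. It overwrites cdata$category[i] 3226 times, for example, so only the last value is kept. What exactly are you trying to do here?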
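
The overwriting described in this comment can be reproduced on the toy data from the question. This is a small sketch, not the asker's real data; the second half shows one possible correction, which assumes a row should count as a supplier if any saved phone matches.

```r
cdata <- data.frame(LegDigitsDialed = c("a", "b", "c"), stringsAsFactors = FALSE)
sdata <- data.frame(SavedPhone = c("aa", "c"), stringsAsFactors = FALSE)

# Same structure as the question: every j overwrites the result for row i
cdata$category <- NA_character_
for (i in seq_len(nrow(cdata))) {
  for (j in seq_len(nrow(sdata))) {
    cdata$category[i] <-
      if (grepl(cdata$LegDigitsDialed[i], sdata$SavedPhone[j])) "Supplier" else "Customer"
  }
}
# "a" matched "aa" at j = 1 but was overwritten at j = 2
looped <- cdata$category

# Possible correction: decide once per row whether any saved phone matches
has_match <- vapply(cdata$LegDigitsDialed,
                    function(d) any(grepl(d, sdata$SavedPhone, fixed = TRUE)),
                    logical(1), USE.NAMES = FALSE)
corrected <- ifelse(has_match, "Supplier", "Customer")

print(data.frame(LegDigitsDialed = cdata$LegDigitsDialed, looped, corrected))
```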
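# Copy sdata and add a constant "Supplier" category column
new.sdata <- sdata
new.sdata$category <- "Supplier"
# pmatch() gives the row of new.sdata whose SavedPhone equals x or uniquely starts with x (NA if none);
# unlist() turns the list returned by lapply() into a plain vector
cdata$category <- unlist(lapply(cdata$LegDigitsDialed, function(x) new.sdata$category[pmatch(x, new.sdata$SavedPhone)]))
cdata$su_name <- unlist(lapply(cdata$LegDigitsDialed, function(x) as.character(new.sdata$sushortname)[pmatch(x, new.sdata$SavedPhone)]))
# Rows without a match are customers
cdata$category[is.na(cdata$category)] <- "Customer"
cdata$su_name[is.na(cdata$su_name)] <- "Null"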
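
One caveat when choosing between the two matching functions: `pmatch` only accepts an exact match or a unique match at the start of a string, while `grepl` finds the pattern anywhere. The vectors below are made-up values used only to show the difference.

```r
phones <- c("aa", "c", "xa")   # hypothetical saved phones

# pmatch: "aa" is the only entry that starts with "a", so its index is returned
prefix_hit <- pmatch("a", phones)

# grepl: "a" also occurs inside "xa", so two entries match
anywhere <- grepl("a", phones, fixed = TRUE)

# pmatch returns NA when more than one entry starts with the pattern
ambiguous <- pmatch("a", c("aa", "ab"))

print(prefix_hit)
print(anywhere)
print(ambiguous)
```

If the dialed digits can appear in the middle of a saved number, or several saved numbers can share a prefix, the `grepl`-based approach is likely the closer fit to the original loop.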