R: Recursive *ply/plyr functions; for-loop replacement - Fatal编程技术网



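I am trying to replace a for loop with a *ply-type function.

The problem I'm running into is that I'm not sure how to repeatedly update the same data.

Here is some sample data. I know this particular example could be done in other ways, but it is only meant to keep things simple; my real example is much more complex: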

sample_pat_rep <-  data.frame(matrix(NA, ncol=2, nrow=3, dimnames=list(c(), c("Pattern","Replacement"))), stringsAsFactors=FALSE)
sample_pat_rep[1,] <-  c("a","A")
sample_pat_rep[2,] <-  c("b","B")
sample_pat_rep[3,] <-  c("c","C")

sample_strings <-  data.frame(matrix(NA, ncol=2, nrow=3, dimnames=list(c(), c("Original","Fixed"))), stringsAsFactors=FALSE)
sample_strings[1,] <-  c("aaaaaaaa bbbbbbbb cccccccc","aaaaaaaa bbbbbbbb cccccccc")
sample_strings[2,] <-  c("aAaAaAaA bBbBbBbB cCcCcCcC","aAaAaAaA bBbBbBbB cCcCcCcC")
sample_strings[3,] <-  c("AaAaAaAa BbBbBbBb CcCcCcCc","AaAaAaAa BbBbBbBb CcCcCcCc")

Here is the for-loop version:

sample_strings1 <- sample_strings
for (i in 1:nrow(sample_pat_rep))
{
  sample_strings1[,c("Fixed")] <- gsub(sample_pat_rep[i,c("Pattern")], sample_pat_rep[i,c("Replacement")], sample_strings1[,c("Fixed")],ignore.case = TRUE)
} 
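
When I try to replicate this with adply, it does not update the data; instead it makes a copy of the data for each pattern and binds the copies together: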

library(plyr)

sample_strings2 <- adply(.data=sample_pat_rep, .margins=1, .fun = function(x,data){

data[,c("Fixed")] <- gsub(x[,c("Pattern")], x[,c("Replacement")], data[,c("Fixed")],ignore.case = TRUE)
return(data)

}, data=sample_strings, .expand = FALSE, .progress = "none", .inform = FALSE, .parallel = FALSE, .paropts = NULL)
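
The copying follows from how adply works: it calls the function once per row of `sample_pat_rep`, every call receives the same, unmodified `sample_strings` through `data`, and the 3-row results are then stacked. The base-R sketch below mimics that split-apply-combine pattern with `lapply()` and `do.call(rbind, ...)`; it is an analogy rather than plyr's actual implementation, and it leaves aside any label columns plyr may add for the split rows. The data is rebuilt with `data.frame()` instead of the question's matrix construction, which is an assumption made only for brevity.

```r
# Rebuild the question's sample data
sample_pat_rep <- data.frame(Pattern = c("a", "b", "c"),
                             Replacement = c("A", "B", "C"),
                             stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Original = c("aaaaaaaa bbbbbbbb cccccccc",
               "aAaAaAaA bBbBbBbB cCcCcCcC",
               "AaAaAaAa BbBbBbBb CcCcCcCc"),
  stringsAsFactors = FALSE)
sample_strings$Fixed <- sample_strings$Original

# One call per pattern row; each call starts from the ORIGINAL data,
# so no call ever sees the replacements made by an earlier call
pieces <- lapply(seq_len(nrow(sample_pat_rep)), function(i) {
  data <- sample_strings
  data$Fixed <- gsub(sample_pat_rep$Pattern[i], sample_pat_rep$Replacement[i],
                     data$Fixed, ignore.case = TRUE)
  data
})

# Combining stacks the pieces: 3 patterns x 3 strings = 9 rows
stacked <- do.call(rbind, pieces)
nrow(stacked)
# [1] 9
```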

I'm sure there is a simple fix. I looked at rapply, but it's not clear whether that is the fix.
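
`rapply()` applies a function to the leaves of a nested list, so it does not pass an updated result from one step to the next. The loop above is a fold: each replacement is applied to the output of the previous one, and base R's `Reduce()` expresses exactly that. A minimal sketch, assuming the question's `Pattern`/`Replacement` and `Original`/`Fixed` column layout (the data is rebuilt with `data.frame()` for brevity):

```r
sample_pat_rep <- data.frame(Pattern = c("a", "b", "c"),
                             Replacement = c("A", "B", "C"),
                             stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Original = c("aaaaaaaa bbbbbbbb cccccccc",
               "aAaAaAaA bBbBbBbB cCcCcCcC",
               "AaAaAaAa BbBbBbBb CcCcCcCc"),
  stringsAsFactors = FALSE)
sample_strings$Fixed <- sample_strings$Original

# Fold over the row indices of sample_pat_rep: `acc` is the Fixed column
# produced so far, `i` selects the next pattern/replacement pair
sample_strings3 <- sample_strings
sample_strings3$Fixed <- Reduce(
  function(acc, i) gsub(sample_pat_rep$Pattern[i], sample_pat_rep$Replacement[i],
                        acc, ignore.case = TRUE),
  seq_len(nrow(sample_pat_rep)),
  sample_strings$Fixed  # initial value of the accumulator
)
# all three strings become "AAAAAAAA BBBBBBBB CCCCCCCC", and the result
# keeps 3 rows, just like the for-loop version
```

The result stays a single 3-row data frame because only one column is ever updated, which matches the for-loop behaviour rather than adply's stacking.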

Maybe write a function and call it?? Using rapply?
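
If a recursive formulation is preferred, the same fold can be written as a function that applies the first pattern/replacement pair and then calls itself on the remaining rows. `replace_all_rec` is a hypothetical helper name, not part of plyr or base R; for long pattern tables the plain loop or `Reduce()` avoids deep recursion.

```r
# Hypothetical helper: apply each Pattern -> Replacement row in turn,
# recursing on the remaining rows until none are left
replace_all_rec <- function(x, pat_rep) {
  if (nrow(pat_rep) == 0) return(x)  # base case: nothing left to apply
  x <- gsub(pat_rep$Pattern[1], pat_rep$Replacement[1], x, ignore.case = TRUE)
  replace_all_rec(x, pat_rep[-1, , drop = FALSE])  # recurse on the rest
}

sample_pat_rep <- data.frame(Pattern = c("a", "b", "c"),
                             Replacement = c("A", "B", "C"),
                             stringsAsFactors = FALSE)
replace_all_rec(c("aAaA bBbB cCcC", "xyz"), sample_pat_rep)
# [1] "AAAA BBBB CCCC" "xyz"
```

Applied to the question's data, this would be `sample_strings$Fixed <- replace_all_rec(sample_strings$Fixed, sample_pat_rep)`, which updates the column in place and keeps 3 rows.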

Thanks in advance.

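Update: new data

This is closer to the real situation. The matches are dynamic and based on an external system. I'm trying to avoid overly complex regular expressions or nested if-else statements.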

library(plyr)

sample_match <-  data.frame(matrix(NA, ncol=1, nrow=3, dimnames=list(c(), c("Match"))), stringsAsFactors=FALSE)
sample_match[1,] <-  c("dog")
sample_match[2,] <-  c("cat")
sample_match[3,] <-  c("bear")

sample_strings <-  data.frame(matrix(NA, ncol=2, nrow=3, dimnames=list(c(), c("Sentence","Has_Animal"))), stringsAsFactors=FALSE)
sample_strings[1,] <-  c("This person only has a cat",0)
sample_strings[2,] <-  c("This person has a cat and a dog",0)
sample_strings[3,] <-  c("This person has no animals",0)

sample_strings1 <- sample_strings
for (i in 1:nrow(sample_match))
{
 sample_strings1[,c("Has_Animal")] <- ifelse(grepl(sample_match[i,c("Match")], sample_strings1[,c("Sentence")]), 1,sample_strings1[,c("Has_Animal")])
} 


sample_strings2 <- adply(.data=sample_match, .margins=1, .fun = function(x,data){

 data[,c("Has_Animal")] <- ifelse(grepl(x[,c("Match")], data[,c("Sentence")]), 1,data[,c("Has_Animal")])
 return(data)

}, data=sample_strings, .expand = FALSE, .progress = "none", .inform = FALSE, .parallel = FALSE, .paropts = NULL)
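
Update: I had misunderstood the question; Example 2 is the requested result. The answer is now updated to give Example 1, which, IIUC, is what is needed.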

Here is a solution using base R:
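
The base-R code itself appears to be missing from this copy. A minimal sketch of one common base approach, under that caveat: collapse all `Match` terms into a single alternation pattern and call `grepl()` once, so no loop or update step is needed and the result keeps the original 3 rows. `Has_Animal` is stored as an integer here, whereas the question's `c("...", 0)` construction makes it a character column.

```r
sample_match <- data.frame(Match = c("dog", "cat", "bear"),
                           stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Sentence = c("This person only has a cat",
               "This person has a cat and a dog",
               "This person has no animals"),
  Has_Animal = 0,
  stringsAsFactors = FALSE)

# "dog|cat|bear": TRUE if any of the terms occurs anywhere in the sentence
pattern <- paste(sample_match$Match, collapse = "|")
sample_strings$Has_Animal <- as.integer(grepl(pattern, sample_strings$Sentence))
sample_strings$Has_Animal
# [1] 1 1 0
```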

If you don't want to match words that merely contain one of the patterns (e.g. concatenate contains cat), you can use the regex \b as a word boundary:

pattern = paste(paste("\\b", sample_match$Match, "\\b", sep=""), collapse="|")
grepl(pattern, c("cat", "concatenate"))
# [1] TRUE FALSE

Here is a straightforward plyr approach:

ddply(sample_strings,.(Sentence),function(x,ref = sample_match) {
  any(unlist(strsplit(x[["Sentence"]]," ")) %in% ref[[1]])
  })

                         Sentence    V1
1 This person has a cat and a dog  TRUE
2      This person has no animals FALSE
3      This person only has a cat  TRUE
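
Note that this `ddply` call splits by `Sentence`, so its output is sorted by that column and holds the result in a new `V1` column rather than updating `Has_Animal`. If the goal is the original 3-row frame with `Has_Animal` filled in, the same word-splitting test can be run with base `vapply()`. A sketch, assuming whole words separated by single spaces, as in the `ddply` version:

```r
sample_match <- data.frame(Match = c("dog", "cat", "bear"),
                           stringsAsFactors = FALSE)
sample_strings <- data.frame(
  Sentence = c("This person only has a cat",
               "This person has a cat and a dog",
               "This person has no animals"),
  Has_Animal = 0,
  stringsAsFactors = FALSE)

# Split each sentence on spaces and test whether any word is a Match term;
# vapply returns one logical per sentence, in the original row order
has_animal <- vapply(strsplit(sample_strings$Sentence, " "),
                     function(words) any(words %in% sample_match$Match),
                     logical(1))
sample_strings$Has_Animal <- as.integer(has_animal)
sample_strings$Has_Animal
# [1] 1 1 0
```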

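Can't help asking: can't you use toupper?

I could in this example, but my real case has nothing to do with gsub. It was just the first thing that came to mind.

Nothing to do with gsub either? Hmm, can you give us an example of the actual problem?

Arun, I appreciate this, but it behaves the same as my adply function. I'm looking for a final dataset of 3 rows, where that three-row dataset is updated each time rather than another three-row dataset being appended. For example, the for-loop answer has 3 rows.

Thanks Arun. This is really helpful.

Thanks Andrew. I really appreciate it.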