R: how to match the longest matching string
I have a string and a character vector. I want to find all strings in the character vector that match as many characters as possible from the start of the string. For example:
s <- "abs"
vc <- c("ab","bb","abc","acbd","dert")
result <- c("ab","abc")
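Read strictly, "match as many characters as possible from the start of the string" means: compute the length of the common prefix of s with each element of vc, then keep the elements that reach the maximum. A minimal base-R sketch of that reading follows; common_prefix_len() and longest_prefix_matches() are hypothetical helper names, and returning character(0) when no element shares even the first character is an assumption.

```r
# Hypothetical helper: length of the common prefix of s and each element of vc
common_prefix_len <- function(s, vc) {
  vapply(vc, function(x) {
    n <- 0L
    max_n <- min(nchar(s), nchar(x))
    # advance while the next character of s and x agree
    while (n < max_n && substr(s, n + 1L, n + 1L) == substr(x, n + 1L, n + 1L)) {
      n <- n + 1L
    }
    n
  }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical helper: all elements of vc whose common prefix with s is longest
longest_prefix_matches <- function(s, vc) {
  len <- common_prefix_len(s, vc)
  # assumption: if nothing matches even one character, return an empty vector
  if (length(len) == 0 || max(len) == 0) return(character(0))
  vc[len == max(len)]
}

s <- "abs"
vc <- c("ab","bb","abc","acbd","dert")
longest_prefix_matches(s, vc)  # returns c("ab", "abc")
```

Here "ab" and "abc" both share the two-character prefix "ab" with s, while "acbd" shares only "a", which reproduces the expected result above.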
Another interpretation:
s <- "abs"
# Updated vc
vc <- c("ab","bb","abc","acbd","dert","abwabsabs")
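st <- strsplit(s, "")[[1]]   # individual characters of s
# Compare the first nchar(s) characters of each element of vc with s position by position;
# each matching position is weighted by the running count of matches, sum(m * cumsum(m))
mtc <- sapply(strsplit(substr(vc, 1, nchar(s)), ""),
              function(i) {
                m <- i == st[seq_along(i)]
                sum(m * cumsum(m))
              })
vc[mtc == max(mtc)]
#[1] "ab" "abc" "abwabsabs"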
# Another vector vc
vc <- c("ab","bb","abc","acbd","dert","absq","abab")
....
vc[mtc == max(mtc)]
#[1] "absq"
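The score sum(m * cumsum(m)) in this answer is not the length of the common prefix: every matching position adds the running count of matches so far, even after a mismatch, which is what makes it "another interpretation". The comparison below contrasts it with a strict prefix length obtained by swapping cumsum() for cumprod(), which is 1 only up to the first mismatch. match_positions(), score_cumsum() and score_prefix() are hypothetical names, not taken from the answer.

```r
# Hypothetical helper: logical vector of position-wise matches between s and
# the first nchar(s) characters of x (same comparison as in the answer)
match_positions <- function(x, s) {
  st <- strsplit(s, "")[[1]]
  i <- strsplit(substr(x, 1, nchar(s)), "")[[1]]
  i == st[seq_along(i)]
}

# Score from the answer: each match weighted by the running number of matches
score_cumsum <- function(x, s) {
  m <- match_positions(x, s)
  sum(m * cumsum(m))
}

# Strict common-prefix length: cumprod() drops to 0 at the first mismatch
score_prefix <- function(x, s) {
  m <- match_positions(x, s)
  sum(cumprod(m))
}

sapply(c("abxx", "axcd"), score_cumsum, s = "abcd")  # abxx = 3, axcd = 6
sapply(c("abxx", "axcd"), score_prefix, s = "abcd")  # abxx = 2, axcd = 1
```

So for s = "abcd" the answer's score ranks "axcd" above "abxx", whereas a strict prefix reading ranks "abxx" first. With the example vectors used in this thread, both scores select the same elements.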
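Here is a function that uses grep to check whether the given string s matches the start of any string in vc, repeatedly removing one character from the end of s until at least one element matches: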
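myfun <- function(s, vc) {
  notDone <- TRUE
  ans <- character(0)           # returned as-is if s is empty
  maxChar <- max(nchar(vc))     # EDIT: these two lines truncate s to
  s <- substr(s, 1, maxChar)    # the maximum number of chars in vc
  subN <- nchar(s)
  while (notDone && subN > 0) {
    ss <- substr(s, 1, subN)
    # ss is pasted into a regular expression, so regex metacharacters in s are not matched literally
    ans <- grep(sprintf("^%s", ss), vc, value = TRUE)
    if (length(ans)) {
      notDone <- FALSE
    } else {
      subN <- subN - 1          # drop one character from the end of s and try again
    }
  }
  return(ans)
}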
s <- "abs"
# Updated vc from @Julius's answer
vc <- c("ab","bb","abc","acbd","dert","absq","abab")
> myfun(s, vc)
[1] "absq"
# And there's no infinite loop if there's no match
> myfun("q", "a")
character(0)
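Because myfun() pastes s into a regular expression, characters such as . or * in s act as regex metacharacters rather than literal text. One way around that, sketched below under the assumption that R >= 3.3.0 (where base startsWith() is available), is to test literal prefixes directly, trying the longest prefix first. myfun_fixed() is a hypothetical name, not part of the answer.

```r
# Hypothetical variant of myfun(): literal prefix tests with base startsWith() (R >= 3.3.0)
myfun_fixed <- function(s, vc) {
  if (length(vc) == 0) return(character(0))
  s <- substr(s, 1, max(nchar(vc)))  # same truncation as in myfun()
  # try the longest prefix of s first, then shorten it one character at a time
  for (n in rev(seq_len(nchar(s)))) {
    ans <- vc[startsWith(vc, substr(s, 1, n))]
    if (length(ans) > 0) return(ans)
  }
  character(0)  # not even the first character of s starts any element of vc
}

myfun_fixed("abs", c("ab","bb","abc","acbd","dert","absq","abab"))  # returns "absq"
myfun_fixed("a.", c("ab", "a.x"))          # returns "a.x"
grep("^a.", c("ab", "a.x"), value = TRUE)  # returns both "ab" and "a.x"
```

The for loop over rev(seq_len(nchar(s))) replaces the notDone flag and also handles an empty s without a special case.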
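Just a comment, long after the fact: a package now exists that makes finding longest or partial matches very, very efficient and user-friendly.
s is the string, vc is the character vector, and result is the expected result for this example data.
Hmm, I think it also makes sense to truncate s to the maximum nchar in vc.
Both s and vc could be truncated to pmin(length, max(vc)). +1, your solution is correct, but I wonder whether this can be solved with a regular expression.
@WojciechSobala, my intuition tells me there is no benefit to truncating vc, but if s is much longer than max(nchar(vc)), truncating s will speed up this function (see the edit above).
grep does use regular expressions!
In my dataset I have to compare each of ~60 strings against a vector of length ~500, so speed is not an issue. Yes, your function uses grep, but I wonder whether it could be written as a single pattern string.
@WojciechSobala, I just added a regex solution.
Yes, that is what I was looking for. I think the pattern string can be simplified to gsub("(.)", "\\1?", s).
@WojciechSobala, it seems not; try vc …
It works for my data, but in general you are right. I think paste0(gsub("(.)", "(\\1", s), gsub(".", "?)", s)) should be better.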
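The nested pattern from the last comment can be turned into a complete single-regex solution: anchor it with ^, let regexpr() report how many characters of each element it matched (the "match.length" attribute, -1 for no match), and keep the elements with the longest match. The sketch below assumes s contains no regex metacharacters (they would need escaping first); prefix_pattern() and longest_by_regex() are hypothetical names.

```r
# Build "^(a(b(s?)?)?)" from s = "abs": each later character is optional,
# but can only match if every character before it matched (nested groups)
prefix_pattern <- function(s) {
  paste0("^", gsub("(.)", "(\\1", s), gsub(".", "?)", s))
}

# Hypothetical helper: elements of vc with the longest prefix match against s
longest_by_regex <- function(s, vc) {
  len <- attr(regexpr(prefix_pattern(s), vc), "match.length")  # -1 where no match
  if (length(len) == 0 || max(len) < 1) return(character(0))
  vc[len == max(len)]
}

prefix_pattern("abs")  # returns "^(a(b(s?)?)?)"
longest_by_regex("abs", c("ab","bb","abc","acbd","dert","absq","abab"))  # returns "absq"
```

For comparison, the simpler pattern gsub("(.)", "\\1?", s) yields "a?b?s?", which makes every character optional independently: with ^ prepended it matches "bs" to length 2 even though "bs" does not start with "a", which is why the nested groups are needed in general.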