R: How to match the longest matching string


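I have a string and a character vector. I want to find all strings in the character vector that match as many characters as possible, starting from the beginning of the string. For example: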

s <- "abs"
vc <- c("ab","bb","abc","acbd","dert")

result <- c("ab","abc")

Another interpretation:

s <- "abs"
# Updated vc
vc <- c("ab","bb","abc","acbd","dert","abwabsabs")

st <- strsplit(s, "")[[1]]
mtc <- sapply(strsplit(substr(vc, 1, nchar(s)), ""), 
              function(i) {
                m <- i == st[1:length(i)]
                sum(m * cumsum(m))})

vc[mtc == max(mtc)]
#[1] "ab"        "abc"       "abwabsabs"

# Another vector vc
vc <- c("ab","bb","abc","acbd","dert","absq","abab")
....
vc[mtc == max(mtc)]
#[1] "absq"
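
One caveat with the scoring above, as far as I can tell: sum(m * cumsum(m)) also rewards matches that come after a mismatch, so for s = "abs" the candidate "axs" scores 3, the same as "ab". Below is a minimal sketch of a variant that counts only the leading run of matches by using cumprod instead. The helper names common_prefix_len and longest_prefix_matches are hypothetical and not part of the answer above.

```r
# Hypothetical helper: length of the common prefix shared by s and each element of vc.
common_prefix_len <- function(s, vc) {
  st <- strsplit(s, "")[[1]]
  vapply(strsplit(substr(vc, 1, nchar(s)), ""), function(chars) {
    m <- chars == st[seq_along(chars)]
    # cumprod() drops to 0 at the first mismatch and stays there,
    # so the sum is the length of the leading run of matches.
    sum(cumprod(m))
  }, numeric(1))
}

# Hypothetical wrapper: elements of vc sharing the longest prefix with s.
longest_prefix_matches <- function(s, vc) {
  len <- common_prefix_len(s, vc)
  # Return nothing (rather than everything) when no element matches at all.
  if (length(len) == 0 || max(len) == 0) return(character(0))
  vc[len == max(len)]
}

longest_prefix_matches("abs", c("ab", "bb", "abc", "acbd", "dert"))
#[1] "ab"  "abc"
```

Unlike vc[mtc == max(mtc)], this sketch returns character(0) when nothing in vc shares even the first character with s.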

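Here is a function that uses grep to check whether the given string s matches the beginning of any string in vc, repeatedly removing one character from the end of s until a match is found: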

myfun <- function(s, vc) {
  notDone <- TRUE
  maxChar <- max(nchar(vc))  # EDIT: these two lines truncate s to
  s <- substr(s, 1, maxChar) # the maximum number of chars in vc
  subN <- nchar(s)
  ans <- character(0)  # returned as-is when s is empty
  while (notDone && subN > 0) {
    ss <- substr(s, 1, subN)
    ans <- grep(sprintf("^%s", ss), vc, value = TRUE)
    if(length(ans)) {
      notDone <- FALSE
    } else {
      subN <- subN - 1
    }
  }
  return(ans)
}

s <- "abs"
# Updated vc from @Julius's answer
vc <- c("ab","bb","abc","acbd","dert","absq","abab")

> myfun(s, vc)
[1] "absq"

# And there's no infinite recursion if there's no match
> myfun("q", "a")
character(0)
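
Since s is pasted straight into a regular expression, characters such as "." or "(" in s would be treated as regex syntax. A minimal sketch of the same shrinking-prefix idea with literal matching follows, assuming base R's startsWith() (available since R 3.3.0). The name myfun_literal is hypothetical.

```r
# Hypothetical variant of myfun: same shrinking-prefix loop, but literal matching.
myfun_literal <- function(s, vc) {
  if (length(vc) == 0) return(character(0))
  # No prefix longer than the longest element of vc can match.
  n_max <- min(nchar(s), max(nchar(vc)))
  for (n in rev(seq_len(n_max))) {
    # startsWith() compares literally, so regex metacharacters in s are harmless.
    hit <- vc[startsWith(vc, substr(s, 1, n))]
    if (length(hit) > 0) return(hit)
  }
  character(0)
}

myfun_literal("abs", c("ab", "bb", "abc", "acbd", "dert", "absq", "abab"))
#[1] "absq"
myfun_literal("a.c", c("abc", "a."))
#[1] "a."
```

In the second call the regex version would match "abc", because "." matches any character.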

Just a note, long after the fact: a package now exists that makes finding the longest or partial matches very efficient and user-friendly.

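Comments:

s is a string, vc is a character vector, and result is the expected result for this example data.

Hm, I think truncating s to the maximum nchar in vc also makes sense.

Both s and vc could be truncated to pmin(length, max(vc)).

+1. Your solution is correct, but I wonder whether this problem can be solved with a regular expression.

@WojciechSobala, my gut tells me there is no benefit to truncating vc, but if s is much longer than max(nchar(vc)), truncating s will speed up this function (see the edit above). grep does use regular expressions!

In my dataset I have to compare each of ~60 strings against a vector of length ~500, so speed is not an issue. Yes, your function uses grep, but I wonder whether it can be written as a single pattern string.

@WojciechSobala, I just added a regex solution.

Yes, that is what I was looking for. I think the pattern string can be simplified to gsub("(.)", "\\1?", s).

@WojciechSobala, that doesn't seem to be the case; try vc

It works for my data, but in general you are right. I think paste0(gsub("(.)", "(\\1", s), gsub(".", "?)", s)) should be better.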
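
For reference, here is a minimal sketch of how a single pattern of the kind suggested in the last comment could be used. It assumes s contains only characters with no special meaning in a regex, and the name longest_by_pattern is hypothetical. The pattern is anchored with "^" and the match length reported by regexpr() picks out the longest prefix.

```r
# Hypothetical helper: single-pattern version of the longest-prefix search.
longest_by_pattern <- function(s, vc) {
  if (length(vc) == 0) return(character(0))
  # For s = "abs" this builds "^(a(b(s?)?)?)": each further character is optional.
  pattern <- paste0("^", gsub("(.)", "(\\1", s), gsub(".", "?)", s))
  # match.length is -1 where there is no match, otherwise the matched prefix length.
  len <- attr(regexpr(pattern, vc), "match.length")
  vc[len > 0 & len == max(len)]
}

longest_by_pattern("abs", c("ab", "bb", "abc", "acbd", "dert", "absq", "abab"))
#[1] "absq"
```

The optional groups are nested rather than written as "a?b?s?" so that a later character can only match when every earlier one has matched.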