Regex R: How do I search for a regular expression across the elements of a vector?
Is it possible in R to search a vector for a regular expression as if all of its elements were collapsed into a single element? If we actually merge all the elements into one string to do this, we can no longer restore them to element form after the search. Here is a vector:
vector<-c("I", "met", "a", "cow")
Is it possible to do this? Please help.

Based on Carl Witthoft's comment, my solution uses basic matching rather than a regular expression:
# A slightly longer vector
v = c("I", "met", "a", "cow", "today",
"You", "met", "a", "cow", "today")
# Create the combinations of each pair
temp1 = sapply(1:(length(v)-1),
function(x) paste0(v[x], v[x+1]))
# Grab the index of the desired search term
temp2 = which(temp1 %in% "meta")
# The following also works.
# Don't know what's faster/better.
# temp2 = grep("meta", temp1)
# Do some manual substitution and deletion
v[temp2] <- "meta"
v <- v[-(temp2+1)]
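One thing to watch in the snippet above: when the search term is not found, `temp2` is empty and `v[-(temp2+1)]` returns an empty vector. Below is a minimal sketch of the same pairwise idea wrapped in a function that returns the input unchanged in that case. The name `merge_pairs` is hypothetical (it does not appear in the thread), and the sketch assumes that matches do not overlap.

```r
# Hypothetical helper, not part of the original answer: merge every adjacent
# pair of elements whose concatenation equals `word`.
# Assumes matches do not overlap each other.
merge_pairs <- function(v, word) {
  if (length(v) < 2) return(v)
  # Concatenate each element with its right-hand neighbour
  pairs <- paste0(v[-length(v)], v[-1])
  hits <- which(pairs == word)
  # No match: return the input unchanged rather than indexing with integer(0)
  if (length(hits) == 0) return(v)
  v[hits] <- word      # overwrite the left element of each matching pair
  v[-(hits + 1)]       # drop the right element of each matching pair
}

merge_pairs(c("I", "met", "a", "cow"), "meta")
# [1] "I"    "meta" "cow"
merge_pairs(c("I", "met", "a", "cow"), "taco")
# [1] "I"   "met" "a"   "cow"
```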
If you only want to merge complete elements, you can try the following:
mergeRegExpr <- function(x, pattern) {
str <- paste(x, sep="", collapse="")
## find starting position of each word
wordStart <- head(cumsum(c(1, nchar(x))), -1)
## look for pattern
rx <- regexpr(pattern=pattern, text=str, fixed=TRUE)
## pos of matching pattern == rx+nchar(pattern)-1
rxEnd <- rx+attr(rx, "match.length")-1
## which vector elements start outside the matched span
sel <- wordStart < rx | wordStart > rxEnd
## insert the merged elements after the elements that start before the match
return(append(x[sel], paste(x[!sel], collapse=""), sum(wordStart < rx)))
}
vector <- c("I", "met", "a", "cow")
mergeRegExpr(vector, "meta")
# "I" "meta" "cow"
mergeRegExpr(vector, "acow")
# "I" "met" "acow"
mergeRegExpr(vector, "Imeta")
# "Imeta" "cow"
## partial matching doesn't work
mergeRegExpr(vector, "taco")
# "I" "metacow"
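If you would rather have a partial match such as "taco" absorb every element it touches, one option is to map the character positions of the match back to element indices. The following is a minimal sketch under that assumption. The name `merge_touched` is hypothetical, and using `findInterval` for the position lookup is my own choice, not something from the answer above.

```r
# Hypothetical helper: merge all elements that the match overlaps, even partly.
merge_touched <- function(x, pattern) {
  str <- paste(x, collapse = "")
  # Starting character position of each element inside the collapsed string
  wordStart <- head(cumsum(c(1, nchar(x))), -1)
  rx <- regexpr(pattern, str, fixed = TRUE)
  start <- as.integer(rx)
  if (start == -1) return(x)                      # no match: nothing to merge
  end <- start + attr(rx, "match.length") - 1
  # findInterval() gives the index of the element containing each position
  first <- findInterval(start, wordStart)
  last  <- findInterval(end, wordStart)
  c(x[seq_len(first - 1)],
    paste(x[first:last], collapse = ""),
    x[last + seq_len(length(x) - last)])
}

merge_touched(c("I", "met", "a", "cow"), "meta")
# [1] "I"    "meta" "cow"
merge_touched(c("I", "met", "a", "cow"), "taco")
# [1] "I"       "metacow"
```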
If you want something that matches "meta" but does not match "taco", this will do it:
myFun <- function(vector, word) {
D <- "UnLiKeLyStRiNg"
## Construct a string on which you'll perform regex-search
xx <- paste0(paste0(D, vector, collapse=""), D)
## Construct the regex pattern
start <- paste0("(?<=", D, ")")
mid <- paste0(strsplit(word, "")[[1]], collapse=paste0("(", D, ")?"))
end <- paste0("(?=", D, ")")
pat <- paste0(start, mid, end)
## Use it
strsplit(gsub(pat, word, xx, perl=TRUE), D)[[1]][-1]
}
vector <- c("I", "met", "a", "cow")
myFun(vector, "meta")
# [1] "I" "meta" "cow"
myFun(vector, "taco")
# [1] "I" "met" "a" "cow"
myFun(vector, "Imet")
# [1] "Imet" "a" "cow"
myFun(vector, "Ime")
# [1] "I" "met" "a" "cow"
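To see why this works, it can help to print the intermediate strings. The sketch below rebuilds the same pattern with a short stand-in delimiter, `"#"`, purely for readability. That delimiter is an assumption for illustration and is only safe because none of the elements contain it.

```r
D <- "#"                      # stand-in for "UnLiKeLyStRiNg" (illustration only)
vector <- c("I", "met", "a", "cow")
word <- "meta"

# Every element is wrapped in delimiters
xx <- paste0(paste0(D, vector, collapse = ""), D)
xx
# [1] "#I#met#a#cow#"

# An optional delimiter is allowed between any two letters of the word,
# but the match must begin right after a delimiter and end right before one
mid <- paste0(strsplit(word, "")[[1]], collapse = paste0("(", D, ")?"))
pat <- paste0("(?<=", D, ")", mid, "(?=", D, ")")
pat
# [1] "(?<=#)m(#)?e(#)?t(#)?a(?=#)"

# Replacing the match with the bare word removes the inner delimiter
merged <- gsub(pat, word, xx, perl = TRUE)
merged
# [1] "#I#meta#cow#"
```

"taco" finds no match here because its first letter, the "t" of "met", is not preceded by a delimiter, so the lookbehind fails.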
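To clarify, you want to: first, check whether the string exists, and then return a new vector containing the merged string. Is that correct? Also, will the search always span two or more complete strings in the source? That is, you would not search for "taco" (which can be found starting at the last letter of "met")?

@mrdwab makes a good point: if you want to find "taco", you will have "leftover" strings, and you have not said what you would do with them. So if you only want to match complete strings, then instead of collapsing the whole list, just collapse pairwise, e.g. paste(vector[j],vector[j+1],collapse='') and run the regexp on that. In fact, if you want the result for "taco" to be "metacow", a slight modification of the resulting vector will still give you what you want.

@mrdwab Sorry, I did not mention that: the search always spans complete strings. Thanks.

Clearly partial matching does work, since "I metacow" is a great sentence.

Thanks again for your help, Josh. Got what I wanted.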