R: extract function instances and arguments from a string

r regex


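I want a function that finds instances of a given function call in a string, extracts the raw arguments, and replaces each call with a placeholder. Unfortunately, my regex skills haven't gotten me very far.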

I want the following behavior:

extract_fun("max(7*xy,b=z)+maximum+max(j)",fun="max")
# $modified_string
# [1] "{F[[1]]}+maximum+{F[[2]]}"
# 
# $params
# $params[[1]]
# [1] "7*xy" "b=z" 
# 
# $params[[2]]
# [1] "j"

Edit:

A more complex use case:

extract_fun("max(7*xy,b=min(1,3))+maximum+max(j)",fun="max")
# $modified_string
# [1] "{F[[1]]}+maximum+{F[[2]]}"
# 
# $params
# $params[[1]]
# [1] "7*xy" "b=min(1,3)" 
# 
# $params[[2]]
# [1] "j"

Here is something to get you started:

Your function should take two arguments:

 fun = "max"
 string = "max(7*xy,b=z)+maximum+max(j)"

The regex captures anything between `(` and `)` that is preceded by `fun`; the `?` makes it lazy, so each match stops at the first `)`:

regex = paste0(fun, "\\((.*?)\\)")
regex
#output
"max\\((.*?)\\)"
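To see what the lazy `?` buys, here is a small comparison. It uses base R's `gregexpr()`/`regmatches()` with `perl = TRUE` instead of stringr, an assumption made only to keep the snippet dependency-free. The last case shows where the lazy pattern breaks: with a nested call, the match stops at the inner `)`, which is exactly the problem the question's edit raises.

```r
string <- "max(7*xy,b=z)+maximum+max(j)"

# Greedy ".*" runs to the LAST ")" in the string, swallowing both calls
greedy <- regmatches(string, gregexpr("max\\((.*)\\)", string, perl = TRUE))[[1]]

# Lazy ".*?" stops at the FIRST ")", giving one match per call
lazy <- regmatches(string, gregexpr("max\\((.*?)\\)", string, perl = TRUE))[[1]]

# With a nested call, the lazy match also stops at the inner ")"
nested_string <- "max(7*xy,b=min(1,3))+maximum+max(j)"
nested <- regmatches(nested_string,
                     gregexpr("max\\((.*?)\\)", nested_string, perl = TRUE))[[1]]

greedy  # "max(7*xy,b=z)+maximum+max(j)"
lazy    # "max(7*xy,b=z)" "max(j)"
nested  # "max(7*xy,b=min(1,3)" "max(j)"
```

Note that `maximum` is never matched because the pattern requires `(` right after `max`.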

matcher = stringr::str_match_all(string, regex)
matcher = do.call(rbind, matcher)
matcher
#output
     [,1]            [,2]      
[1,] "max(7*xy,b=z)" "7*xy,b=z"
[2,] "max(j)"        "j"       

#extract arguments from captured groups in matcher[,2]
params = strsplit(matcher[,2], " {0,}, {0,}" ) # split on commas, allowing optional spaces around them
#output
[[1]]
[1] "7*xy" "b=z" 

[[2]]
[1] "j"

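#generate a modified_string
Fs = 1:nrow(matcher)
replacer = paste0("{F[[", Fs, "]]}")
calls = matcher[,1] # full matches, e.g. "max(7*xy,b=z)"
out = string
# sub() replaces only the first remaining occurrence, so repeated identical calls get distinct placeholders
for (i in seq_along(replacer)){
  out = sub(calls[i], replacer[i], out, fixed = TRUE)
}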
out
#output
"{F[[1]]}+maximum+{F[[2]]}"
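The steps above can be wrapped into the two-argument function the question asks for. A minimal sketch, assuming only base R (`regmatches()` and `sub()` stand in for `stringr::str_match_all()`); `extract_fun_simple` is a hypothetical name, and like the steps above it only handles arguments that contain no parentheses.

```r
# Hypothetical wrapper around the steps above (simple case only:
# arguments must not contain parentheses)
extract_fun_simple <- function(string, fun) {
  regex <- paste0(fun, "\\((.*?)\\)")
  # Full matches such as "max(7*xy,b=z)"
  calls <- regmatches(string, gregexpr(regex, string, perl = TRUE))[[1]]
  # Captured argument text such as "7*xy,b=z"
  args <- sub(regex, "\\1", calls, perl = TRUE)
  # Split on commas, allowing optional spaces around them
  params <- strsplit(args, " *, *")
  # Replace each call, in order, with its placeholder
  out <- string
  for (i in seq_along(calls)) {
    out <- sub(calls[i], paste0("{F[[", i, "]]}"), out, fixed = TRUE)
  }
  list(modified_string = out, params = params)
}

res <- extract_fun_simple("max(7*xy,b=z)+maximum+max(j)", fun = "max")
res$modified_string  # "{F[[1]]}+maximum+{F[[2]]}"
```

Using `sub()` rather than `gsub()` means that if the same call text appears twice, each occurrence gets its own placeholder.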

Edit: here is my take on the updated question so far:

My idea is to isolate the part of the string that belongs to the function of interest, and then manipulate only that part.

string = "max(7*xy,b=min(1,3))+maximum+max(j)"

Split at `max(`:
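The code for this split is not shown in the post. One possible base-R reconstruction, assuming that each piece keeps the `(` that follows `max` (the balance check further down only finds the right `)` if it does) and that `opening`/`ending` are lists of matrices with the positions in their first column, as `stringr::str_locate_all()` would return. `locate_char` is a hypothetical helper.

```r
string <- "max(7*xy,b=min(1,3))+maximum+max(j)"
fun <- "max"

# Split at every `fun` that is directly followed by "("; the lookahead keeps
# the "(" in each piece. The first piece is the text before the first call.
pieces <- strsplit(string, paste0(fun, "(?=\\()"), perl = TRUE)[[1]][-1]

# Hypothetical helper: positions of one character, as a one-column matrix
locate_char <- function(s, ch) {
  matrix(which(strsplit(s, "")[[1]] == ch), ncol = 1)
}

opening <- lapply(pieces, locate_char, ch = "(")
ending  <- lapply(pieces, locate_char, ch = ")")

pieces  # "(7*xy,b=min(1,3))+maximum+" "(j)"
```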

Clean it up a bit:

# keep only the first column: the position of each "(" and ")"
opening = lapply(opening, function(x) x[,1])
ending = lapply(ending, function(x) x[,1])

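Find the positions at which the number of closing parentheses equals the number of opening parentheses. We are interested in the first match: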

# for each piece, flag every ")" at which the parentheses seen so far balance out
out = list()
for (i in seq_along(ending)){
  end = ending[[i]]
  open = opening[[i]]
  sumer = vector()
  for(z in end){
    sumi = sum(open < z) == sum(end <= z)
    sumer = c(sumer, sumi)
  }
  out[[i]] = sumer
}

spliter_end = purrr::map2(ending, out, function(x, y){
  return(x[y])
})
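The step that turns `spliter_end` into `fun_isolate` is not shown either. A self-contained base-R sketch of the whole isolation, assuming the pieces keep their leading `(` and that only the first balanced `)` matters; `isolate_args` is a hypothetical name, and calls to `fun` nested inside itself are not handled (the limitation noted further down).

```r
# Hypothetical helper: raw argument text of each call to `fun` in `string`
# (calls to `fun` nested inside another call to `fun` are not handled)
isolate_args <- function(string, fun) {
  # Pieces starting with the "(" that directly follows each `fun`
  pieces <- strsplit(string, paste0(fun, "(?=\\()"), perl = TRUE)[[1]][-1]
  lapply(pieces, function(piece) {
    chars <- strsplit(piece, "")[[1]]
    open <- which(chars == "(")
    end <- which(chars == ")")
    # ")" positions where the parentheses seen so far balance out
    balanced <- end[vapply(end, function(z) sum(open < z) == sum(end <= z),
                           logical(1))]
    # The first balanced ")" closes the call; drop the outer parentheses
    substr(piece, 2, balanced[1] - 1)
  })
}

string2 <- "max(7*xy,b=min(1,3),z=sum(x*y)),mean(x+y)+maximum+max(j)"
fun_isolate <- isolate_args(string2, "max")
# list("7*xy,b=min(1,3),z=sum(x*y)", "j")
```

With unbalanced input there is no balanced `)`, and the sketch returns `NA` for that piece.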

Let's take a nastier example:

string2 = "max(7*xy,b=min(1,3),z=sum(x*y)),mean(x+y)+maximum+max(j)"
  #copy above code with `string2` instead of `string`
fun_isolate
[[1]]
[1] "7*xy,b=min(1,3),z=sum(x*y)"

[[2]]
[1] "j"

Even tougher:

string3 = "max(7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))),mean(x+y)+maximum+max(j)"

#output
[[1]]
[1] "7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))"

[[2]]
[1] "j"

Now all that is left is to split at every `,` that is not enclosed in `(` `)`.
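One way to do that split, sketched in base R under the assumption that parentheses are balanced and never appear inside string literals: track the nesting depth with a running sum and cut only at commas where the depth is zero. `split_top_level` is a hypothetical name; `trimws()` mimics the whitespace handling of the earlier `strsplit()` pattern.

```r
# Hypothetical helper: split `x` at commas that are not inside parentheses
split_top_level <- function(x) {
  chars <- strsplit(x, "")[[1]]
  # Nesting depth after each character: +1 for "(", -1 for ")"
  depth <- cumsum((chars == "(") - (chars == ")"))
  # A comma leaves the depth unchanged, so depth 0 means "not enclosed"
  cuts <- which(chars == "," & depth == 0)
  # Substrings between consecutive cuts, with surrounding spaces removed
  trimws(substring(x, c(1, cuts + 1), c(cuts - 1, length(chars))))
}

split_top_level("7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))")
# "7*xy" "b=min(1,3, head(z))" "z=sum(x*y+mean(x+y))"
```

Applied with `lapply(fun_isolate, split_top_level)`, this yields one character vector of arguments per call.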

This will not work if the same function is called inside itself, as in:

"max(7*xy,b=min(1,3),z=max(x*y)),mean(x+y)+maximum+max(j)"

However, even that could be handled if, before the first `strsplit`, you exclude everything enclosed in `(` `)`, just as in the `,` case.

Test with:

"max(7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))),mean(x+y)+maximum+max(j)"
"max(7*xy,b=min(1,3, head(z)),z=sum(x*y+mean(x+y))),mean(x+y)+maximum+max(j*z+sum(a*b^sum(z)), drop = 72)"
"max(7*xy,b=min(1,3, head(z)),z=sum(x*y, mean(x+y))),mean(x+y)+maximum+max(j*z+sum(a*b^sum(z)), drop = 72)"
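Putting the pieces together, here is a sketch of `extract_fun()` that works on the whole string at once instead of splitting it first. It computes the nesting depth before every character, keeps only the `max(` calls at depth 0 (so an inner `max(...)` stays part of an argument, which is the exclusion idea described above), and splits arguments at the commas sitting directly inside each call. It assumes balanced parentheses, none inside string literals, and that `fun` is a plain name; the `\b` word boundary is an addition so that e.g. `amax(` is not matched.

```r
# Hypothetical full sketch: replace every top-level call to `fun` with a
# placeholder and collect its arguments, even when `fun` is nested inside
extract_fun <- function(string, fun) {
  chars <- strsplit(string, "")[[1]]
  idx <- seq_along(chars)
  # depth[i]: how many "(" are still open just before character i
  depth <- cumsum(c(0, (chars == "(") - (chars == ")")))[idx]
  # Start positions of "fun(" (word boundary so that "amax(" is not matched)
  starts <- gregexpr(paste0("\\b", fun, "\\("), string, perl = TRUE)[[1]]
  starts <- starts[starts > 0]
  # Keep only calls that are not nested inside other parentheses
  starts <- starts[depth[starts] == 0]
  ends <- integer(0)
  params <- vector("list", length(starts))
  for (k in seq_along(starts)) {
    open <- starts[k] + nchar(fun)  # position of this call's "("
    # The matching ")" is the first later ")" found at depth 1
    close <- which(chars == ")" & idx > open & depth == 1)[1]
    # Arguments are separated by the commas sitting directly inside the call
    cuts <- which(chars == "," & idx > open & idx < close & depth == 1)
    params[[k]] <- trimws(substring(string, c(open + 1, cuts + 1),
                                    c(cuts - 1, close - 1)))
    ends[k] <- close
  }
  # Replace calls from last to first so earlier positions stay valid
  out <- string
  for (k in rev(seq_along(starts))) {
    out <- paste0(substr(out, 1, starts[k] - 1), "{F[[", k, "]]}",
                  substring(out, ends[k] + 1))
  }
  list(modified_string = out, params = params)
}

res <- extract_fun("max(7*xy,b=min(1,3),z=max(x*y)),mean(x+y)+maximum+max(j)", "max")
res$modified_string  # "{F[[1]]},mean(x+y)+maximum+{F[[2]]}"
res$params[[1]]      # "7*xy" "b=min(1,3)" "z=max(x*y)"
```

Working from depth counts rather than from a regex avoids the lazy-match problem entirely, because the matching `)` is found by counting instead of pattern matching.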