R: How to replace matches in a string and index each match

r string

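A given string can contain multiple instances of the pattern I am trying to match. For example, my pattern matches my string twice. I want to replace each match with a replacement that includes the index of the match being replaced.

So I want to change my string to "My name is [name 1] and his name is [name 2]".

How can I do this, preferably with a single function? Ideally with a function from stringr or stringi?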

Here is a solution that relies on the gsubfn and proto packages. Define the string to which the function will be applied, my_string.

You can do this in base R with gregexpr and regmatches:

my_string = "My name is <N Timon N> and his name is <N Pumba N>"

# Get the positions of the matches in the string
m = gregexpr("<N(.+?)N>", my_string, perl = TRUE)

# Index each match and replace text using the indices
match_indices = 1:length(unlist(m))

regmatches(my_string, m) = list(paste0("[Name #", match_indices, "]"))

Note:

If the same match text occurs more than once, this solution treats each occurrence as a different name. For example:

my_string = "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"


m = gregexpr("<N(.+?)N>", my_string, perl = TRUE)

match_indices = 1:length(unlist(m))

regmatches(my_string, m) = list(paste0("[Name #", match_indices, "]"))

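A different approach, using dplyr + stringr:

library(dplyr)    # provides the %>% pipe
library(stringr)

string = "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"

# Each matched text becomes a pattern name, so repeated matches share an index
string %>%
  str_extract_all("<N(.+?)N>") %>%
  unlist() %>%
  setNames(paste0("[Name #", 1:length(.), "]"), .) %>%
  str_replace_all(string, .)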

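Simple, maybe slow, but it should work:

library(stringi)

ct <- 1
while (TRUE) {
  old_string <- my_string
  # Replace only the first remaining match, tagging it with the counter
  my_string <- stri_replace_first_regex(my_string, '\\<N.*?N\\>',
                                        paste0('[name ', ct, ']'))
  if (old_string == my_string) break  # nothing was replaced, so we are done
  ct <- ct + 1
}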
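The loop above stops only when a replacement leaves the string unchanged, so it may never end if the replacement text can itself match the pattern. One way to bound it, sketched here in base R, is to count the matches first and then replace exactly that many times. The helper name `replace_counted` and its `label` argument are made up for this illustration and are not part of the thread. This only guarantees termination; it does not make the result meaningful for patterns that also match the replacement text.

```r
# Hypothetical helper (not from the thread): a bounded variant of the loop.
replace_counted <- function(x, pattern, label = "name") {
  m <- gregexpr(pattern, x, perl = TRUE)[[1]]
  n <- if (m[1] == -1L) 0L else length(m)  # gregexpr() returns -1 when nothing matches
  for (i in seq_len(n)) {
    # sub() replaces only the first remaining match
    x <- sub(pattern, paste0("[", label, " ", i, "]"), x, perl = TRUE)
  }
  x
}

my_string <- "My name is <N Timon N> and his name is <N Pumba N>"
result <- replace_counted(my_string, "<N.*?N>")
print(result)
```

Because the number of iterations is fixed before any replacement happens, the loop runs at most once per original match.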

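Output of the base R approach (each occurrence gets its own index):

> my_string
[1] "My name is [Name #1] and his name is [Name #2], [Name #3] again"

Output of the dplyr + stringr approach (repeated matches share an index):

[1] "My name is [Name #1] and his name is [Name #2], [Name #1] again"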
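The two outputs above differ only in how a repeated match is numbered. As a rough sketch, both behaviours can be had from the same base R building blocks (gregexpr and regmatches) by switching how the index vector is computed. The function name `index_matches` and the `label` and `dedupe` arguments are assumptions introduced for this example, not something from the answers.

```r
# Hypothetical helper (not from the thread): number every match in one string.
# dedupe = FALSE numbers each occurrence; dedupe = TRUE reuses the index of
# the first occurrence of identical match text.
index_matches <- function(x, pattern, label = "Name", dedupe = FALSE) {
  stopifnot(is.character(x), length(x) == 1L)
  m <- gregexpr(pattern, x, perl = TRUE)
  hits <- regmatches(x, m)[[1]]
  if (length(hits) == 0L) return(x)  # nothing matched, leave x unchanged
  idx <- if (dedupe) match(hits, unique(hits)) else seq_along(hits)
  regmatches(x, m) <- list(paste0("[", label, " #", idx, "]"))
  x
}

s <- "My name is <N Timon N> and his name is <N Pumba N>, <N Timon N> again"
by_occurrence <- index_matches(s, "<N(.+?)N>")
by_text <- index_matches(s, "<N(.+?)N>", dedupe = TRUE)
print(by_occurrence)
print(by_text)
```

Using `match(hits, unique(hits))` gives each distinct match text the index of its first appearance, which avoids treating the matched text as a regular expression the way a named replacement vector does.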

Comments:

You might be interested in the glue package: it has a syntax like "Hiya, I'm {Timon}", if I understood the vignette correctly. I have never used it myself.

Thanks, I think that is a good suggestion. I would love to do this with glue, but I have not figured it out yet. I think the counting part would be a bit hard with glue.

glue(gsub(…, "{\\1}", my_string), Timon = "[Name 1]", Pumba = "[Name 2]") gives "My name is [Name 1] and his name is [Name 2]".

@user's solution is better! This approach works for the specific example in the question, but whether it works in general depends on the regex and the replacement. For example, if the regex matches something like whitespace or a word boundary, e.g. \\w+, and the replacement does not remove the match, you will be stuck in an endless loop.

@BIQS Thanks for the edit, but I prefer to keep things simple in these kinds of questions.

I guess we disagree about what is simplest, but you are of course entitled to choose for your own answer. I like your approach and may accept it as the answer, depending on what other approaches come in.

@BIQ Simpler in the sense of creating fewer intermediate variables, which would otherwise clutter my workspace. I agree your edit may be more readable, but it is not necessarily simpler.

I think the final answer is simple and easy to understand, so I am happy with it.

I like it. Could you submit this as a separate answer? It is a completely different approach, and it gives different results from the original answer in a common use case. For example, the two approaches produce different results for a string of the form "My name is ... and his name is also ...". The base R approach yields "My name is [name 1] and his name is also [name 2]", while the tidyverse approach yields "My name is [name 1] and his name is also [name 1]".