R: Using \\b and \\B in regex

r regex string

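I was reading about regex and came across word boundaries. I found an answer that explains the difference between \b and \B. Using the code from that question does not give the expected output. Here: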

grep("\\bcat\\b", "The cat scattered his food all over the room.", value= TRUE)
# I expect "cat" but it returns the whole string.

grep("\\B-\\B", "Please enter the nine-digit id as it appears on your color - coded pass-key.", value= TRUE)
# I expect "-" but it returns the whole string.

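I used the code as described in that question, but with two backslashes as suggested. Using a single backslash does not work either. What am I doing wrong?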

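grep returns the whole string because it only checks whether the string contains a match; with value = TRUE it returns each matching element in full, not the matched substring. To extract cat, you need a different function, such as str_extract from the stringr package: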

library(stringr)
str_extract("The cat scattered his food all over the room.", "\\bcat\\b")
[1] "cat"
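To make the point about grep concrete, here is a minimal base R sketch of what grep and grepl actually return. The second element of x is a made-up sentence added only to show a non-matching case.

```r
x <- c("The cat scattered his food all over the room.",
       "No felines here.")  # second element is illustrative only

# grep() returns the indices of the elements that contain a match
idx <- grep("\\bcat\\b", x)
print(idx)
# [1] 1

# value = TRUE returns those elements in full, not the matched substring
print(grep("\\bcat\\b", x, value = TRUE))
# [1] "The cat scattered his food all over the room."

# grepl() returns one logical value per element
hit <- grepl("\\bcat\\b", x)
print(hit)
# [1]  TRUE FALSE
```

In other words, grep answers "which elements match?", not "what was matched?".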

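The difference between \b and \B is that \b marks a word boundary while \B is its negation. That is, \\bcat\\b matches only when cat is delimited by non-word characters such as whitespace or punctuation (or by the start or end of the string), whereas \\Bcat\\B matches only when cat sits inside a word. For example: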

str_extract_all("The forgot his education and scattered his food all over the room.", "\\Bcat\\B") 
[[1]]
[1] "cat" "cat"

The two matches come from education and scattered.
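The same contrast can be sketched in base R by looking at where each pattern matches. The test string below is invented for illustration, and perl = TRUE is my choice here (so that PCRE handles the boundaries), not something the answer itself uses.

```r
# Illustrative string: "cat" appears as a whole word twice and inside a word twice
x <- "cat, concatenate; (cat) scattered"

# \b...\b: "cat" bounded by non-word characters or the ends of the string
pos_word <- as.integer(gregexpr("\\bcat\\b", x, perl = TRUE)[[1]])
print(pos_word)
# [1]  1 20

# \B...\B: "cat" with a word character on both sides
pos_inner <- as.integer(gregexpr("\\Bcat\\B", x, perl = TRUE)[[1]])
print(pos_inner)
# [1]  9 26
```

Positions 1 and 20 are the stand-alone cat and the one in parentheses; positions 9 and 26 fall inside concatenate and scattered. Note that a comma or a parenthesis counts as a boundary just as whitespace does.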

You can use regexpr and regmatches to get the match; grep only tells you where it hits. You can also use sub:

x <- "The cat scattered his food all over the room."
regmatches(x, regexpr("\\bcat\\b", x))
#[1] "cat"
sub(".*(\\bcat\\b).*", "\\1", x)
#[1] "cat"

x <- "Please enter the nine-digit id as it appears on your color - coded pass-key."
regmatches(x, regexpr("\\B-\\B", x))
#[1] "-"
sub(".*(\\B-\\B).*", "\\1", x)
#[1] "-"
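On the question's remark that a single backslash does not work either: the sketch below shows what the R parser does with the backslashes before the regex engine ever sees the pattern. The sentence used is made up, and the last line assumes R 4.0.0 or later, where raw strings are available.

```r
# In an R string literal, "\\b" is two characters: a backslash and "b"
n_double <- nchar("\\b")
print(n_double)
# [1] 2

# "\b" is a single backspace character, not a regex word boundary
n_single <- nchar("\b")
print(n_single)
# [1] 1

# So this pattern looks for backspace, "cat", backspace and finds nothing
found <- grepl("\bcat\b", "The cat sat.")
print(found)
# [1] FALSE

# Raw strings (assumes R >= 4.0.0) avoid the doubling
same <- identical(r"(\bcat\b)", "\\bcat\\b")
print(same)
# [1] TRUE
```

The doubled backslash is therefore correct; the original problem lay in the choice of function, not in the pattern.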

I like that your answer uses base R. I have another example where the pattern occurs several times in the string. In stringr we can use str_extract_all("1abc2", "[0-9]") for multiple occurrences (it returns 1 and 2), but regmatches("1abc2", regexpr("[0-9]", "1abc2")) only returns 1. Is there a way to do this with your approach?

Yes: use gregexpr instead of regexpr. I added it to the answer.
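A minimal sketch of the gregexpr variant mentioned in the reply, using the commenter's own example string:

```r
x <- "1abc2"

# regexpr() reports only the first match
first <- regmatches(x, regexpr("[0-9]", x))
print(first)
# [1] "1"

# gregexpr() reports every match; regmatches() then returns a list
# holding one character vector per input string
all_matches <- regmatches(x, gregexpr("[0-9]", x))[[1]]
print(all_matches)
# [1] "1" "2"
```

The [[1]] is needed because the result is a list with one entry per element of x.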