Regex to split a text string in R


r regex


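I have a long string, like the example below, and I am struggling to find a regex that splits it into parts based on a pattern such as '1. OAS / AC' and '2. OAS / AD'.

Each of these markers has:

1) a varying number at the start

2) two uppercase letters from A to Z

I tried this: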

x <- stringr::str_split(have, "([1-9])( OAS / )([A-Z]{2})")
x

We can do this with a positive lookahead that looks for one or more digits followed by a period:
library(stringr)
str_split(have, "(?=\\d+\\.)")

[1] ""                                                             "1. OAS / AC 12345/this is a test string to regex, "          
[3] "2. OAS / AD     79856/this is another test string to regex, " "3. OAS / AE 87987/this is a new test string to regex. "      
[5] "4. OAS / AZ 78798456/this is one mode test string to regex."

We can clean this up further by dropping the empty first element:
str_split(have, "(?=\\d{1,2}\\.)") %>% unlist() %>% .[-1]

[1] "1. OAS / AC 12345/this is a test string to regex, "           "2. OAS / AD     79856/this is another test string to regex, "
[3] "3. OAS / AE 87987/this is a new test string to regex. "       "4. OAS / AZ 78798456/this is one mode test string to regex." 
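One caveat with the lookahead split: `(?=\\d+\\.)` also matches between the digits of a multi-digit number such as "12.", so it only behaves well while the numbering stays below 10. A minimal base R sketch that avoids this by locating the full "<number>. OAS / <two letters>" marker and cutting the string at those offsets is shown here. The names `starts`, `ends` and `pieces` are illustrative, and the code assumes at least one marker is present (gregexpr() returns -1 otherwise).

```r
have <- "1. OAS / AC 12345/this is a test string to regex, 2. OAS / AD     79856/this is another test string to regex, 3. OAS / AE 87987/this is a new test string to regex. 4. OAS / AZ 78798456/this is one mode test string to regex."

# Start offsets of every "<number>. OAS / <two capitals>" marker
starts <- as.integer(gregexpr("\\d+\\. OAS / [A-Z]{2}", have, perl = TRUE)[[1]])

# Each piece ends just before the next marker; the last one ends with the string
ends <- c(starts[-1] - 1L, nchar(have))

# substring() is vectorised over the start and end positions
pieces <- substring(have, starts, ends)
pieces
```

Unlike the split above, this keeps trailing whitespace inside each piece; wrap the result in trimws() if that matters.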

You can use str_match_all():
library(stringr)
have <- "1. OAS / AC 12345/this is a test string to regex, 2. OAS / AD     79856/this is another test string to regex, 3. OAS / AE 87987/this is a new test string to regex. 4. OAS / AZ 78798456/this is one mode test string to regex."
r <- stringr::str_match_all(have, "(\\d+\\. OAS / [A-Z]{2})\\s*(.*?)(?=\\s*\\d+\\. OAS / [A-Z]{2}|\\z)")
res <- r[[1]][,3]
names(res) <- r[[1]][,2]

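Pattern details

(\d+\. OAS / [A-Z]{2}) - capturing group 1:
    \d+ - one or more digits
    \. - a literal period
     OAS /  - the literal substring " OAS / "
    [A-Z]{2} - two uppercase letters
\s* - zero or more whitespace characters
(.*?) - capturing group 2: zero or more characters other than line breaks, as few as possible
(?=\s*\d+\. OAS / [A-Z]{2}|\z) - positive lookahead: immediately to the right of the current position there must be either
    \s*\d+\. OAS / [A-Z]{2} - zero or more whitespace characters, one or more digits, a period, a space, OAS, a space, /, a space, and two uppercase letters
    | - or
    \z - the end of the string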
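For readers without stringr, roughly the same named vector can be sketched in base R with gregexpr() and regmatches(), using perl = TRUE so that the lazy quantifier, the lookahead and `\z` are available. This is an illustration rather than the answer's own code; `marker`, `pattern`, `m` and `res_base` are names made up for the example.

```r
have <- "1. OAS / AC 12345/this is a test string to regex, 2. OAS / AD     79856/this is another test string to regex, 3. OAS / AE 87987/this is a new test string to regex. 4. OAS / AZ 78798456/this is one mode test string to regex."

marker <- "\\d+\\. OAS / [A-Z]{2}"

# Marker, then as little text as possible, up to the next marker or end of string
pattern <- paste0(marker, "\\s*.*?(?=\\s*", marker, "|\\z)")
m <- regmatches(have, gregexpr(pattern, have, perl = TRUE))[[1]]

# Strip the marker (and the whitespace after it) to get the text part
res_base <- sub(paste0("^", marker, "\\s*"), "", m, perl = TRUE)

# Use the marker itself as the name of each element
names(res_base) <- regmatches(m, regexpr(marker, m, perl = TRUE))
res_base
```

Base R has no direct counterpart to the capture-group matrix returned by str_match_all(), which is why the marker and the text are separated in a second step here.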

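The way you described the problem is a bit unclear, but if you only want to extract "OAS / AC", you can use beg2char() from the qdap package (see the library(qdap) code at the end).

For that function to work, the sentence should be a single string in a character vector.

If your goal is to insert an "=" sign between the two-letter substring that follows "OAS" and the number after it, use the gsub() call at the end.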
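Comments:

Try stringr::str_match_all(have, "(\\d+\\. OAS / [A-Z]{2})\\s*(.*?)(?=\\s*\\d+\\. OAS / [A-Z]{2}|\\z)")

Hi @WiktorStribiżew. I was nowhere near such a solution. Thank you very much for your help.

Glad it worked for you. Please consider accepting the answer by clicking the ✓ on the left, and upvoting it if it helped.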

# Result of the str_match_all() answer shown earlier:
dput(res)
# => structure(c("12345/this is a test string to regex,", "79856/this is another test string to regex,", 
#  "87987/this is a new test string to regex.", "78798456/this is one mode test string to regex."
#  ), .Names = c("1. OAS / AC", "2. OAS / AD", "3. OAS / AE", "4. OAS / AZ"
#  ))

# Extract everything before the fourth space with qdap
library(qdap)
beg2char(have, " ", 4)  # finds the fourth occurrence of " " and returns everything before it

# Insert " = " between the two-letter code and the number that follows it
gsub("([A-Z])\\s*([0-9])", "\\1 = \\2", have, perl = TRUE)
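
The gsub() pattern above matches any uppercase letter followed by optional whitespace and a digit, which is fine for this sample. If the real text can contain other capital-then-digit sequences, a stricter variant that anchors on the "OAS / XX" label might be safer. The sketch below uses a shortened, made-up input `x` to keep the effect easy to see.

```r
# Shortened illustrative input, not the original data
x <- "1. OAS / AC 12345/first, 2. OAS / AD     79856/second"

# Only touch digits that directly follow an "OAS / <two capitals>" label
out <- gsub("(OAS / [A-Z]{2})\\s*(\\d)", "\\1 = \\2", x, perl = TRUE)
out
```

Any run of whitespace between the label and the number is replaced by " = ".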