Find a pattern in a string and assign it to a new variable in R
I have the following data:
Mydata <- data_frame(
  myfield = c("ABC MUVBC82668689230230",
              "C3 MVBT7927979279279.",
              "t4 MUDW0348737237907023.",
              "D18 MVGJH979247979027903")
)
What I have tried so far:
pattern <- str_locate(Mydata$myfield, "\\d+\\-\\d+MU|\\d+\\-\\d+MV")
mydata$myfield2 <- str_extract(mydata$myfield2, pattern)
Mydata <- Mydata %>%
  mutate(myfield2 = str_sub(Mydata$myfield2, pattern))
Mydata <- Mydata %>%
  mutate(myfield2 = str_extract(myfield, pattern = "MV\\d+"))
Mydata <- Mydata %>%
  mutate(myfield2 = str_extract_all("(?<=^| )(MU|MV).*?(?=$| )"))
Mydata <- Mydata %>%
  mutate(myfield2 = str_extract(myfield, "Mv\\d+(_[A-Z]+)*"))
We can use str_remove_all:
library(dplyr)
library(stringr)
Mydata %>%
  mutate(myfield = str_remove_all(myfield, ".*\\s+|\\.$"))
Or use str_extract:
Mydata %>%
  mutate(myfield = str_extract(myfield, "\\bM[UV][[:alnum:]]+"))
# A tibble: 4 x 1
# myfield
# <chr>
#1 MUVBC82668689230230
#2 MVBT7927979279279
#3 MUDW0348737237907023
#4 MVGJH979247979027903
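As a rough check of the two approaches above, here is a minimal sketch that runs both on a plain character vector, with the pieces of the regular expression spelled out in comments. The variable names x, removed, and extracted are only for illustration and do not come from the original answer.

```r
library(stringr)

# Sample values copied from the question
x <- c("ABC MUVBC82668689230230",
       "C3 MVBT7927979279279.",
       "t4 MUDW0348737237907023.",
       "D18 MVGJH979247979027903")

# Approach 1: delete everything up to the last whitespace, plus a trailing dot
removed <- str_remove_all(x, ".*\\s+|\\.$")

# Approach 2: keep only the run that starts with MU or MV
#   \\b           word boundary, so the match starts at the beginning of a word
#   M[UV]         the letter M followed by either U or V
#   [[:alnum:]]+  one or more letters or digits (stops at "." or a space)
extracted <- str_extract(x, "\\bM[UV][[:alnum:]]+")

# Compare the two results element by element
all(removed == extracted)
```

On this sample both approaches should agree. They can differ on other data: str_remove_all keeps whatever follows the last space, while str_extract keeps only a code that starts with MU or MV.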
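Note that some of the patterns in the OP's code do not match because "MV\\d+" implies one or more digits right after "MV", which is not the case here.

Thanks! A magical and simple solution. As a newcomer to R, I feel I always overcomplicate things. In my real database, the result only extracted the first 4 letters. Is there a way to allow a variable length?

@DanielleTravassos Can you show some values from the real database? Do they contain characters other than letters and digits? The pattern here uses [[:alnum:]], so in that case another option may be str_extract(myfield, "\\bM[UV].*$").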

Thanks! I'll try it. I just replaced "\\bM[UV][[:alnum:]]+" with "\\bM[UVvu][[:all:]]+" and it also worked well. Great, thank you very much.
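
The note about "MV\\d+" and the variants discussed in the comments can be checked on a small vector. This is a sketch only: the lower-case input and the use of regex(ignore_case = TRUE) are assumptions for illustration, since the real database values were not shown.

```r
library(stringr)

x <- c("ABC MUVBC82668689230230", "C3 MVBT7927979279279.")

# "MV\\d+" needs a digit right after "MV"; in these values "MV" is either
# absent or followed by a letter, so nothing matches
strict <- str_extract(x, "MV\\d+")

# Allowing letters and digits after the prefix matches both values
loose <- str_extract(x, "\\bM[UV][[:alnum:]]+")

# Variant from the comments: take everything up to the end of the string
# (this also keeps a trailing "." or any other character)
to_end <- str_extract(x, "\\bM[UV].*$")

# Hypothetical lower-case value: ignore case instead of listing [UVvu]
any_case <- str_extract("t4 mudw0348737237907023.",
                        regex("\\bM[UV][[:alnum:]]+", ignore_case = TRUE))

is.na(strict)
```

If the real values contain characters such as "-" or "_", a character class listing them explicitly (for example [[:alnum:]_-]+) may be a safer choice than ".*$", which keeps everything up to the end of the string.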