R: How do I split a string at words that look like Aa*?
I have a df like this:
df<-structure(list(col3 = c("Text or A ny V alue", "Text or A ny V alue",
"Text or A ny V alue", "Categorical select multiple", "Categorical select one (nominal) 3",
"Categorical select one (nominal) 13", "Categorical select one (nominal) 71",
"CHMUNIT Text or A ny V alue", "Categorical select one (nominal) 71",
"Text or A ny V alue", "Categorical select one (nominal) 3",
"Categorical select one (nominal) 3", "Categorical select one (nominal) 3",
"Text or A ny V alue", "Categorical yes/no (dichotomous) 3",
"Text or A ny V alue", "Categorical select one (nominal) 3",
"Categorical select one (nominal) 71", "DSMETA DT Date", "DSMETA ST Text or A ny V alue",
"Categorical yes/no (dichotomous) 3", "DSPA THDT Date", "Categorical yes/no (dichotomous) 3",
"Text or A ny V alue", "Text or A ny V alue", "Text or A ny V alue",
"Categorical yes/no (dichotomous) 3", "Categorical yes/no (dichotomous) 3",
"Categorical select one (nominal) 71", "V DCO V O S Text or A ny V alue",
"V DCO V O S Text or A ny V alue", "V DCO V O S Text or A ny V alue",
"Categorical select multiple 44", "Categorical select one (nominal) 3"
)), row.names = c(NA, -34L), class = "data.frame")
We can use sub and grepl here:
df$NewVar <- ...
A simpler solution is:
library(stringr)
df$NewVar <- str_extract(df$col3, "^[A-Z\\s]{2,}(?![a-z])")
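The sub and grepl approach mentioned above is not shown in full, so here is a minimal base R sketch of how those two functions could be combined. It is not the original answer's code: the helper name extract_caps_prefix and the pattern are assumptions made for illustration. grepl decides whether a string starts with a run of all-uppercase words, and sub keeps only that run through the \\1 backreference.

```r
# Hypothetical helper (not from the original answer): return the leading run
# of all-uppercase words from each string, or NA when there is none.
extract_caps_prefix <- function(x) {
  # One or more uppercase words, each followed by whitespace or the end of
  # the string, anchored at the start; the rest of the string is matched by .*
  pattern <- "^((?:[A-Z]+(?:\\s+|$))+).*$"
  has_prefix <- grepl(pattern, x, perl = TRUE)
  # "\\1" keeps only the captured prefix; trimws() drops its trailing space.
  prefix <- trimws(sub(pattern, "\\1", x, perl = TRUE))
  ifelse(has_prefix, prefix, NA_character_)
}

x <- c("CHMUNIT Text or A ny V alue", "DSMETA DT Date",
       "V DCO V O S Text or A ny V alue", "Categorical select multiple")
extract_caps_prefix(x)
```

With the data from the question this could be applied as df$NewVar <- extract_caps_prefix(df$col3). Unlike the str_extract pattern, this sketch would also accept a single leading capital letter followed by a space, so check which behaviour you actually want.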
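Good idea.
@Stataq \1 (or sometimes $1 in other programming languages) refers to the first capture group defined in the regex. In this case, it is the sequence of all-uppercase words at the start of each match. By the way, a pattern for your follow-up question might be: ^.*([A-Z]+\\b(?[A-Z]+\\b)*)$ ... this would capture all the uppercase words that appear at the end of a column.
Thank you very much. Isn't ^ the start of the string? Does this mean the whole of col2 has to be uppercase from start to end?
If you want to use a capture group, you need to match the whole column from the start, even if you really only want to capture the end.
Note that your regex pattern will also match something like CASH$ ...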
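To illustrate the point about matching the whole string when only the end should be captured, here is a small sketch. The sample vector x and all three patterns are assumptions for illustration, not taken from the thread. sub replaces only the part of the string that the pattern matches, so the pattern has to consume the unwanted prefix as well. Whether that prefix is matched greedily or lazily changes what is left for the capture group.

```r
# Hypothetical sample data with the uppercase words at the end of each string.
x <- c("Date DSMETA DT", "some text ABC")

# No leading ^.*: only the tail is matched and replaced by itself,
# so the strings come back unchanged.
unanchored <- sub("([A-Z]+(?:\\s+[A-Z]+)*)$", "\\1", x, perl = TRUE)

# Greedy ^.* swallows as much as it can and leaves a single letter
# for the capture group.
greedy <- sub("^.*([A-Z]+(?:\\s+[A-Z]+)*)$", "\\1", x, perl = TRUE)

# Lazy ^.*? plus a word boundary lets the group take every trailing
# uppercase word.
lazy <- sub("^.*?\\b([A-Z]+(?:\\s+[A-Z]+)*)$", "\\1", x, perl = TRUE)

print(unanchored)
print(greedy)
print(lazy)
```

Only the lazy variant returns the full trailing run of uppercase words for these two inputs.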
df
   col3                                NewVar
 1 Text or A ny V alue                 <NA>
 2 Text or A ny V alue                 <NA>
 3 Text or A ny V alue                 <NA>
 4 Categorical select multiple         <NA>
 5 Categorical select one (nominal) 3  <NA>
 6 Categorical select one (nominal) 13 <NA>
 7 Categorical select one (nominal) 71 <NA>
 8 CHMUNIT Text or A ny V alue         CHMUNIT
 9 Categorical select one (nominal) 71 <NA>
10 Text or A ny V alue                 <NA>
11 Categorical select one (nominal) 3  <NA>
12 Categorical select one (nominal) 3  <NA>
13 Categorical select one (nominal) 3  <NA>
14 Text or A ny V alue                 <NA>
15 Categorical yes/no (dichotomous) 3  <NA>
16 Text or A ny V alue                 <NA>
17 Categorical select one (nominal) 3  <NA>
18 Categorical select one (nominal) 71 <NA>
19 DSMETA DT Date                      DSMETA DT
20 DSMETA ST Text or A ny V alue       DSMETA ST
21 Categorical yes/no (dichotomous) 3  <NA>
22 DSPA THDT Date                      DSPA THDT
23 Categorical yes/no (dichotomous) 3  <NA>
24 Text or A ny V alue                 <NA>
25 Text or A ny V alue                 <NA>
26 Text or A ny V alue                 <NA>
27 Categorical yes/no (dichotomous) 3  <NA>
28 Categorical yes/no (dichotomous) 3  <NA>
29 Categorical select one (nominal) 71 <NA>
30 V DCO V O S Text or A ny V alue     V DCO V O S
31 V DCO V O S Text or A ny V alue     V DCO V O S
32 V DCO V O S Text or A ny V alue     V DCO V O S
33 Categorical select multiple 44      <NA>
34 Categorical select one (nominal) 3  <NA>