Remove characters after a space in a string - R Studio data cleaning
I am trying to clean some data in R Studio. Here is a sample of my data:
LSOA name:
York 009A
Wychavon 014A
Bath and North East Somerset 001A
Aylesbury Vale 008C
Central Bedfordshire 030C
I would like to remove the code at the end of each name, so that the resulting data looks like this:
LSOA name:
York
Wychavon
Bath and North East Somerset
Aylesbury Vale
Central Bedfordshire
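I am very new to regex, so I'm finding this difficult. As far as I can tell, because the number of words before the code varies, I can't simply delete everything after a space.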
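A small illustration of our own (not from the thread) shows why a fixed "delete after the space" rule fails and what fixes it. Deleting everything from the first space onward truncates multi-word names. Anchoring the pattern to the end of the string with $ removes only the last whitespace-separated token. The vector x is a hypothetical copy of the sample names.

```r
x <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
       "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# Naive: drop everything from the first space onward.
# This breaks multi-word names such as "Aylesbury Vale".
naive <- sub(" .*", "", x)

# Anchored: \\s+\\S+$ matches the final run of spaces plus the last
# non-space token, so only the trailing code is removed.
anchored <- sub("\\s+\\S+$", "", x)

naive
anchored
```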
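Any help would be greatly appreciated.
We can use sub to match one or more spaces (\\s+), followed by one or more digits (\\d+) and an uppercase letter ([A-Z]) at the end of the string ($), and replace the match with a blank ("").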
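The code for this answer did not survive in the scraped page. Below is a minimal reconstruction of the call described above, assuming the sample data is stored in df1$name as in the data block further down; the answerer's exact code may have differed.

```r
# Sample data: same values as the question (stringsAsFactors = FALSE keeps
# name a character column in all R versions)
df1 <- data.frame(name = c("York 009A", "Wychavon 014A",
                           "Bath and North East Somerset 001A",
                           "Aylesbury Vale 008C", "Central Bedfordshire 030C"),
                  stringsAsFactors = FALSE)

# \\s+   one or more spaces
# \\d+   one or more digits
# [A-Z]  one uppercase letter
# $      end of the string
df1$name <- sub("\\s+\\d+[A-Z]$", "", df1$name)
df1$name
```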
You can also use a lookahead (?=\\s\\d+) and a backreference (\\1):
sub("(.*)(?=\\s\\d+).*", "\\1", df1$name, perl = TRUE)
[1] "York" "Wychavon" "Bath and North East Somerset" "Aylesbury Vale"
[5] "Central Bedfordshire"
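In this pattern, (.*) captures the name, the lookahead (?=\\s\\d+) only checks that a space and digits follow without consuming them, and the trailing .* swallows the rest so that \\1 keeps just the captured part. A plain group without the lookahead gives the same result on this data and does not need perl = TRUE. That simplification is our own suggestion, not part of the original answer; x is a hypothetical copy of the sample names.

```r
x <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
       "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# Lookahead version from the answer (lookaheads need the PCRE engine)
with_lookahead <- sub("(.*)(?=\\s\\d+).*", "\\1", x, perl = TRUE)

# Same idea without the lookahead: the space and digits are matched
# outside the capture group, so \\1 still holds only the name
without_lookahead <- sub("(.*)\\s\\d+.*", "\\1", x)

identical(with_lookahead, without_lookahead)
```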
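Another option is str_extract with the negated digit class \\D, which matches any non-digit character, so \\D+ extracts the leading run of non-digits (trimws then removes the leftover whitespace):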
This works perfectly, thanks :)
df1
# name
#1 York
#2 Wychavon
#3 Bath and North East Somerset
#4 Aylesbury Vale
#5 Central Bedfordshire
# Sample data used in the answers
df1 <- structure(list(name = c("York 009A", "Wychavon 014A",
                               "Bath and North East Somerset 001A",
                               "Aylesbury Vale 008C", "Central Bedfordshire 030C")),
                 class = "data.frame", row.names = c(NA, -5L))
library(stringr)
# \\D+ extracts the leading run of non-digits; trimws() drops the trailing space
trimws(str_extract(df1$name, "\\D+"))
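If stringr is not available, base R's regexpr() and regmatches() can extract the same leading run of non-digit characters. This is a substitute we are suggesting, not part of the original answer. Note that regmatches() silently drops elements with no match, whereas str_extract() returns NA for them; x is a hypothetical copy of the sample names.

```r
x <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
       "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# regexpr() locates the first run of non-digit characters in each string,
# regmatches() pulls those substrings out, and trimws() strips the
# trailing space that sat in front of the code
base_result <- trimws(regmatches(x, regexpr("\\D+", x)))
base_result
```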