Remove characters after a space in a string - R Studio data cleaning

r regex

I'm trying to clean some data in R Studio.

Here is a sample of my data:

LSOA name:
York 009A
Wychavon 014A
Bath and North East Somerset 001A
Aylesbury Vale 008C
Central Bedfordshire 030C

I'd like to remove the code at the end of each entry. The resulting data would look like this:

LSOA name:
York
Wychavon
Bath and North East Somerset
Aylesbury Vale 
Central Bedfordshire

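I'm new to regex, so I'm finding this hard. As far as I can tell, because the number of words before the code varies, it isn't possible to simply remove everything after the first space.

Any help would be greatly appreciated.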

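We can use sub to match one or more spaces (\\s+), followed by one or more digits (\\d+) and an uppercase letter ([A-Z]) at the end ($) of the string, and replace the match with an empty string ("").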
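The pattern described above can be sketched as follows. This is a reconstruction from the description, not code copied from the answer; the vector name `lsoa` and the variable `cleaned` are hypothetical names used only for illustration, and the values are the ones shown in the question.

```r
# Hypothetical sample vector, using the values from the question
lsoa <- c("York 009A", "Wychavon 014A",
          "Bath and North East Somerset 001A",
          "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# \\s+   one or more whitespace characters
# \\d+   one or more digits
# [A-Z]  a single uppercase letter
# $      end of the string
# Only the trailing code matches, so spaces inside the name are left alone.
cleaned <- sub("\\s+\\d+[A-Z]$", "", lsoa)

print(cleaned)
```

Because the pattern is anchored with $, the number of words before the code does not matter. For a data frame column, the same call should work on the column directly, for example by assigning the result back to that column.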

Applied to the data below, the result is:

df1
#                          name
#1                         York
#2                     Wychavon
#3 Bath and North East Somerset
#4               Aylesbury Vale
#5         Central Bedfordshire

Data

df1 <- structure(list(name = c("York 009A", "Wychavon 014A",
"Bath and North East Somerset 001A",
"Aylesbury Vale 008C", "Central Bedfordshire 030C")), class = "data.frame",
row.names = c(NA,
-5L))

You can also use a lookahead, (?=\\s\\d+), and a backreference, \\1:

sub("(.*)(?=\\s\\d+).*", "\\1", df1$name, perl = TRUE)
[1] "York"                         "Wychavon"                     "Bath and North East Somerset" "Aylesbury Vale"              
[5] "Central Bedfordshire"

Another option is str_extract with the negated character class \\D, which matches any non-digit character (trimws removes the leftover whitespace):

library(stringr)
trimws(str_extract(df1$name, "\\D+"))

This works great, thanks :)