Remove characters after a space in a string - R Studio data cleaning (R, Regex, Data Cleaning) - Fatal编程技术网



I'm trying to clean some data in R Studio.

Here is a sample of my data:

LSOA name:
York 009A
Wychavon 014A
Bath and North East Somerset 001A
Aylesbury Vale 008C
Central Bedfordshire 030C
I'd like to remove the code at the end of each name. The resulting data should look like this:

LSOA name:
York
Wychavon
Bath and North East Somerset
Aylesbury Vale
Central Bedfordshire
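I'm new to regex, so I'm finding this difficult. As far as I can tell, I can't simply delete everything after the first space, because the number of words before the code varies.

Any help would be greatly appreciated.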
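The asker's reasoning can be checked directly. Below is a minimal sketch, assuming the names sit in a plain character vector (x is a hypothetical name): cutting at the first space truncates multi-word names such as "Bath and North East Somerset 001A", whereas removing only the last space-separated token keeps the whole name. The last-token pattern " [^ ]*$" is an illustrative alternative rather than one of the answers below, and it assumes every entry ends in a code.

```r
# Hypothetical character vector built from the sample data in the question
x <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
       "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# Cutting at the FIRST space drops every word after the first one
first_space <- sub(" .*$", "", x)
print(first_space)  # "Bath and North East Somerset 001A" becomes "Bath"

# Removing only the LAST space-separated token (a space followed by
# non-space characters up to the end of the string) keeps the full name
last_token <- sub(" [^ ]*$", "", x)
print(last_token)
```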

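We can use sub to match one or more spaces followed by one or more digits (\\d+) and an uppercase letter ([A-Z]) at the end of the string ($), and replace the match with an empty string ("").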
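A minimal sketch of the sub() call described above. The exact pattern "\\s+\\d+[A-Z]$" is reconstructed from the description (an assumption, not the answer's verbatim code), and the sample data is rebuilt as df1 with a name column so the block runs on its own:

```r
# Sample data from the question, held in a data frame column called name
df1 <- data.frame(name = c("York 009A", "Wychavon 014A",
                           "Bath and North East Somerset 001A",
                           "Aylesbury Vale 008C", "Central Bedfordshire 030C"),
                  stringsAsFactors = FALSE)

# \\s+   one or more whitespace characters
# \\d+   one or more digits
# [A-Z]  a single uppercase letter
# $      end of the string
df1$name <- sub("\\s+\\d+[A-Z]$", "", df1$name)
df1
```

Entries that do not end in digits plus one uppercase letter are left unchanged, because sub() returns the input as-is when nothing matches.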

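The sample data frame df1 used here is defined further down.

You can also use a lookahead (?=\\s\\d+) together with a backreference (\\1); the lookahead requires perl = TRUE: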

sub("(.*)(?=\\s\\d+).*", "\\1", df1$name, perl = TRUE)
[1] "York"                         "Wychavon"                     "Bath and North East Somerset" "Aylesbury Vale"              
[5] "Central Bedfordshire"
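The lookahead pattern and the end-anchored pattern from the first answer are not interchangeable. A minimal sketch on a hypothetical name, "Some Area 12" (not from the question's data), whose code has digits but no trailing letter:

```r
# Hypothetical name whose code has digits but no trailing uppercase letter
x <- "Some Area 12"

# Lookahead version: the greedy (.*) keeps everything up to the last
# position that is followed by whitespace and digits; .* consumes the rest
lookahead_result <- sub("(.*)(?=\\s\\d+).*", "\\1", x, perl = TRUE)

# End-anchored version: needs digits followed by one uppercase letter at the
# very end, so nothing matches here and the string is returned unchanged
anchored_result <- sub("\\s+\\d+[A-Z]$", "", x)

print(lookahead_result)
print(anchored_result)
```

Which one fits depends on how strictly the codes follow the digits-plus-letter format. perl = TRUE is needed for the lookahead because base R's default regex engine does not support lookaround.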
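Another option is str_extract from stringr with the negated character class \\D, which matches any non-digit character: the pattern \\D+ extracts the first run of non-digits, and trimws then removes the trailing whitespace.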
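If loading stringr is not desirable, the same \\D+ idea can be sketched in base R with regexpr() and regmatches(); this is a swapped-in base-R equivalent, not the answer's own code. It assumes every element contains at least one non-digit character, because regmatches() silently drops elements that have no match:

```r
# Hypothetical character vector holding the question's sample names
x <- c("York 009A", "Wychavon 014A", "Bath and North East Somerset 001A",
       "Aylesbury Vale 008C", "Central Bedfordshire 030C")

# regexpr() locates the first run of non-digit characters in each element;
# regmatches() extracts those runs (e.g. "York " with its trailing space)
non_digits <- regmatches(x, regexpr("\\D+", x))

# trimws() strips the space left in front of the removed code
result <- trimws(non_digits)
print(result)
```

Like the str_extract version, this keeps everything before the first digit, so it only works while the place names themselves contain no digits.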


This works perfectly, thanks :)
# df1 after the trailing codes have been removed
df1
#                          name
#1                         York
#2                     Wychavon
#3 Bath and North East Somerset
#4               Aylesbury Vale
#5         Central Bedfordshire
df1 <- structure(list(name = c("York 009A", "Wychavon 014A",
                               "Bath and North East Somerset 001A",
                               "Aylesbury Vale 008C", "Central Bedfordshire 030C")),
                 class = "data.frame", row.names = c(NA, -5L))
library(stringr)
trimws(str_extract(df1$name, "\\D+"))