R, stringr - replace multiple characters in rows of a dataframe
My addresses are stored in the "address" column of a store data frame, and I want to create a new column that applies the following corrections to the existing addresses:
{"ST": "STREET",
"RD": "ROAD",
"AVE": "AVENUE",
"N": "NORTH",
"W": "WEST",
"S": "SOUTH",
"E": "EAST",
"STE": "SUITE",
"HWY": "HIGHWAY",
"DR": "DRIVE",
"NW": "NORTH WEST",
"NE": "NORTH EAST",
"SW": "SOUTH WEST",
"SE": "SOUTH EAST",
"LN": "LANE",
"WAY": "WAY"}
How should I go about this?
Expected output:
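101 ST LN -> 101 STREET LANE
One way to solve this is to use stri_replace_all_regex from stringi. It accepts vectorized patterns and replacements.
We can use the \b wildcard as a word boundary, which itself needs to be escaped as \\b in an R string. To handle abbreviations that end with ".", we can match the text followed by either "." or \b using (\\.|\\b).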
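To see how such a pattern behaves, here is a quick check using ST as the example abbreviation. It uses base R grepl()/gsub() with perl = TRUE rather than stringi, purely to illustrate the regex; the vector x is made-up test input:

```r
# Word boundary before "ST", then either a literal "." or another word boundary
pattern <- "\\bST(\\.|\\b)"
x <- c("101 ST LN", "101 N ST.", "666 STE E", "12 STREET W")

# "ST" inside "STE" or "STREET" is not followed by a boundary, so it is skipped
grepl(pattern, x, perl = TRUE)
# c(TRUE, TRUE, FALSE, FALSE)

# The period is part of the match, so "ST." becomes "STREET", not "STREET."
gsub(pattern, "STREET", x, perl = TRUE)
# c("101 STREET LN", "101 N STREET", "666 STE E", "12 STREET W")
```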
I built the pattern and replacement vectors from the data at the end of the answer.
library(stringi)
# terms[[1]] holds the regex patterns, terms[[2]] the replacements
stri_replace_all_regex("101 ST. LN", pattern = terms[[1]], replacement = terms[[2]], vectorize_all = FALSE)
#[1] "101 STREET LANE"
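The answer's own data for terms is not shown here. As a hypothetical reconstruction, terms could be built from the question's lookup table roughly like this (the names lookup and terms are illustrative). With vectorize_all = FALSE every pattern is applied to every string, one after another in the order given:

```r
library(stringi)

# Hypothetical: the question's lookup table as a named character vector
lookup <- c(ST = "STREET", RD = "ROAD", AVE = "AVENUE", N = "NORTH",
            W = "WEST", S = "SOUTH", E = "EAST", STE = "SUITE",
            HWY = "HIGHWAY", DR = "DRIVE", NW = "NORTH WEST",
            NE = "NORTH EAST", SW = "SOUTH WEST", SE = "SOUTH EAST",
            LN = "LANE", WAY = "WAY")

# terms[[1]]: word boundary, abbreviation, then "." or a word boundary
# terms[[2]]: the corresponding full words
terms <- list(paste0("\\b", names(lookup), "(\\.|\\b)"), unname(lookup))

stri_replace_all_regex("101 ST. LN", pattern = terms[[1]],
                       replacement = terms[[2]], vectorize_all = FALSE)
# "101 STREET LANE"
```

Because each pattern needs a boundary after the abbreviation, S and E do not fire inside SE or STE, so the single letters can safely appear before the two-letter entries.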
The same works for a vector of strings to be replaced:
data <- data.frame(address = c("1 N ST", "2 E AVE", "3 S RD", "4 SE LN"))
stri_replace_all_regex(data$address,pattern = terms[[1]], replacement = terms[[2]],vectorize_all = FALSE)
#[1] "1 NORTH STREET" "2 EAST AVENUE" "3 SOUTH ROAD" "4 SOUTH EAST LANE"
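To keep the result as a new column, as the question asks, assign it back to the data frame. The sketch below does that with a base R fold (Reduce() over gsub() with perl = TRUE) instead of stringi, which mimics the sequential behaviour of vectorize_all = FALSE; the helper replace_sequentially() and the column name new_address are hypothetical:

```r
# Same abbreviation table as in the question
lookup <- c(ST = "STREET", RD = "ROAD", AVE = "AVENUE", N = "NORTH",
            W = "WEST", S = "SOUTH", E = "EAST", STE = "SUITE",
            HWY = "HIGHWAY", DR = "DRIVE", NW = "NORTH WEST",
            NE = "NORTH EAST", SW = "SOUTH WEST", SE = "SOUTH EAST",
            LN = "LANE", WAY = "WAY")
patterns <- paste0("\\b", names(lookup), "(\\.|\\b)")

# Hypothetical helper: apply pattern 1 to the whole vector, then pattern 2
# to that result, and so on
replace_sequentially <- function(x, patterns, replacements) {
  Reduce(function(acc, i) gsub(patterns[i], replacements[i], acc, perl = TRUE),
         seq_along(patterns), init = x)
}

data <- data.frame(address = c("1 N ST", "2 E AVE", "3 S RD", "4 SE LN"))
data$new_address <- replace_sequentially(data$address, patterns, unname(lookup))
data$new_address
# c("1 NORTH STREET", "2 EAST AVENUE", "3 SOUTH ROAD", "4 SOUTH EAST LANE")
```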
This should work, using str_replace_all from the stringr package:
library(stringr)
df <- data.frame(address = c("12 ST W", "333 AVE", "45 RD", "666 STE E"))
str_replace_all(df$address,c("\\bST\\b" = "STREET",
"\\bRD\\b" = "ROAD",
"\\bAVE\\b" = "AVENUE",
"\\bN\\b" = "NORTH",
"\\bW\\b" = "WEST",
"\\bE\\b" = "EAST",
"\\bSTE\\b" = "SUITE"))
[1] "12 STREET WEST" "333 AVENUE" "45 ROAD" "666 SUITE EAST"
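Can you show the expected output?
@akrun For example, if we have "101 ST LN" as an existing address, I want the new address to be "101 STREET LANE".
Try converting the data into a named vector, then use stringr::str_replace_all.
Thanks for your help! Sorry, but what would you do when a string like "ST" can end with "."? For example, my store data frame has the address 101 N ST., which gets converted to 101 North ST.
You can use (\\.|\\b) to match either a period or a word boundary. I edited my answer.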
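Combining the named-vector suggestion with the (\\.|\\b) fix from the comments, a sketch that builds the named vector from the whole lookup table instead of typing each pattern by hand, assuming stringr; lookup is an illustrative name and new_address a hypothetical column. str_replace_all() treats the names as patterns and the values as replacements:

```r
library(stringr)

lookup <- c(ST = "STREET", RD = "ROAD", AVE = "AVENUE", N = "NORTH",
            W = "WEST", S = "SOUTH", E = "EAST", STE = "SUITE",
            HWY = "HIGHWAY", DR = "DRIVE", NW = "NORTH WEST",
            NE = "NORTH EAST", SW = "SOUTH WEST", SE = "SOUTH EAST",
            LN = "LANE", WAY = "WAY")

# Names become regex patterns (abbreviation followed by "." or a boundary);
# values become the replacements
patterns <- setNames(unname(lookup), paste0("\\b", names(lookup), "(\\.|\\b)"))

df <- data.frame(address = c("12 ST W", "333 AVE", "45 RD", "666 STE E", "101 N ST."))
df$new_address <- str_replace_all(df$address, patterns)
df$new_address
# c("12 STREET WEST", "333 AVENUE", "45 ROAD", "666 SUITE EAST", "101 NORTH STREET")
```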