How do I replace a vector of regexps with a single expression?


My data contains a little over 100 different villages. To make my visualization meaningful, I need to group them into 22 municipalities, like this:

TROLLHÄTTAN<-toupper(c("Trollhättan","Sjuntorp","Velanda","Åsaka","Upphärad"))
UDDEVALLA<-toupper(c("UDDEVALLA","KURVERÖD","AMMENÄS","FAGERHULT","LANESUND OCH ÖVERBY",
"LANESUND","ÖVERBY","RESTENÄS OCH ULVESUND","RESTENÄS","ULVESUND","STRAND","UTBY","HOGSTORP","SUND","SMEDSERÖD"))
VÄNERSBORG<-toupper(c("Vänersborg","Vargön","Brålanda","Frändefors","Nordkroken","Katrinedal"))
LYSEKIL<-toupper(c("Lysekil", "Brastad", "Grundsund", "Fiskebäckskil"))
FÄRGELANDA<-toupper(c("Färgelanda","Högsäter","Ödeborg","Stigen"))
MELLERUD<-toupper(c("Mellerud","Dals Rostock","Åsensbruk"))
ED<-toupper(c("Ed"))
BENGTSFORS<-toupper(c("Bengtsfors","Dals Långed","Billingsfors","Bäckefors","Skåpafors"))
ÅMÅL<-toupper(c("Åmål","Tösse","Fengersfors"))
STRÖMSTAD<-toupper(c("Strömstad","Skee","Kebal","Stare"))
TANUM<-toupper(c("Grebbestad","Tanumshede","Fjällbacka","Hamburgsund","Rabbalshede"))
SOTENÄS<-toupper(c("Hunnebostrand","Kungshamn","Smögen","Malmön","Bovallstrand"))
MUNKEDAL<-toupper(c("Munkedal","Dingle","Hällevadsholm","Hedekas","Torreby"))
ORUST<-toupper(c("Svanesund","Ellös","Hälleviksstrand","Mollösund","Henån","Höggeröd","Vindön och Töllås","Varekil","Vindön","Töllås"))
LILLA_EDET<-toupper(c("Lilla Edet","Lödöse","Lilla Edet västra","Göta","Nygård","Hjärtum"))
ALE<-toupper(c("Ale","Nödinge-Nol","Surte","Älvängen","Skepplanda","Alvhem"))
STENUNGSUND<-toupper(c("Jörlanda","Stora Höga","Timmervik","Spekeröd","Stenungsund","Stenungsön","Svartehallen","Svenshögen","Ucklum","Ödsmål"))
TJÖRN<-toupper(c("Bleket","Djupvik och Fagerfjäll","Höviksnäs","Klövedal","Kyrkesund och Bö","Kållekärr","Myggenäs","Rönnäng","Skärhamn","Stora Dyrön",
"Djupvik","Fagerfjäll","Kyrkesund","Bö"))
KUNGÄLV<-toupper(c("Aröd och Timmervik","Diseröd","Duvesjön","Harestad och Nereby","Kareby","Kode","Kovikshamn","Kungälv","Kärna",
"Lundby","Marstrand","Marstrand", "Arvidsvik","Risby","Rishammar","Signehög och Norrmannebo","Solberga","Tjuvkil","Ödsmål och Åseby",
"Ödsmåls mosse och Rörtången","Aröd","Timmervik","Harestad","Nereby","Signehög","Norrmannebo","Ödsmål","Åseby","Ödsmåls mosse","Rörtången"))
ALINGSÅS<-toupper(c("Alingsås","Ingared","Sollebrunn","Västra Bodarna","Gräfsnäs","Hemsjö","Stora Mellby","Hjälmared","Långared","Svanvik",
"Ryd","Magra"))
VARA<-toupper(c("Vara","Kvänum","Tråvad","Jung","Vedum","Larv","Stora Levene","Emtunga","Arentorp"))
ESSUNGA<-toupper(c("Nossebro","Främmestad","Jonslund"))
VÅRGÅRDA<-toupper(c("Vårgårda","Östadkulle","Horla"))
GRÄSTORP<-toupper(c("GRÄSTORP"))
LIDKÖPING<-toupper(c("Lidköping","Lidköping norra","Vinninga","Järpås","Filsbäck","Örslösa","Saleby"))
GÖTEBORG<-toupper(c("Göteborg","Gunnared och Hammarkullen","Torslanda","Billdal","Olofstorp","Donsö","Nolvik","Styrsö","Angered",
"Brännö","Säve","Helgered","Tumlehed","Asperö","Stenared","Vrångö","Gundal och Högås","Gunnared","Hammarkullen","Gundal","Högås"))
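I tried to replace the vectors of village names with the municipality names using the textclean version of mgsub, but ran into problems. For example, some village names are also common suffixes. This means that HUNNEBOSTRAND is converted to HUNNEBOUDDEVALLA, which is of course not what I want.

I tried writing the vectors as regular expressions: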

LYSEKIL<-toupper(c("^Lysekil$", "^Brastad$", "^Grundsund$", "^Fiskebäckskil$"))
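I found that the textclean version of mgsub could not handle the regular expressions. I switched to the mgsub package, which expects the vectors to be of the same length, which is not what I want. The qdap version of mgsub seems to behave in a similar way.

Is there a way to do this?

Raw data with the sensitive parts removed: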

structure(list(CITY = c("HENÅN", NA, "HENÅN", "ÄLVÄNGEN", NA, "TROLLHÄTTAN"),
ZIPCODE = c(47395L, NA, 47332L, 44636L, NA, 46157L), COURSEOFFERING_ID = c(97113L,
97113L, 97113L, 97113L, 97113L, 97113L)), row.names = c(1L, 5L,
9L, 12L, 15L, 18L), class = "data.frame")

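To avoid the problem of village names also occurring as suffixes, you can use "^" and "$" to anchor the names at the start and the end. You had the right idea. However, to replace the village names with the corresponding municipality names, you need to use gsub or stringr::str_replace_all. To be safe, and to avoid having to work out which names cause problems, just anchor all of the village names with ^ and $.

Here is one option:

Create a vector containing your 100+ village names (I use the first two vectors as an example). You end up with a vector of 100+ elements, one for each of your original villages, but made up only of your 22 municipality names.

With the example data I used, this gives you: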

[1] "TROLLHÄTTAN" "TROLLHÄTTAN" "TROLLHÄTTAN" "TROLLHÄTTAN" "TROLLHÄTTAN"
 [6] "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"  
[11] "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"  
[16] "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"   "UDDEVALLA"  

Without anchoring, LANESUND becomes LANEUDDEVALLA, because SUND is replaced by UDDEVALLA. Anchoring prevents this.
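The difference can be checked directly with a minimal sketch. It uses only two names from the UDDEVALLA vector in the question; the variable names `villages`, `unanchored` and `anchored` are made up for this illustration.

```r
# Two names from the UDDEVALLA vector in the question
villages <- c("LANESUND", "SUND")

# Unanchored: "SUND" also matches inside "LANESUND"
unanchored <- gsub("SUND", "UDDEVALLA", villages)

# Anchored: only a string that is exactly "SUND" matches
anchored <- gsub("^SUND$", "UDDEVALLA", villages)

print(unanchored)
print(anchored)
```

The first result contains "LANEUDDEVALLA", while the anchored version leaves "LANESUND" untouched and only replaces the standalone "SUND".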

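Code for the option described above:

library(dplyr)  # provides the %>% pipe

# Example input: the first two vectors from the question
all_village_names <- c(TROLLHÄTTAN, UDDEVALLA)

# The names were upper-cased with toupper(), so the patterns are upper-case too
all_village_names %>%
  gsub("^TROLLHÄTTAN$|^SJUNTORP$|^VELANDA$|^ÅSAKA$|^UPPHÄRAD$", "TROLLHÄTTAN", .) %>%
  gsub("^UDDEVALLA$|^KURVERÖD$|^AMMENÄS$|^FAGERHULT$|^LANESUND OCH ÖVERBY$|^LANESUND$|^ÖVERBY$|^RESTENÄS OCH ULVESUND$|^RESTENÄS$|^ULVESUND$|^STRAND$|^UTBY$|^HOGSTORP$|^SUND$|^SMEDSERÖD$", "UDDEVALLA", .)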
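Typing 22 long alternations by hand is error-prone, so the anchored patterns could instead be generated from the village vectors. The following is only a sketch under stated assumptions, not the method used in the answer above. The list `municipalities` and the helpers `make_pattern` and `to_municipality` are hypothetical names, only a few ASCII-only villages from the question are included, and the names are assumed to contain no regex metacharacters.

```r
# Hypothetical named list: municipality name -> its (upper-case) village names.
# Only a small subset of the question's data is used here.
municipalities <- list(
  UDDEVALLA = c("UDDEVALLA", "STRAND", "SUND", "UTBY"),
  TANUM     = c("GREBBESTAD", "TANUMSHEDE", "HAMBURGSUND")
)

# Build one fully anchored pattern per municipality, e.g. "^(STRAND|SUND)$"
make_pattern <- function(villages) {
  paste0("^(", paste(villages, collapse = "|"), ")$")
}

# Apply each municipality's pattern in turn; NA values pass through gsub() as NA
to_municipality <- function(x, groups) {
  for (name in names(groups)) {
    x <- gsub(make_pattern(groups[[name]]), name, x)
  }
  x
}

cities <- c("HAMBURGSUND", "SUND", NA, "STRAND", "TANUMSHEDE")
result <- to_municipality(cities, municipalities)
print(result)
```

Because the patterns are applied one after another, a village listed under more than one municipality (in the question, "Ödsmål" and "Timmervik" each appear in two vectors) is assigned to whichever municipality comes first in the list.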