R: "vlookup" based on partial string matching
Tags: r, string-matching

I have two data frames.

One:

NAME
SMALL H
ZITT M
SMITH E
GLANZEL W
HUANG MH
THIJS B

Two:
name                          address
SIBLEY B                      SOME ADDRESS 1
STEWART C;KOCH A              SOME ADDRESS 2
HILL GM;LEE A;SMITH E         SOME ADDRESS 3
DAVIS L                       SOME ADDRESS 4
MERCIER K;SMITH E;GIBBONE A   SOME ADDRESS 5
DAVIDSON S;BEKIARI A          SOME ADDRESS 6
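I would like to match each name in the first table against the entries in the second table whose name string contains it, and then add the data from the address column, a bit like a vlookup. It also has to handle multiple instances of the same name. In the example above, the name SMITH E (two different people) would match twice and give the following result: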
  NAME        ADDRESS 1        ADDRESS 2
1 SMALL H
2 ZITT M
3 SMITH E     SOME ADDRESS 5   SOME ADDRESS 3
4 GLANZEL W
5 HUANG MH
6 THIJS B
Here is a tidyverse solution. I first clean the second table by splitting each entry into individual names, one per row. We can then use left_join to match the entries:
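library(tidyverse)

# Split each ";"-separated entry into a list of names, then give every name its own row
df2_clean <- df2 %>%
  mutate(name = str_split(name, ";")) %>%
  unnest(name)

# Keep every row of df1 and attach the address wherever the name matches exactly
df1 %>%
  left_join(df2_clean, by = c("NAME" = "name"))
#>        NAME        address
#> 1   SMALL H           <NA>
#> 2    ZITT M           <NA>
#> 3   SMITH E SOME ADDRESS 3
#> 4   SMITH E SOME ADDRESS 5
#> 5 GLANZEL W           <NA>
#> 6  HUANG MH           <NA>
#> 7   THIJS B           <NA>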
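The split-then-join step can also be written with tidyr's separate_rows(), which splits and unnests in one call. This is a minimal sketch, assuming the same two data frames as in this answer (retyped here with data.frame() so the block runs on its own). The trimws() call is an extra safeguard that is not in the original code, and df2_long and matched are illustrative names.

```r
library(dplyr)
library(tidyr)

# Same example data as in this answer, retyped so the block is self-contained
df1 <- data.frame(
  NAME = c("SMALL H", "ZITT M", "SMITH E", "GLANZEL W", "HUANG MH", "THIJS B"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("SIBLEY B", "STEWART C;KOCH A", "HILL GM;LEE A;SMITH E",
           "DAVIS L", "MERCIER K;SMITH E;GIBBONE A", "DAVIDSON S;BEKIARI A"),
  address = paste("SOME ADDRESS", 1:6),
  stringsAsFactors = FALSE
)

# separate_rows() splits on ";" and puts each name on its own row,
# repeating the address for every name that shared the original entry
df2_long <- df2 %>%
  separate_rows(name, sep = ";") %>%
  mutate(name = trimws(name))  # assumption: guard against stray spaces around ";"

# Exact match on the individual names; unmatched names keep an NA address
matched <- df1 %>%
  left_join(df2_long, by = c("NAME" = "name"))

print(matched)
```

Because each name is compared as a whole after splitting, SMITH E cannot accidentally match a longer name such as SMITH EA, which a plain substring search could.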
If you really want to, you can spread Smith's two addresses across two columns, but I would recommend sticking with the long format here:
df1 %>%
  left_join(df2_clean, by = c("NAME" = "name")) %>%
  group_by(NAME) %>%
  # Number the matches within each name: 1 for the first address, 2 for the second, ...
  mutate(add_c = row_number()) %>%
  # Turn those numbers into the columns address_1, address_2, ...
  pivot_wider(id_cols = NAME, names_from = add_c, names_prefix = "address_", values_from = address)
#> # A tibble: 6 x 3
#> # Groups:   NAME [6]
#>   NAME      address_1      address_2
#>   <chr>     <chr>          <chr>
#> 1 SMALL H   <NA>           <NA>
#> 2 ZITT M    <NA>           <NA>
#> 3 SMITH E   SOME ADDRESS 3 SOME ADDRESS 5
#> 4 GLANZEL W <NA>           <NA>
#> 5 HUANG MH  <NA>           <NA>
#> 6 THIJS B   <NA>           <NA>
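If one row per name is needed but the number of matches is not known in advance, another option is to collapse all addresses for a name into a single string instead of widening. This is only a sketch of that alternative, not part of the original solution. The matched data frame below retypes the long-format result of the left_join() above so the block runs on its own, and the column names n_matches and addresses as well as the " | " separator are arbitrary choices.

```r
library(dplyr)

# Long-format join result (one row per name/address match), retyped for a self-contained example
matched <- data.frame(
  NAME = c("SMALL H", "ZITT M", "SMITH E", "SMITH E", "GLANZEL W", "HUANG MH", "THIJS B"),
  address = c(NA, NA, "SOME ADDRESS 3", "SOME ADDRESS 5", NA, NA, NA),
  stringsAsFactors = FALSE
)

collapsed <- matched %>%
  group_by(NAME) %>%
  summarise(
    # Count only real matches, not the NA placeholder left by the join
    n_matches = sum(!is.na(address)),
    # Drop NAs before pasting so unmatched names get an empty string rather than "NA"
    addresses = paste(address[!is.na(address)], collapse = " | "),
    .groups = "drop"
  )

print(collapsed)
```

Note that summarise() returns the groups sorted by NAME, so the row order differs from df1.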
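Data

As a friendly tip: next time you ask a question, you can run dput(df) and copy the console output into your question. That makes it a bit easier for others to run your example.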
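As a rough illustration of the dput() tip: dput() prints an R expression that rebuilds the object, so readers can paste it straight into their console. The small data frame here is only a stand-in for your own data, and the capture.output()/parse() round trip is there just to demonstrate that the printed text recreates the object; in a question you would simply paste what dput() prints.

```r
# Stand-in for the data frame you want to share
df1 <- data.frame(
  NAME = c("SMALL H", "ZITT M", "SMITH E"),
  stringsAsFactors = FALSE
)

# Prints a structure(...) call that can be copied into a question
dput(df1)

# Demonstration only: evaluate the printed text to rebuild the data frame
dput_text <- paste(capture.output(dput(df1)), collapse = "\n")
df1_copy <- eval(parse(text = dput_text))

# Compare the rebuilt copy with the original
print(all.equal(df1, df1_copy))
```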
# Example data used throughout this answer: the names to look up
df1 <- read.delim(text = "NAME
SMALL H
ZITT M
SMITH E
GLANZEL W
HUANG MH
THIJS B", stringsAsFactors = FALSE)

# Lookup table: each entry holds one or more ";"-separated names and one address
df2 <- read.delim(text = "name,address
SIBLEY B,SOME ADDRESS 1
STEWART C;KOCH A,SOME ADDRESS 2
HILL GM;LEE A;SMITH E,SOME ADDRESS 3
DAVIS L,SOME ADDRESS 4
MERCIER K;SMITH E;GIBBONE A,SOME ADDRESS 5
DAVIDSON S;BEKIARI A,SOME ADDRESS 6", sep = ",", stringsAsFactors = FALSE)