R: how to merge on multiple variables with one of them fuzzy-matched
In a previous post I got help getting started with fuzzy matching. Thanks to @Ronak Shah, @r2evans and @akrun for their help. It was very useful, and I got the fuzzy match I wanted from these two datasets:
structure(list(ID = 1:8, Address = c("Canal and Broadway", "55 water street room number 73",
"Mulberry street", "Front street and Fulton", "62nd street ",
"wythe street", "vanderbilt avenue", "South Beach avenue")), class = "data.frame", row.names = c(NA,
-8L))
and df2. Running
fuzzyjoin::stringdist_left_join(df1, df2, by = 'Address', max_dist = 5)
gives me
structure(list(ID = 1:8, Address.x = c("Canal and Broadway",
"55 water street room number 73", "Mulberry street", "Front street and Fulton",
"62nd street ", "wythe street", "vanderbilt avenue", "South Beach avenue"
), ID2 = c(1L, NA, 3L, NA, 8L, 8L, 7L, 5L), Address.y = c("Canal & Broadway",
NA, "Mulberry street", NA, "62nd street", "62nd street", "vanderbilt ave",
"south beach avenue")), row.names = c(NA, -8L), class = "data.frame")
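For anyone wondering how max_dist = 5 decides these matches, here is a minimal base-R sketch of the distances involved. It uses utils::adist() (Levenshtein distance) as a stand-in; fuzzyjoin itself goes through the stringdist package, whose default "osa" method should give the same numbers for these particular strings, but treat that as an assumption rather than a guarantee.

```r
# Base-R sketch: edit distances behind the max_dist = 5 cutoff.
# adist() computes Levenshtein distance and is only a stand-in here for the
# stringdist package that fuzzyjoin calls internally.
left  <- c("Canal and Broadway", "62nd street ", "wythe street")
right <- c("Canal & Broadway",   "62nd street",  "62nd street")

# adist() returns the full 3 x 3 distance matrix; the diagonal holds the
# distance for each (left, right) pair
d <- diag(adist(left, right))

data.frame(left, right, distance = d, matched = d <= 5)
```

The last pair sits right on the cutoff, which is consistent with "wythe street" being paired with "62nd street" (ID2 = 8) in the output above. A smaller max_dist would drop that pair while keeping the first two.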
This match works well and I am happy with it. What I want to do next is match df1_new and df2_new
df1
and df2
structure(list(ID2 = 1:8, Address = c("Canal & Broadway", "Somewhere around 55 water street",
"Mulberry street", "Front street and close to Fulton", "south beach avenue",
"along wythe street on the southwest ", "vanderbilt ave", "62nd street"
), Age = c(32L, 33L, 37L, 39L, 42L, 50L, 60L, 35L), Name = c("John",
"Adam", "Ryan", "Greg", "Mark", "Anthony", "Mike", "Phil")), class = "data.frame", row.names = c(NA,-8L))
Normally I would run
df3 <- df1 %>% left_join(df2, by = c("Address", "Age", "Name"))
Note that although 62nd street and Mulberry street match in the fuzzy match, they do not have the same corresponding Age and Name.
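library(dplyr)  # provides %>%, mutate() and left_join()

# Fuzzy-match on Address only, then exact-join on Age, Name and the matched address
fuzzyjoin::stringdist_left_join(df1_new, df2_new['Address'], by = 'Address', max_dist = 5) %>%
  mutate(Address.z = Address.y) %>%
  left_join(df2_new %>% mutate(Address.z = Address),
            by = c("Age", "Name", "Address.z"))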
This gives me the result I want.
ID  Address.x                       ID2  Address.y           Age  Name
1   Canal and Broadway              1    Canal & Broadway    32   John
2   55 water street room number 73
3   Mulberry street
4   Front street and Fulton
5   62nd street                     8    62nd street
6   wythe street
7   vanderbilt avenue               7    vanderbilt ave      60   Mike
8   South Beach avenue              5    south beach avenue  42   Mark
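To make the "fuzzy on Address, exact on Age and Name" idea easier to try out, here is a rough base-R sketch on made-up data. Everything in it is hypothetical: df1_toy and df2_toy are not the real df1_new and df2_new, adist() replaces fuzzyjoin, and it keeps only the single nearest address per row instead of every candidate within the cutoff.

```r
# Hypothetical toy data, not the asker's real df1_new / df2_new
df1_toy <- data.frame(
  ID      = 1:3,
  Address = c("Canal and Broadway", "62nd street ", "vanderbilt avenue"),
  Age     = c(32L, 40L, 60L),
  Name    = c("John", "Sara", "Mike"),
  stringsAsFactors = FALSE
)
df2_toy <- data.frame(
  ID2     = c(1L, 8L, 7L),
  Address = c("Canal & Broadway", "62nd street", "vanderbilt ave"),
  Age     = c(32L, 35L, 60L),
  Name    = c("John", "Phil", "Mike"),
  stringsAsFactors = FALSE
)

# Step 1 (fuzzy): nearest df2 address for each df1 row, kept only if it is
# within 5 edits (Levenshtein via adist, a simplification of max_dist = 5)
dist_mat <- adist(df1_toy$Address, df2_toy$Address)
nearest  <- apply(dist_mat, 1, which.min)
within   <- dist_mat[cbind(seq_len(nrow(df1_toy)), nearest)] <= 5
df1_toy$Address.y <- ifelse(within, df2_toy$Address[nearest], NA)

# Step 2 (exact): join on the matched address plus Age and Name
res <- merge(df1_toy, df2_toy,
             by.x = c("Address.y", "Age", "Name"),
             by.y = c("Address", "Age", "Name"),
             all.x = TRUE)
res <- res[order(res$ID), c("ID", "Address", "Address.y", "ID2", "Age", "Name")]
res
```

In this toy run the "62nd street" row still gets a fuzzy address match, but its ID2 stays NA because Age and Name differ. That is the behaviour described above for addresses that match while the other variables do not.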