R: merge two data frames when one of them holds comma-separated values
I have two data frames like this:
df1 <- data.frame(Colors = c("Yellow","Pink","Green","Blue","White","Red",
                             "Cyan","Brown","Violet","Orange","Gray"))
df2 <- data.frame(Colors = c("Yellow,Pink","Green","Gold","White","Red,Cyan,Brown",
                             "Violet","Magenta","Gray"))
A base R solution, using pmatch on each split item:
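# Split every df2 entry on commas
split_list <- strsplit(as.character(df2$Colors), ",")
# Keep a row only if every piece (partially) matches some colour in df1
keep_lgl <- sapply(split_list, function(x) !anyNA(pmatch(x, df1$Colors)))
df2[keep_lgl, , drop = FALSE]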
# Colors
# 1 Yellow,Pink
# 2 Green
# 4 White
# 5 Red,Cyan,Brown
# 6 Violet
# 8 Gray
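Note that pmatch() does partial matching, so an abbreviated piece such as a hypothetical "Re" would be accepted as a match for "Red". If exact matching is what you want, a minimal variant swaps in %in% (my substitution, not part of the answer above). The sketch recreates the question's data with stringsAsFactors = FALSE so it also behaves the same on R versions before 4.0.

```r
# Question's data, kept as character columns
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "White",
                             "Red,Cyan,Brown", "Violet", "Magenta", "Gray"),
                  stringsAsFactors = FALSE)

# Split each df2 entry on commas, then keep a row only if every piece is an
# exact member of df1$Colors (%in% never accepts partial strings)
split_list <- strsplit(df2$Colors, ",")
keep_lgl <- vapply(split_list, function(x) all(x %in% df1$Colors), logical(1))
result <- df2[keep_lgl, , drop = FALSE]
result

# pmatch() vs %in% on an abbreviated value (hypothetical input "Re")
pmatch("Re", df1$Colors)  # 6, a partial match to "Red"
"Re" %in% df1$Colors      # FALSE
```

On the question's data both versions keep the same six rows; they only differ when a piece is a prefix of a df1 colour.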
You can use regex_inner_join from the fuzzyjoin package to join df2 with df1, and then keep only the distinct values of the df2 column:
library(dplyr)
library(fuzzyjoin)
regex_inner_join(df2, df1, by=c(Colors = "Colors")) %>%
select(Colors = Colors.x) %>% distinct()
# Colors
# 1 Yellow,Pink
# 2 Green
# 3 White
# 4 Red,Cyan,Brown
# 5 Violet
# 6 Gray
# Just to demonstrate: the joined table produced by regex_inner_join. From here,
# the data can be reshaped into whatever format you need.
regex_inner_join(df2, df1, by=c(Colors = "Colors"))
# Colors.x Colors.y
# 1 Yellow,Pink Yellow
# 2 Yellow,Pink Pink
# 3 Green Green
# 4 White White
# 5 Red,Cyan,Brown Red
# 6 Red,Cyan,Brown Cyan
# 7 Red,Cyan,Brown Brown
# 8 Violet Violet
# 9 Gray Gray
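One caveat worth checking against your requirements: as I read the fuzzyjoin documentation, regex_inner_join() uses the y column (here df1$Colors) as regular expressions searched inside the x strings, so after distinct() a df2 row survives if any single colour matches, and a pattern like "Red" would also match inside a longer word. The base R sketch below does not call fuzzyjoin; it only contrasts "any" and "all" semantics on a hypothetical df2 containing the row "Yellow,Gold".

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
# Hypothetical df2: "Yellow,Gold" contains one colour that is not in df1
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "Yellow,Gold"),
                  stringsAsFactors = FALSE)

# "any" semantics: keep a row if at least one df1 colour, used as a regular
# expression, is found somewhere in the df2 string
any_hit <- vapply(df2$Colors,
                  function(s) any(vapply(df1$Colors, grepl, logical(1), x = s)),
                  logical(1), USE.NAMES = FALSE)

# "all" semantics: keep a row only if every comma-separated piece is in df1
all_hit <- vapply(strsplit(df2$Colors, ","),
                  function(x) all(x %in% df1$Colors),
                  logical(1))

df2$Colors[any_hit]  # "Yellow,Pink" "Green" "Yellow,Gold"
df2$Colors[all_hit]  # "Yellow,Pink" "Green"
```

On the question's own data the two semantics happen to agree, which is why the fuzzyjoin output above looks correct.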
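"Yellow,Pink" is not the same string as "Yellow" or "Pink", so it is not returned; the same goes for "Red,Cyan,Brown". Essentially you are trying to join on two different strings, one in each data frame, and a join only matches keys that are exactly identical. One way around this is to split df2 into one row per colour first: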
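To see the exact-key behaviour concretely, a plain base R merge() on the question's data (recreated here with stringsAsFactors = FALSE) returns only the single-colour values that appear verbatim in both data frames:

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "White",
                             "Red,Cyan,Brown", "Violet", "Magenta", "Gray"),
                  stringsAsFactors = FALSE)

# An ordinary join compares whole key strings, so "Yellow,Pink" never equals "Yellow"
exact <- merge(df1, df2, by = "Colors")
exact$Colors  # "Gray" "Green" "Violet" "White" (merge sorts by the key)
```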
library(tidyverse)
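df2 %>% mutate(keep = Colors) %>%       # remember the original value
  separate_rows(Colors) %>%             # one row per comma-separated colour
  add_count(keep) %>%                   # n: number of pieces per original value
  inner_join(df1, by = "Colors") %>%    # drop pieces that are not in df1
  add_count(keep) %>%                   # nn: pieces that matched (named nn because n exists)
  filter(n == nn) %>%                   # all pieces matched; no effect on this data, but needed in general
  distinct(keep) %>%
  rename(Colors = keep)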
# # A tibble: 6 x 1
# Colors
# <fctr>
# 1 Yellow,Pink
# 2 Green
# 3 White
# 4 Red,Cyan,Brown
# 5 Violet
# 6 Gray
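The n == nn filter is what enforces "every piece must match". A base R sketch of the same counting, on a hypothetical df2 that includes a partially matching row "Yellow,Gold", shows the case where it matters: n plays the role of the first add_count() (pieces per row) and nn the role of the second one (pieces that found a partner in df1).

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
# Hypothetical df2 with one row that only partly matches df1
df2 <- data.frame(Colors = c("Yellow,Pink", "Yellow,Gold", "Gray"),
                  stringsAsFactors = FALSE)

parts <- strsplit(df2$Colors, ",")
n  <- lengths(parts)                                         # pieces per row
nn <- vapply(parts, function(x) sum(x %in% df1$Colors), integer(1))  # matched pieces

df2$Colors[n == nn]  # "Yellow,Pink" "Gray"
```

Without the filter, "Yellow,Gold" would survive the inner join through its "Yellow" piece.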
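df2 %>% mutate(keep = Colors) %>%
  separate_rows(Colors) %>%
  left_join(df1 %>% mutate(Colors2 = Colors), by = "Colors") %>%  # unmatched pieces get NA in Colors2
  group_by(keep) %>%
  summarize(filt = anyNA(Colors2)) %>%  # TRUE if any piece was unmatched
  filter(!filt) %>%
  select(-2)                            # drop the helper column filt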
# # A tibble: 6 x 1
# keep
# <fctr>
# 1 Gray
# 2 Green
# 3 Red,Cyan,Brown
# 4 Violet
# 5 White
# 6 Yellow,Pink
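The left_join() variant can also be mirrored in base R if you would rather avoid the tidyverse: reshape to one row per piece, left-merge against df1 with merge(all.x = TRUE), and drop every original value whose group contains an unmatched (NA) piece. A sketch on the question's data with stringsAsFactors = FALSE; the helper names long, lookup and has_na are my own.

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "White",
                             "Red,Cyan,Brown", "Violet", "Magenta", "Gray"),
                  stringsAsFactors = FALSE)

# One row per comma-separated piece, remembering the original value in `keep`
parts <- strsplit(df2$Colors, ",")
long <- data.frame(keep = rep(df2$Colors, lengths(parts)),
                   Colors = unlist(parts),
                   stringsAsFactors = FALSE)

# Left join: pieces without a partner in df1 get NA in Colors2
lookup <- transform(df1, Colors2 = Colors)
joined <- merge(long, lookup, by = "Colors", all.x = TRUE)

# TRUE for every original value that has at least one unmatched piece
has_na <- tapply(joined$Colors2, joined$keep, anyNA)
kept <- names(has_na)[!has_na]
kept
```

As with the dplyr version, the result comes back sorted by the grouping value rather than in df2's original row order.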