Check whether a dataframe column value exists in a list in R
I have a master list of colors as below:
master <- list("Beige" = c("light brown", "light golden", "skin"),
"off-white" = c("off white", "cream", "light cream", "dirty white"),
"Metallic" = c("steel","silver"),
"Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
"Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
"Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry","rosered"),
"Turquoise" = c("aqua marine", "jade green"),
"Yellow" = c("fresh lime")
)
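The title question asks whether each value occurs anywhere in the list. Base R can answer that by flattening the list with unlist() and testing with %in%. A named vector built the same way also returns the matching group. This is a minimal sketch on a few illustrative inputs (the vector `colors` is hypothetical). Exact matching is case- and spacing-sensitive, so "purple" (a group name, not a synonym) and "Jade Green" come back FALSE. That is why the spelling variants mentioned in the edit need extra handling.

```r
master <- list("Beige" = c("light brown", "light golden", "skin"),
               "off-white" = c("off white", "cream", "light cream", "dirty white"),
               "Metallic" = c("steel", "silver"),
               "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
               "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
               "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry", "rosered"),
               "Turquoise" = c("aqua marine", "jade green"),
               "Yellow" = c("fresh lime"))

# Hypothetical sample inputs
colors <- c("cream", "purple", "Jade Green", "steel")

# Does each value appear anywhere in the list? (exact, case-sensitive)
in_master <- colors %in% unlist(master, use.names = FALSE)
in_master
# [1]  TRUE FALSE FALSE  TRUE

# Same flattening, but keep the group name for each synonym
lookup <- setNames(rep(names(master), lengths(master)),
                   unlist(master, use.names = FALSE))
group <- unname(lookup[colors])  # "off-white" NA NA "Metallic"
```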
Edit: any value such as "multi color", "mixed colors", "mix", "rainbow", or another spelling of "multi-colored" should be treated as Multi-colored.
Perhaps we can do a fuzzy string join after stacking the list into a single data.frame:
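library(dplyr)
library(tidyr)      # unnest()
library(fuzzyjoin)  # stringdist_right_join()
library(tibble)     # enframe()

# Stack the list into a long table (name = master color, color = synonym),
# then fuzzy-join it to df so that every row of df is kept.
enframe(master, value = 'color') %>%
  unnest(c(color)) %>%
  type.convert(as.is = TRUE) %>%
  stringdist_right_join(df %>% mutate(rn = row_number()),
                        by = "color", max_dist = 3) %>%
  # rows with no synonym within distance 3 keep their original value
  transmute(color = color.y, output = coalesce(name, color.y))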
# A tibble: 19 x 2
# color output
# <chr> <chr>
# 1 multi color Multi-colored
# 2 purple purple
# 3 steel Metallic
# 4 metallic metallic
# 5 off white off-white
# 6 raisin Purple
# 7 strawberry Red
# 8 strawberry Red
# 9 magenta Purple
#10 skin Beige
#11 skin Multi-colored
#12 Beige Beige
#13 Jade Green Turquoise
#14 cream off-white
#15 cream Purple
#16 multi-colored Multi-colored
#17 offwhite off-white
#18 rosered Red
#19 light cream off-white
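In the output above, "skin" and "cream" each appear twice. A fuzzy join keeps every synonym within max_dist; for example, "skin" is only 3 edits away from "mix". If exactly one label per input is wanted, a base-R alternative is to keep only the single closest synonym using adist(). Note that adist() computes the generalized Levenshtein distance, not the "osa" distance that fuzzyjoin uses by default. This is a sketch under assumptions: the helper name `nearest_color` is hypothetical, and on ties the first synonym in list order wins.

```r
master <- list("Beige" = c("light brown", "light golden", "skin"),
               "off-white" = c("off white", "cream", "light cream", "dirty white"),
               "Metallic" = c("steel", "silver"),
               "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
               "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
               "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry", "rosered"),
               "Turquoise" = c("aqua marine", "jade green"),
               "Yellow" = c("fresh lime"))

# Flatten the list into a lookup table: one row per synonym.
lookup <- stack(master)                 # columns: values (synonym), ind (group)
lookup$ind <- as.character(lookup$ind)

# Hypothetical helper: label each input with the group of its single closest
# synonym, or keep the input unchanged if nothing is within max_dist edits.
nearest_color <- function(x, lookup, max_dist = 3) {
  d <- utils::adist(tolower(x), tolower(lookup$values))  # inputs x synonyms distance matrix
  best <- apply(d, 1, which.min)                          # closest synonym per input (first on ties)
  best_d <- d[cbind(seq_along(x), best)]                  # distance to that synonym
  ifelse(best_d <= max_dist, lookup$ind[best], x)
}

nearest_color(c("skin", "cream", "Jade Green", "offwhite"), lookup)
# returns "Beige" "off-white" "Turquoise" "off-white"
```

Choosing the minimum per row trades recall for a single, unambiguous label. Inputs that are equally close to synonyms in two groups are resolved silently by list order, so it may be worth inspecting `best_d` for borderline cases.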
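The output shown may differ from what you expect, because your master list does not show all the elements. For example, for Purple it shows [1] "lavender" "grape" "jam" "raisin" "plum" "magenta", so there is no matching element for "purple".
If any of the values "purple", "lavender", "grape", "jam", "raisin", "plum", "magenta" is present, it should be treated as Purple.
Yes, but if you look at the input, the second value is "purple", which should be replaced by Purple.
Yes, it gets replaced, and that's fine. But in the output you provided below, Jade Green is not replaced by Turquoise.
I have updated the question with the patterns I can have.
dplyr gives me this error: Error in mutate_impl(.data, dots): Evaluation error: Argument 2 must be integer type, not character. And enframe, as suggested, gives me this: Error: `by` required, because the data sources have no common variables
@jamesjoyce Can you check the type of the join columns? On my end, both columns have the same type.
typeof(master) is list, and typeof(df$color), which I provided in the question, is character.
@jamesjoyce Yes, the master list is converted into a two-column dataset with enframe or stack.
@jamesjoyce I added the type conversion.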
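The type point from the comments can be made concrete. The master list itself has type list, but once it is flattened into two columns, the synonym column is character, like df$color. The sketch below uses base stack() as a stand-in for enframe() plus unnest(), on a shortened two-group copy of the master list (an assumption made only to keep the example small).

```r
# Shortened copy of the master list (two groups only, for illustration)
master <- list("Beige" = c("light brown", "light golden", "skin"),
               "Metallic" = c("steel", "silver"))

typeof(master)           # "list"
long <- stack(master)    # data.frame: values = synonym, ind = group (factor)
typeof(long$values)      # "character", same type as df$color
nrow(long)               # 5, one row per synonym
as.character(long$ind)   # "Beige" "Beige" "Beige" "Metallic" "Metallic"
```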
Expected output:

df$output <- c("Multi-colored", "Purple", "Metallic", "Metallic", "off-white",
               "Purple", "Red", "Purple", "Beige", "Beige", "Turquoise",
               "off-white", "Multi-colored", "off-white", "Red", "off-white")
Data:

df <- structure(list(color = c("multi color", "purple", "steel", "metallic",
  "off white", "raisin", "strawberry", "magenta", "skin", "Beige",
  "Jade Green", "cream", "multi-colored", "offwhite", "rosered",
  "light cream")), class = "data.frame", row.names = c(NA, -16L))
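Here is a non-fuzzy alternative, offered only as a sketch. Suppose the variants differ only in case, spaces and hyphens. In that case, you can normalize both sides and also treat each group name as a synonym of itself ("purple", "metallic", "Beige", "multi-colored" in df are group names rather than listed synonyms). With plain exact matching, this reproduces the expected output above for this df and avoids extra matches such as "skin" to Multi-colored. The helper `normalize` and the rule of adding group names as synonyms are assumptions of this sketch. Genuine misspellings would still need a fuzzy join.

```r
master <- list("Beige" = c("light brown", "light golden", "skin"),
               "off-white" = c("off white", "cream", "light cream", "dirty white"),
               "Metallic" = c("steel", "silver"),
               "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
               "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
               "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry", "rosered"),
               "Turquoise" = c("aqua marine", "jade green"),
               "Yellow" = c("fresh lime"))

df <- data.frame(color = c("multi color", "purple", "steel", "metallic",
                           "off white", "raisin", "strawberry", "magenta", "skin", "Beige",
                           "Jade Green", "cream", "multi-colored", "offwhite", "rosered",
                           "light cream"), stringsAsFactors = FALSE)

# Hypothetical helper: lower-case and drop everything but letters,
# so "off white", "offwhite" and "off-white" all become "offwhite".
normalize <- function(x) gsub("[^a-z]", "", tolower(x))

# Lookup keys: every synonym plus each group name itself ("Purple", "Metallic", ...).
groups <- names(master)
keys   <- normalize(c(unlist(master, use.names = FALSE), groups))
labels <- c(rep(groups, lengths(master)), groups)

# Exact match on normalized text; values with no match are kept as they are.
idx <- match(normalize(df$color), keys)
df$output <- ifelse(is.na(idx), df$color, labels[idx])
df$output
```

match() returns the first matching key. A normalized key like "offwhite" that occurs both as a synonym and as a group name therefore resolves to the same group either way. If two groups ever produced the same normalized key, the earlier group in the list would win.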