R: match values in a list between two datasets
I have two datasets that I am working with. The first one is:
data_1 <- tribble(
~shop_name, ~sub_category,
"A", "Blu-ray, DVDs, CD",
"B", "Sneakers, Make-up, Blu-ray",
"C", "Camera, Optic, DVDs",
"D", "Flower, Notebooks, Make-up",
)
The second one is:
data_2 <- tribble(
~sub_category, ~main_category,
"Blu-ray", "Electronic",
"DVDs", "Electronic",
"CD", "Electronic",
"Sneakers", "Fashion",
"Make-up", "Fashion",
"Camera", "Electronic",
"Optic", "Health",
"Flower", "Home",
)
Now I want to perform a left join to add main_category to data_1. The final data should look like this:
merged_data <- tribble(
~shop_name, ~sub_category, ~main_category,
"A", "Blu-ray, DVDs, CD", "Electronic, Electronic, Electronic",
"B", "Sneakers, Make-up, Blu-ray", "Fashion, Fashion, Electronic",
"C", "Camera, Optic", "Electronic, Health",
"D", "Flower", "Home"
)
My code looks like this:
data3 <- left_join(data_1, data_2, by = "sub_category")
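But somehow main_category returns NA. Can anyone help me? Thanks in advance.

The join returns NA because each sub_category value in data_1 is a single comma-separated string, which never equals any individual sub_category in data_2. You basically need to split the sub_category strings in data_1 and then join, i.e.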
data_1 %>%
separate_rows(sub_category, sep = ', ') %>%
left_join(data_2, by = 'sub_category') %>%
group_by(shop_name) %>%
summarise_all(funs(toString))
which gives:
# A tibble: 4 x 3
  shop_name sub_category               main_category
  <chr>     <chr>                      <chr>
1 A         Blu-ray, DVDs, CD          Electronic, Electronic, Electronic
2 B         Sneakers, Make-up, Blu-ray Fashion, Fashion, Electronic
3 C         Camera, Optic, DVDs        Electronic, Health, Electronic
4 D         Flower, Notebooks, Make-up Home, NA, Fashion
If you have more columns, you need to replace summarise_all with summarise_at(vars(contains('category')), funs(toString)).

Thanks for your comment. Just one question about the code: besides these two categories, I have other columns, and in that case summarise_all(funs(toString)) does not work. Is there a way to look at only the two columns?
I have edited my answer; please take a look now. You can change the pattern in contains() to fit your case.

Here are two data.table solutions, for the record:
Code
require(data.table); setDT(data_1); setDT(data_2)
You can directly match each string in data_1's sub_category to its corresponding main_category in data_2:
data_1[, main_category := sapply(sub_category, function(x){
str = unlist(strsplit(x, ', '))
match = as.numeric(sapply(str, function(x) data_2[, which(x == sub_category)]))
data_2[match, paste(main_category, collapse = ', ')]
})]
Or, convert data_1 to long format and merge it with data_2 on sub_category:
data_1 = data_1[, .(sub_category = unlist(strsplit(sub_category, ', '))), keyby = shop_name] # data_1 to long format
dt_final = merge(data_1, data_2, by = 'sub_category', all = TRUE) # Join data_1 and data_2 on sub_category
dt_final = dt_final[, lapply(.SD, function(x) paste(x, collapse = ', ')), keyby = shop_name]
Result
> data_1
   shop_name               sub_category                      main_category
1:         A          Blu-ray, DVDs, CD Electronic, Electronic, Electronic
2:         B Sneakers, Make-up, Blu-ray       Fashion, Fashion, Electronic
3:         C        Camera, Optic, DVDs     Electronic, Health, Electronic
4:         D Flower, Notebooks, Make-up                  Home, NA, Fashion
> dt_final
   shop_name               sub_category                      main_category
1:         A          Blu-ray, CD, DVDs Electronic, Electronic, Electronic
2:         B Blu-ray, Make-up, Sneakers       Electronic, Fashion, Fashion
3:         C        Camera, DVDs, Optic     Electronic, Electronic, Health
4:         D Flower, Make-up, Notebooks                  Home, Fashion, NA
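
A note for newer dplyr versions: funs() has been deprecated since dplyr 0.8.0, so the split-join-collapse pipeline from the first answer may emit warnings there. Below is a minimal sketch of the same idea written with across(), assuming dplyr 1.0.0 or later and tidyr are installed. The name merged is only illustrative and does not come from the original thread.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Example data copied from the question
data_1 <- tribble(
  ~shop_name, ~sub_category,
  "A", "Blu-ray, DVDs, CD",
  "B", "Sneakers, Make-up, Blu-ray",
  "C", "Camera, Optic, DVDs",
  "D", "Flower, Notebooks, Make-up"
)

data_2 <- tribble(
  ~sub_category, ~main_category,
  "Blu-ray", "Electronic",
  "DVDs", "Electronic",
  "CD", "Electronic",
  "Sneakers", "Fashion",
  "Make-up", "Fashion",
  "Camera", "Electronic",
  "Optic", "Health",
  "Flower", "Home"
)

merged <- data_1 %>%
  # one row per individual sub-category
  separate_rows(sub_category, sep = ", ") %>%
  # look up the main category for each single sub-category
  left_join(data_2, by = "sub_category") %>%
  group_by(shop_name) %>%
  # collapse only the two category columns back into comma-separated strings
  summarise(across(c(sub_category, main_category), toString), .groups = "drop")

print(merged)
```

Sub-categories with no match in data_2 (here "Notebooks") stay in the collapsed string as NA, as in the results above. Because summarise() keeps only the grouping column and the columns named inside across(), any other columns are dropped; if they are constant within a shop, one option is to add them to group_by().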