R: select rows with common IDs in a grouped data frame
I'm looking for a simpler solution to the following problem. Here is my setup:
test <- tibble::tribble(
~group_name, ~id_name, ~varA, ~varB,
"groupA", "id_1", 1, "a",
"groupA", "id_2", 4, "f",
"groupA", "id_3", 5, "g",
"groupA", "id_4", 6, "x",
"groupA", "id_4", 6, "h",
"groupB", "id_1", 2, "s",
"groupB", "id_2", 13, "y",
"groupB", "id_4", 14, "t",
"groupC", "id_1", 3, "d",
"groupC", "id_2", 7, "j",
"groupC", "id_3", 8, "k",
"groupC", "id_4", 9, "l",
"groupC", "id_5", 0, "o",
"groupC", "id_6", 12, "u"
)
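I want to keep only the rows whose id_name appears in every group (that is, drop the rows whose ID is missing from at least one group). Ideally, I don't want output where columns are joined on at the end. I'd like to "simply" drop the rows that are missing from any group while keeping the shape of the data frame.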
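I know I could extract all the IDs from each group, intersect them to get the list of unique IDs present in every group, and then filter the main data frame on those IDs. But that sounds like a lot of work ;-)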
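Any hints would be greatly appreciated.

In base R, we can find the id_name values common to all groups (split id_name by group_name, then intersect the pieces) and subset on them: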
subset(test, id_name %in% Reduce(intersect, split(id_name, group_name)))
# group_name id_name varA varB
# <chr> <chr> <dbl> <chr>
# 1 groupA id_1 1 a
# 2 groupA id_2 4 f
# 3 groupA id_4 6 x
# 4 groupA id_4 6 h
# 5 groupB id_1 2 s
# 6 groupB id_2 13 y
# 7 groupB id_4 14 t
# 8 groupC id_1 3 d
# 9 groupC id_2 7 j
#10 groupC id_4 9 l
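Since the question is tagged dplyr, here is a minimal sketch of the same idea as a grouped filter: keep an id_name only if it occurs in as many distinct groups as the data contains. The names n_groups and common are introduced here for illustration and are not from the original posts, and test is rebuilt as a plain data.frame so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt compactly as a data.frame
test <- data.frame(
  group_name = rep(c("groupA", "groupB", "groupC"), times = c(5, 3, 6)),
  id_name = c("id_1", "id_2", "id_3", "id_4", "id_4",
              "id_1", "id_2", "id_4",
              "id_1", "id_2", "id_3", "id_4", "id_5", "id_6"),
  varA = c(1, 4, 5, 6, 6, 2, 13, 14, 3, 7, 8, 9, 0, 12),
  varB = c("a", "f", "g", "x", "h", "s", "y", "t", "d", "j", "k", "l", "o", "u")
)

# Total number of distinct groups in the data
n_groups <- n_distinct(test$group_name)

# Keep only ids that are seen in every group; the columns stay unchanged
common <- test %>%
  group_by(id_name) %>%
  filter(n_distinct(group_name) == n_groups) %>%
  ungroup()
```

This should keep the same ten rows as the subset() call, and no extra columns are added.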
You can tabulate the occurrences of id_name by group_name:
table(test$group_name,test$id_name)
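If an id_name is present in every group, every entry in its column is greater than 0. Comparing the table with > 0 gives a logical matrix, and a column mean of exactly 1 means all entries in that column are TRUE, so we can express this logic with a combination of > 0 and colMeans: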
keep = names(which(colMeans(table(test$group_name,test$id_name)>0)==1))
Then use it to subset the data:
test[test$id_name %in% keep,]
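To make the one-liner easier to follow, here is a sketch that breaks it into its intermediate steps. The names tab, present, share and result are introduced here for illustration only, and test is rebuilt as a plain data.frame so the snippet is self-contained.

```r
# Same data as in the question, rebuilt compactly as a data.frame
test <- data.frame(
  group_name = rep(c("groupA", "groupB", "groupC"), times = c(5, 3, 6)),
  id_name = c("id_1", "id_2", "id_3", "id_4", "id_4",
              "id_1", "id_2", "id_4",
              "id_1", "id_2", "id_3", "id_4", "id_5", "id_6"),
  varA = c(1, 4, 5, 6, 6, 2, 13, 14, 3, 7, 8, 9, 0, 12),
  varB = c("a", "f", "g", "x", "h", "s", "y", "t", "d", "j", "k", "l", "o", "u")
)

# Step 1: counts of each id_name (columns) within each group_name (rows)
tab <- table(test$group_name, test$id_name)

# Step 2: TRUE where the id occurs at least once in the group
present <- tab > 0

# Step 3: fraction of groups in which each id occurs
share <- colMeans(present)

# Step 4: ids whose fraction is 1, i.e. present in every group
keep <- names(which(share == 1))
keep
# [1] "id_1" "id_2" "id_4"

# Step 5: filter the original rows; the columns are untouched
result <- test[test$id_name %in% keep, ]
```

If comparing a mean to 1 feels fragile, colSums(tab > 0) == nrow(tab) is an integer-only way to state the same condition.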
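That was fast! Awesome, these embedded pipes are really useful. Yes, this is what I thought could be done. This solution is useful if I want the IDs shared across an arbitrary number of groups. Thanks.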
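If "an arbitrary number of groups" is read as "IDs present in at least k groups", the table approach could be adapted as in the sketch below. This reading is an assumption, and min_groups, groups_per_id, keep_k and subset_k are hypothetical names that do not appear in the original answers.

```r
# Same data as in the question, rebuilt compactly as a data.frame
test <- data.frame(
  group_name = rep(c("groupA", "groupB", "groupC"), times = c(5, 3, 6)),
  id_name = c("id_1", "id_2", "id_3", "id_4", "id_4",
              "id_1", "id_2", "id_4",
              "id_1", "id_2", "id_3", "id_4", "id_5", "id_6"),
  varA = c(1, 4, 5, 6, 6, 2, 13, 14, 3, 7, 8, 9, 0, 12),
  varB = c("a", "f", "g", "x", "h", "s", "y", "t", "d", "j", "k", "l", "o", "u")
)

# Hypothetical threshold: an id must occur in at least this many groups
min_groups <- 2

# Number of groups in which each id occurs at least once
groups_per_id <- colSums(table(test$group_name, test$id_name) > 0)

# Ids that meet the threshold, then the matching rows
keep_k <- names(which(groups_per_id >= min_groups))
subset_k <- test[test$id_name %in% keep_k, ]
```

Setting min_groups to the total number of groups gives back the "present in every group" filter.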