R: Select rows with IDs common to all groups in a grouped data frame (tags: r, dplyr, tidyverse)


I'm looking for a simpler solution to the following problem. Here is my setup:

test <- tibble::tribble(
  ~group_name, ~id_name, ~varA, ~varB,
     "groupA",   "id_1",     1,   "a",
     "groupA",   "id_2",     4,   "f",
     "groupA",   "id_3",     5,   "g",
     "groupA",   "id_4",     6,   "x",
     "groupA",   "id_4",     6,   "h",
     "groupB",   "id_1",     2,   "s",
     "groupB",   "id_2",    13,   "y",
     "groupB",   "id_4",    14,   "t",
     "groupC",   "id_1",     3,   "d",
     "groupC",   "id_2",     7,   "j",
     "groupC",   "id_3",     8,   "k",
     "groupC",   "id_4",     9,   "l",
     "groupC",   "id_5",     0,   "o",
     "groupC",   "id_6",    12,   "u"
  )
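I want to keep only the rows whose id_name is present in every group (that is, remove rows whose ID is missing from at least one group). Ideally, I don't want the output to have columns joined on at the end. I'd like to "simply" drop the rows that are missing from any one of the groups while keeping the shape of the data frame.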

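I know I could extract all IDs from each group, intersect them to get the list of unique IDs present in all groups, and then filter the main data frame for those IDs. But that sounds like a lot of work ;-)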


Any hints would be greatly appreciated.

In base R, we can split id_name by group_name, find the IDs common to all groups, and then subset:

subset(test, id_name %in% Reduce(intersect, split(id_name, group_name)))

#   group_name id_name  varA varB 
#   <chr>      <chr>   <dbl> <chr>
# 1 groupA     id_1        1 a    
# 2 groupA     id_2        4 f    
# 3 groupA     id_4        6 x    
# 4 groupA     id_4        6 h    
# 5 groupB     id_1        2 s    
# 6 groupB     id_2       13 y    
# 7 groupB     id_4       14 t    
# 8 groupC     id_1        3 d    
# 9 groupC     id_2        7 j    
#10 groupC     id_4        9 l    
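
To see what the one-liner does, it can be unrolled into its three steps. The following is only an illustrative sketch: it rebuilds the example data as a plain data.frame so that it runs without the tibble package, and the names ids_by_group, common_ids and result are introduced here for illustration; they are not part of the answer.

```r
# Example data rebuilt as a base data.frame (assumption: for this purpose
# it behaves the same as the tibble from the question).
test <- data.frame(
  group_name = rep(c("groupA", "groupB", "groupC"), times = c(5, 3, 6)),
  id_name = c("id_1", "id_2", "id_3", "id_4", "id_4",
              "id_1", "id_2", "id_4",
              "id_1", "id_2", "id_3", "id_4", "id_5", "id_6"),
  varA = c(1, 4, 5, 6, 6, 2, 13, 14, 3, 7, 8, 9, 0, 12),
  varB = c("a", "f", "g", "x", "h", "s", "y", "t",
           "d", "j", "k", "l", "o", "u"),
  stringsAsFactors = FALSE
)

# Step 1: one character vector of IDs per group.
ids_by_group <- split(test$id_name, test$group_name)

# Step 2: fold intersect() over the list, so only IDs found in
# every group remain.
common_ids <- Reduce(intersect, ids_by_group)

# Step 3: keep the rows whose ID survived the intersection.
result <- test[test$id_name %in% common_ids, ]

print(common_ids)  # "id_1" "id_2" "id_4"
print(nrow(result))  # 10
```

Because Reduce folds over however many elements the list has, the same code works for any number of groups.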

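You can tabulate the occurrences of id_name by group_name:

table(test$group_name, test$id_name)

If an id_name appears in every group, every entry in its column is greater than 0. Combining > 0 with colMeans expresses this compactly: the column mean of the resulting logical matrix equals 1 only when the ID is present in all groups.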

keep <- names(which(colMeans(table(test$group_name, test$id_name) > 0) == 1))

Then use it to subset the rows:

test[test$id_name %in% keep, ]
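
Looking at the intermediate objects may make the table approach easier to follow. This is a sketch under assumptions: it uses a reduced data.frame holding only the two relevant columns, and the names counts, present and keep_alt are introduced here for illustration. It also shows an equivalent colSums comparison against the number of groups, which avoids comparing a mean to 1.

```r
# Only the two columns needed for the tabulation (rebuilt from the question).
test <- data.frame(
  group_name = rep(c("groupA", "groupB", "groupC"), times = c(5, 3, 6)),
  id_name = c("id_1", "id_2", "id_3", "id_4", "id_4",
              "id_1", "id_2", "id_4",
              "id_1", "id_2", "id_3", "id_4", "id_5", "id_6"),
  stringsAsFactors = FALSE
)

# Rows are groups, columns are IDs, cells are occurrence counts:
#        id_1 id_2 id_3 id_4 id_5 id_6
# groupA    1    1    1    2    0    0
# groupB    1    1    0    1    0    0
# groupC    1    1    1    1    1    1
counts <- table(test$group_name, test$id_name)

# TRUE where an ID occurs at least once in a group.
present <- counts > 0

# Original logic: the share of groups containing the ID must be 1.
keep <- names(which(colMeans(present) == 1))

# Equivalent check on counts: present in as many groups as there are rows.
keep_alt <- names(which(colSums(present) == nrow(present)))

print(keep)  # "id_1" "id_2" "id_4"
```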

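That was quick! Amazing. These embedded pipes are really useful. Yes, this is what I thought could be done; this solution is useful if I want the IDs shared by any number of groups. Thanks!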
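
Since the question is tagged dplyr, the same filter can also be sketched with a grouped filter. This is not one of the answers above, only an illustration that assumes the dplyr package is installed; n_groups and result are names introduced here, and the data is rebuilt as a plain data.frame.

```r
library(dplyr)

test <- data.frame(
  group_name = rep(c("groupA", "groupB", "groupC"), times = c(5, 3, 6)),
  id_name = c("id_1", "id_2", "id_3", "id_4", "id_4",
              "id_1", "id_2", "id_4",
              "id_1", "id_2", "id_3", "id_4", "id_5", "id_6"),
  varA = c(1, 4, 5, 6, 6, 2, 13, 14, 3, 7, 8, 9, 0, 12),
  varB = c("a", "f", "g", "x", "h", "s", "y", "t",
           "d", "j", "k", "l", "o", "u"),
  stringsAsFactors = FALSE
)

# Total number of distinct groups in the data.
n_groups <- n_distinct(test$group_name)

# Keep an ID only if it occurs in as many distinct groups as exist overall.
result <- test %>%
  group_by(id_name) %>%
  filter(n_distinct(group_name) == n_groups) %>%
  ungroup()

print(result)
```

The columns and their order are unchanged; only rows whose ID is missing from some group are dropped.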