Subsetting a data frame by duplicated values and summarizing by factor level with dplyr

I have a data frame like this:

df <- data.frame(region = c("1","1","1","1","1","1","1","1","2","2"),
                 loc    = c("A","A","A","B","B","B","C","D","E","F"),
                 sp1    = c("a","a","b","a","e","e","e","e","a","a"),
                 sp2    = c("b","b","c","b","f","f","f","f","b","b"),
                 inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b"))
I tried the following:

df %>%
  group_by(region, inter) %>%
  filter(duplicated(inter))
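The attempt above keeps a different set of rows than a filter(n() > 1) would. Within each region/inter group, duplicated(inter) is TRUE for every row except the first, so filter(duplicated(inter)) silently drops one row from every group. The base R sketch below mimics both filters on the same data, assuming the goal is to count distinct locations per group. It relies on the fact that every row in a region/inter group shares the same inter value, so flagging repeated (region, inter) pairs is equivalent; the names dup_flag, kept_dup, group_size and kept_n are illustrative only.

```r
df <- data.frame(region = c("1","1","1","1","1","1","1","1","2","2"),
                 loc    = c("A","A","A","B","B","B","C","D","E","F"),
                 sp1    = c("a","a","b","a","e","e","e","e","a","a"),
                 sp2    = c("b","b","c","b","f","f","f","f","b","b"),
                 inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b"))

# Mimic group_by(region, inter) %>% filter(duplicated(inter)):
# inside a group all rows share one inter value, so duplicated(inter) is
# TRUE for every row except the first row of that (region, inter) pair.
dup_flag <- duplicated(df[, c("region", "inter")])
kept_dup <- df[dup_flag, ]   # the first row of each group is lost

# Mimic group_by(region, inter) %>% filter(n() > 1):
# ave() repeats each group's row count on every row of that group.
group_size <- ave(seq_len(nrow(df)), df$region, df$inter, FUN = length)
kept_n <- df[group_size > 1, ]   # every row of a group with 2+ rows

# Compare the locations left for region 2
unique(kept_dup$loc[kept_dup$region == "2"])
unique(kept_n$loc[kept_n$region == "2"])
```

With the duplicated() filter, region 2 keeps only location F (location E was the group's first row), so counting distinct locations there would give 1 instead of 2; the n() > 1 filter keeps both E and F.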

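You can filter to the groups that have more than one row per region/inter combination, and then use n_distinct to count the number of unique locations. I included the species variables as grouping variables so that they are kept in the result.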

library(dplyr)

df %>%
     group_by(region, sp1, sp2, inter) %>%
     filter(n() > 1) %>%
     summarise(n = n_distinct(loc))

# A tibble: 3 x 5
# Groups:   region, sp1, sp2 [?]
  region    sp1    sp2  inter     n
  <fctr> <fctr> <fctr> <fctr> <int>
1      1      a      b    a_b     2
2      1      e      f    e_f     3
3      2      a      b    a_b     2
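For comparison, here is a base R sketch of the same three steps, useful as a cross-check when dplyr is not available. It swaps in ave() and aggregate() in place of group_by()/filter()/summarise(), so it is an alternative technique rather than the answer's method; length(unique(x)) plays the role of n_distinct(). The names group_size, kept and res are illustrative.

```r
df <- data.frame(region = c("1","1","1","1","1","1","1","1","2","2"),
                 loc    = c("A","A","A","B","B","B","C","D","E","F"),
                 sp1    = c("a","a","b","a","e","e","e","e","a","a"),
                 sp2    = c("b","b","c","b","f","f","f","f","b","b"),
                 inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b"))

# Step 1: size of each region/sp1/sp2/inter group, repeated on every row
group_size <- ave(seq_len(nrow(df)), df$region, df$sp1, df$sp2, df$inter,
                  FUN = length)

# Step 2: keep rows from groups with more than one row (like filter(n() > 1))
kept <- df[group_size > 1, ]

# Step 3: count distinct locations per group (like summarise(n_distinct(loc)))
res <- aggregate(kept["loc"], by = kept[c("region", "sp1", "sp2", "inter")],
                 FUN = function(x) length(unique(x)))
names(res)[names(res) == "loc"] <- "n"

# aggregate() orders groups differently from dplyr; sort by region, then sp1
res <- res[order(res$region, res$sp1), ]
res
```

The single-row group (region 1, b_c) is dropped in step 2, and the remaining counts agree with the tibble above: 2, 3 and 2 distinct locations.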