Splitting a data frame summarised by factor level based on duplicated values, using dplyr (R)


I have a data frame like this:

df <- data.frame(
  region = c("1","1","1","1","1","1","1","1","2","2"),
  loc    = c("A","A","A","B","B","B","C","D","E","F"),
  sp1    = c("a","a","b","a","e","e","e","e","a","a"),
  sp2    = c("b","b","c","b","f","f","f","f","b","b"),
  inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b")
)

I tried the following:

df %>%
  group_by(region, inter) %>%
  filter(duplicated(inter))

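You can filter down to the groups that have more than one row for each combination of region and inter, then use n_distinct to count the number of unique locations. I included the species variables (sp1 and sp2) in the grouping so that they are kept in the result.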

df %>%
     group_by(region, sp1, sp2, inter) %>%
     filter(n() > 1) %>%
     summarise( n = n_distinct(loc) )

# A tibble: 3 x 5
# Groups:   region, sp1, sp2 [?]
  region    sp1    sp2  inter     n
  <fctr> <fctr> <fctr> <fctr> <int>
1      1      a      b    a_b     2
2      1      e      f    e_f     3
3      2      a      b    a_b     2
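
To see why the attempt in the question behaves differently, here is a minimal sketch that runs both filters on the same data. duplicated() flags only the second and later occurrences of a value, so the first row of every group is dropped, whereas n() > 1 keeps every row of any group that has more than one row. The object names kept_all, kept_dups and counts are made up for this illustration, and the .groups = "drop" argument assumes dplyr 1.0.0 or later; neither comes from the original answer.

```r
library(dplyr)

df <- data.frame(
  region = c("1","1","1","1","1","1","1","1","2","2"),
  loc    = c("A","A","A","B","B","B","C","D","E","F"),
  sp1    = c("a","a","b","a","e","e","e","e","a","a"),
  sp2    = c("b","b","c","b","f","f","f","f","b","b"),
  inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b")
)

# n() > 1 keeps every row of each region/inter group with more than one row
kept_all <- df %>%
  group_by(region, inter) %>%
  filter(n() > 1) %>%
  ungroup()

# duplicated() is FALSE for the first occurrence, so that row is lost per group
kept_dups <- df %>%
  group_by(region, inter) %>%
  filter(duplicated(inter)) %>%
  ungroup()

nrow(kept_all)   # 9 rows: only the single b_c row is removed
nrow(kept_dups)  # 6 rows: the first row of every group is removed as well

# Count distinct locations per group; .groups = "drop" (assumed dplyr >= 1.0.0)
# returns an ungrouped tibble
counts <- df %>%
  group_by(region, sp1, sp2, inter) %>%
  filter(n() > 1) %>%
  summarise(n = n_distinct(loc), .groups = "drop")
```

With the duplicated() filter, region 2 would keep only the row at location F, so a later n_distinct(loc) would report 1 location instead of 2.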

