Split a data frame based on duplicate values, summarised at factor level, using dplyr
I have a data frame like this:
df <- data.frame(
  region = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
  loc    = c("A", "A", "A", "B", "B", "B", "C", "D", "E", "F"),
  sp1    = c("a", "a", "b", "a", "e", "e", "e", "e", "a", "a"),
  sp2    = c("b", "b", "c", "b", "f", "f", "f", "f", "b", "b"),
  inter  = c("a_b", "a_b", "b_c", "a_b", "e_f", "e_f", "e_f", "e_f", "a_b", "a_b")
)
I tried the following:
library(dplyr)

df %>%
  group_by(region, inter) %>%
  filter(duplicated(inter))
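You can filter to the groups that have more than one row in each region and inter combination, and then count the number of unique locations with n_distinct. I included the species variables (sp1 and sp2) in the grouping so that they are kept in the result.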
df %>%
  group_by(region, sp1, sp2, inter) %>%
  filter(n() > 1) %>%
  summarise(n = n_distinct(loc))
# A tibble: 3 x 5
# Groups:   region, sp1, sp2 [?]
  region sp1    sp2    inter      n
  <fctr> <fctr> <fctr> <fctr> <int>
1 1      a      b      a_b        2
2 1      e      f      e_f        3
3 2      a      b      a_b        2
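For anyone who wants to run the answer end to end, here is a minimal self-contained sketch of the same pipeline. It assumes the dplyr package is installed. The `.groups = "drop"` argument and the `result` name are additions for illustration, not part of the original answer; `.groups` requires dplyr 1.0.0 or later and only removes the leftover grouping from the result.

```r
# Assumption: dplyr >= 1.0.0 is installed (needed for the `.groups` argument).
library(dplyr)

# Same data as in the question.
df <- data.frame(
  region = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
  loc    = c("A", "A", "A", "B", "B", "B", "C", "D", "E", "F"),
  sp1    = c("a", "a", "b", "a", "e", "e", "e", "e", "a", "a"),
  sp2    = c("b", "b", "c", "b", "f", "f", "f", "f", "b", "b"),
  inter  = c("a_b", "a_b", "b_c", "a_b", "e_f", "e_f", "e_f", "e_f", "a_b", "a_b")
)

# `result` is a hypothetical name used only in this sketch.
result <- df %>%
  group_by(region, sp1, sp2, inter) %>%
  # Keep only combinations that occur in more than one row.
  filter(n() > 1) %>%
  # Count the distinct locations within each remaining combination.
  summarise(n = n_distinct(loc), .groups = "drop")

# The counts should match the output shown above.
stopifnot(identical(result$n, c(2L, 3L, 2L)))

print(result)
```

On R 4.0 or later, data.frame() keeps strings as character vectors by default, so the printed column types are likely to show as <chr> rather than the <fctr> seen in the output above; the counts are unaffected.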