R: counting values shared between groups


Here is some dummy data:

class<-c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu<-c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value<-c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type<-c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location<-c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1<-data.frame(class,otu,value,type,location)
I have a larger data frame and tried the dplyr suggestion, but I ran out of RAM, so I don't know whether it works.

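The solution provided by @Akron below doesn't count the cases where the abundance is 0, but it doesn't remove the OTU from the other replicates in that group. If an OTU has an abundance of 0, it is not shared within that group, and I need to discount it completely from both the abundance and the OTU.freq calculations.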

library(dplyr)    
so_many_shared3<-datafr1 %>% 
      group_by(class, location, type) %>% 
      summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=sum(value !=0))


   class location type  abundance  otu.freq
1    ab        a    a  4.3859649     2
2    ab        b    b 87.7192982     1
3    ab        c    c  0.2923977     1
4    ab        d    d  1.4619883     2
5    ad        b    b  0.0000000     0
6    ad        d    c  0.2923977     1
7    ad        d    d  4.6783626     2
8    av        a    a  1.1695906     1
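To see what the otu.freq = sum(value != 0) column above actually counts, here is a minimal base R sketch (no extra packages; the names key, rows_per_group and nonzero_per_group are only for illustration). It compares the plain row count per class/location/type group, which is what n() returns, with the number of rows whose value is non-zero. On the dummy data, ab/b/b has two rows but only one non-zero value, and ad/b/b has a single row whose value is 0.

```r
# Dummy data from the question (strings kept as character)
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

# One key per class/location/type group
key <- paste(datafr1$class, datafr1$location, datafr1$type, sep = "_")

# Like n(): every row in the group, zeros included
rows_per_group <- tapply(datafr1$value, key, length)

# Like sum(value != 0): only rows with a non-zero value
nonzero_per_group <- tapply(datafr1$value, key, function(v) sum(v != 0))

# Side by side, one row per group
cbind(rows_per_group, nonzero_per_group)
```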

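There is an error in your aggregate call. If you want to count the frequency of otu, you should put otu before the "~". After that, you can merge the two results with the join function from the plyr library.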

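# abund_shared: the abundance aggregate from the question (its code is not shown here)
# Count rows per group: the column being counted (otu) goes to the left of "~"
abund_shared_freq <- aggregate(otu ~ class + location + type, datafr1, length)
# plyr::join avoids attaching plyr, which would mask dplyr verbs such as summarise()
plyr::join(abund_shared, abund_shared_freq, by = c("class", "location", "type"), type = "left")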
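A minimal base R sketch of the same two-step approach, assuming abund_shared is the per-group sum of value (the question's own aggregate call is not shown in this thread). It uses merge(..., all.x = TRUE) as a base R stand-in for plyr's left join, so no extra package is needed.

```r
# Dummy data from the question (strings kept as character)
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

# Assumed stand-in for the question's abund_shared: summed value per group
abund_shared <- aggregate(value ~ class + location + type, data = datafr1, FUN = sum)

# Row count per group: the counted column (otu) sits to the left of "~"
abund_shared_freq <- aggregate(otu ~ class + location + type, data = datafr1, FUN = length)

# Left join on the grouping columns (base R counterpart of plyr::join(type = "left"))
res <- merge(abund_shared, abund_shared_freq,
             by = c("class", "location", "type"), all.x = TRUE)
names(res)[names(res) == "otu"] <- "otu.freq"
res
```

Like .N and n() in the other answers, length counts every row, so groups whose values are all 0 (such as ad/b/b) still get a non-zero otu.freq.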

You can use data.table:

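library(data.table)
val = sum(datafr1$value)   # grand total used as the denominator
# setDT() converts datafr1 to a data.table in place; .N is the row count per group
setDT(datafr1)[order(class,type), list(abundance = 
               sum(value)/val*100, otu.freq = .N), 
               by = .(class, location, type)]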
Or use dplyr:

library(dplyr)
datafr1 %>% 
     group_by(class, location, type) %>% 
     summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=n())
 #   class location type  abundance otu.freq
 #1    ab        a    a  4.3859649        2
 #2    ab        b    b 87.7192982        2
 #3    ab        c    c  0.2923977        2
 #4    ab        d    d  1.4619883        2
 #5    ad        b    b  0.0000000        1
 #6    ad        d    c  0.2923977        1
 #7    ad        d    d  4.6783626        2
 #8    av        a    a  1.1695906        1
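Update: based on the new criteria, I am switching to the code suggested by the OP (@K.Brannen), which counts only non-zero values with otu.freq = sum(value != 0) instead of n().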

Update 2: based on the updated expected result:

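  datafr1 %>%
       filter(value!=0) %>%                    # drop zero values
       group_by(location, type) %>% 
       mutate(value1=sum(value)) %>%           # non-zero total per location/type
       group_by(class, .add=TRUE) %>%          # .add replaces the deprecated add= (dplyr >= 1.0.0)
       summarise(abundance=round(100*sum(value)/unique(value1)), 
                         otu.freq=n())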
  #    location type class abundance otu.freq
  #1        a    a    ab        79        2
  #2        a    a    av        21        1
  #3        b    b    ab       100        1
  #4        c    c    ab       100        1
  #5        d    c    ad       100        1
  #6        d    d    ab        24        2
  #7        d    d    ad        76        2
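The Update 2 pipeline computes each class's share of the non-zero total within its location/type. As a cross-check, here is a base R sketch of the same calculation (not the answerer's code), using ave() in the role of the grouped mutate(value1 = sum(value)); the intermediate names nz, pct and freq are only for illustration.

```r
# Dummy data from the question (strings kept as character)
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

# Same as filter(value != 0)
nz <- datafr1[datafr1$value != 0, ]

# Same role as mutate(value1 = sum(value)) grouped by location and type
nz$value1 <- ave(nz$value, nz$location, nz$type, FUN = sum)

# Each row's percentage of its location/type total; summing per class
# gives 100 * sum(value) / value1 because value1 is constant within a group
nz$pct <- 100 * nz$value / nz$value1
res  <- aggregate(pct ~ location + type + class, data = nz, FUN = sum)
freq <- aggregate(otu ~ location + type + class, data = nz, FUN = length)

# Combine on location, type and class, then tidy the column names
res <- merge(res, freq)
names(res)[names(res) == "pct"] <- "abundance"
names(res)[names(res) == "otu"] <- "otu.freq"
res$abundance <- round(res$abundance)
res
```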

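Both give output, but the dplyr version is faster. However, some classes have an abundance of 0 while otu.freq is not 0 (this also shows in the dplyr output). I want to count the number of shared OTUs, and a value of 0 means the OTU is not shared between the groups. So would so_many_shared <- so_many_melt %>% group_by(Class, location, type) %>% summarise(abundance = sum(value)/sum(so_many_melt[['value']])*100, OTU.freq = sum(value != 0)) do that? I want to make sure it is doing what I think it should.

@K.Brannen I'm out at the moment. I will check your update soon.

@K.Brannen But your code lists otu.freq for row 7 as 2, not the expected 1. Is there a typo in the expected output?

OK, so I found some other problems. If an OTU has a value of 0 in any replicate of a location and type, it is not shared across all of those samples. Is there a way to remove the OTUs that are 0 in any replicate, so that they count toward neither the abundance nor the OTU.freq?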
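One possible way to drop such OTUs, sketched in base R under an assumption the thread does not pin down: an OTU is treated as shared within a location/type only if its value is non-zero in every replicate (row) of that otu/location/type combination. An OTU that fails the test is removed from all of its replicates before abundance and otu.freq are computed, and abundance is taken relative to the retained total. The names shared, kept, abund and freq are hypothetical.

```r
# Dummy data from the question (strings kept as character)
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

# Assumption: an OTU is shared only if it is non-zero in every replicate
# of its otu/location/type combination; ave() keeps the logical type here
shared <- ave(datafr1$value != 0, datafr1$otu, datafr1$location, datafr1$type,
              FUN = all)
kept <- datafr1[shared, ]

# Abundance relative to the retained total, and replicate count per group
abund <- aggregate(value ~ class + location + type, data = kept, FUN = sum)
freq  <- aggregate(otu ~ class + location + type, data = kept, FUN = length)
res <- merge(abund, freq)
res$abundance <- 100 * res$value / sum(kept$value)
names(res)[names(res) == "otu"] <- "otu.freq"
res
```

On this small dummy data each zero sits in its own otu/location/type combination, so the result matches a plain value != 0 filter; the two approaches only differ when an OTU is zero in one replicate and non-zero in another.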