R: count values shared between groups


Here is some dummy data:

class<-c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu<-c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value<-c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type<-c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location<-c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1<-data.frame(class,otu,value,type,location)

My real data frame is much larger. I tried the dplyr suggestion, but I ran out of RAM, so I can't tell whether it works.

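The solution provided by @Akron below does not count entries with an abundance of 0, but it does not remove that OTU from the other replicates in the group. If an OTU has an abundance of 0 in any replicate, it is not shared across that group, and I need to exclude it entirely from both the abundance and the OTU.freq calculations.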

library(dplyr)    
so_many_shared3<-datafr1 %>% 
      group_by(class, location, type) %>% 
      summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=sum(value !=0))


   class location type  abundance  otu.freq
1    ab        a    a  4.3859649     2
2    ab        b    b 87.7192982     1
3    ab        c    c  0.2923977     1
4    ab        d    d  1.4619883     2
5    ad        b    b  0.0000000     0
6    ad        d    c  0.2923977     1
7    ad        d    d  4.6783626     2
8    av        a    a  1.1695906     1


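There is an error in your aggregate call. To count the frequency of otu, put otu before the "~". You can then merge the two tables with the join function from the plyr library: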

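# abund_shared is the abundance table from the question's own aggregate() call (not shown here)
abund_shared_freq<-aggregate(otu~class+location+type,datafr1,length)
library(plyr)   # note: loading plyr after dplyr masks dplyr verbs such as summarise()
join(abund_shared, abund_shared_freq, by=c("class", "location","type"), type="left")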
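The same two-table approach can be sketched in base R alone, with `merge(..., all.x = TRUE)` swapped in for `plyr::join(type = "left")`; this also avoids loading plyr after dplyr, which masks dplyr verbs such as `summarise()`. Because the question's own `abund_shared` is not shown in this thread, it is rebuilt here as a hypothetical stand-in: each group's summed value as a percentage of the grand total. `result` is an illustrative name.

```r
# Dummy data from the question
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the question's abundance table:
# summed value per group as a percentage of the grand total
abund_shared <- aggregate(value ~ class + location + type, data = datafr1, FUN = sum)
abund_shared$abundance <- abund_shared$value / sum(datafr1$value) * 100

# otu on the left of "~": length() counts the rows in each group
abund_shared_freq <- aggregate(otu ~ class + location + type, data = datafr1, FUN = length)
names(abund_shared_freq)[names(abund_shared_freq) == "otu"] <- "otu.freq"

# Left join on the three grouping columns (base-R counterpart of plyr::join(type = "left"))
result <- merge(abund_shared, abund_shared_freq,
                by = c("class", "location", "type"), all.x = TRUE)
print(result)
```

Because `length` counts rows, `otu.freq` here includes zero-value rows, the same as `n()` in dplyr or `.N` in data.table.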

You can use data.table:

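library(data.table)
val = sum(datafr1$value)   # grand total: the denominator for every group's percentage
# setDT() converts datafr1 to a data.table by reference (in place);
# .N is the row count per group, so zero-value rows are counted too
setDT(datafr1)[order(class,type), list(abundance = 
               sum(value)/val*100, otu.freq = .N), 
               by = .(class, location, type)]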

Or use dplyr:

library(dplyr)
datafr1 %>% 
     group_by(class, location, type) %>% 
     summarise(abundance=sum(value)/sum(datafr1[['value']])*100, otu.freq=n())
 #   class location type  abundance otu.freq
 #1    ab        a    a  4.3859649        2
 #2    ab        b    b 87.7192982        2
 #3    ab        c    c  0.2923977        2
 #4    ab        d    d  1.4619883        2
 #5    ad        b    b  0.0000000        1
 #6    ad        d    c  0.2923977        1
 #7    ad        d    d  4.6783626        2
 #8    av        a    a  1.1695906        1
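Note that `n()` counts every row in the group, zero values included, while the question's `sum(value != 0)` counts only the non-zero rows. That is why ab/b/b shows 2 here but 1 in the question's table, and why ad/b/b gets an abundance of 0 with a count of 1. A minimal base-R sketch (base R swapped in for dplyr so it runs without extra packages; `key`, `rows`, `nonzero` and `counts` are illustrative names) that lists the groups where the two counts disagree:

```r
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

# One factor level per class/location/type combination that actually occurs
key <- interaction(datafr1$class, datafr1$location, datafr1$type, drop = TRUE, sep = "/")

rows    <- tapply(datafr1$value, key, length)                   # what n() counts
nonzero <- tapply(datafr1$value, key, function(v) sum(v != 0))  # what sum(value != 0) counts

counts <- data.frame(group = names(rows), rows = as.vector(rows), nonzero = as.vector(nonzero))
print(counts[counts$rows != counts$nonzero, ])
```

For the dummy data this should flag ab/b/b, ab/c/c and ad/b/b, the three groups that contain a zero.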

Update: based on the new criteria, I am updating the code suggested by the OP (@K.Brannen).

Update 2: based on the updated expected result.

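  library(dplyr)
  datafr1 %>%
       filter(value != 0) %>%                  # drop zero-abundance rows
       group_by(location, type) %>% 
       mutate(value1 = sum(value)) %>%         # total per location/type pair
       group_by(class, .add = TRUE) %>%        # `.add` replaces the deprecated `add` (dplyr >= 1.0.0)
       summarise(abundance = round(100*sum(value)/unique(value1)), 
                 otu.freq = n())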
  #    location type class abundance otu.freq
  #1        a    a    ab        79        2
  #2        a    a    av        21        1
  #3        b    b    ab       100        1
  #4        c    c    ab       100        1
  #5        d    c    ad       100        1
  #6        d    d    ab        24        2
  #7        d    d    ad        76        2
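The Update 2 pipeline can be mirrored in base R as a cross-check (a sketch, not the answerer's code): drop the zero rows, total what remains per location/type pair, and express each class's share of that pair total as a rounded percentage. `nz`, `res` and `pair_total` are illustrative names; `pair_total` plays the role of `value1`.

```r
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

nz <- datafr1[datafr1$value != 0, ]   # filter(value != 0)

# Per location/type/class: summed value and number of rows
res <- aggregate(value ~ location + type + class, data = nz, FUN = sum)
res$otu.freq <- aggregate(value ~ location + type + class, data = nz, FUN = length)$value

# Per location/type: total of the non-zero values (value1 in the dplyr code)
pair_total <- ave(res$value, res$location, res$type, FUN = sum)
res$abundance <- round(100 * res$value / pair_total)

res <- res[order(res$location, res$type, res$class),
           c("location", "type", "class", "abundance", "otu.freq")]
print(res)
```

Sorted by location, type and class, the abundance column should come out as 79, 21, 100, 100, 100, 24, 76, matching the output above.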

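Both of these give output, but the dplyr version is faster. However, some classes have an abundance of 0 while otu.freq is not 0 (this also shows in the dplyr output). I want to count the number of shared OTUs, and a value of 0 means the OTU is not shared between groups. So does so_many_shared %>% group_by(Class, location, type) %>% summarise(abundance=sum(value)/sum(so_many_melt[['value']])*100, OTU.freq=sum(value != 0)) do this? I want to make sure it is doing what I think it should.

@K.Brannen I was away. I will check your update soon.

@K.Brannen But your code gives otu.freq for row 7 as

, not the expected

. Is there a typo in the expected output?

OK, so I found some other problems. If an OTU has a value of 0 in any replicate of a location and type, it is not shared across all of those samples. Is there a way to remove OTUs that are 0 in any replicate, without counting that OTU toward abundance or OTU.freq?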
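One way to express the rule in the last comment, sketched in base R under an assumption about the data layout: every row sharing a location, type and otu is treated as a replicate of that OTU, and if any of those replicates is 0 the whole set is dropped before summarising. Abundance stays relative to the grand total of the original data, as in the question's sum(value)/sum(datafr1[['value']])*100; switching `grand_total` to `sum(kept$value)` would make the percentages cover only the surviving OTUs. `has_zero`, `kept`, `grand_total` and `shared` are hypothetical names.

```r
datafr1 <- data.frame(
  class    = c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av"),
  otu      = c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av"),
  value    = c(0,1,12,13,300,1,2,3,4,0,0,2,4),
  type     = c("b","c","d","a","b","c","d","d","d","c","b","a","a"),
  location = c("b","c","d","a","b","d","d","d","d","c","b","a","a"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose location/type/otu replicate set contains at least one zero
has_zero <- ave(datafr1$value == 0, datafr1$location, datafr1$type, datafr1$otu,
                FUN = any)
kept <- datafr1[!has_zero, ]

# Assumption: abundance is still a percentage of the original grand total
grand_total <- sum(datafr1$value)
shared <- aggregate(value ~ class + location + type, data = kept, FUN = sum)
shared$abundance <- shared$value / grand_total * 100
shared$otu.freq <- aggregate(value ~ class + location + type, data = kept, FUN = length)$value
print(shared)
```

With the dummy data the dropped sets happen to consist of zeros only, so the result matches the question's table above minus the ad/b/b row; the rule only changes the numbers when an OTU has a zero in one replicate and a non-zero value in another. In dplyr the same filter would presumably be `group_by(location, type, otu) %>% filter(all(value != 0))`, placed before the existing group_by()/summarise() steps.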
