Is there an R function to combine two replicate-site columns in a table to show species presence and absence?

I have the following df (sample data; my actual dataset has 96 columns):

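Class indicates the phylogenetic class of the organism (each repeated letter is a different species belonging to the same class). 1A and 1B are samples from the same site. I want to combine the presence/absence data (1/0) from the two samples at each site and add up the number of "present" species of each class at that site. I would like my df to look like this: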

  Sample Class  Number of Species Present  
      1     A     3  
      1     B     2  
      1     C     0  
      1     D     1  
      2     A     2  
      2     B     3  
      2     C     3  
      2     D     3
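For example, in the original df you can see that no class C species are present in sample 2A, but they are present in sample 2B, so the output df records 3 class C species present at site 2. Likewise, 3 different class B species are present in both 2A and 2B, but because these are replicates, the output df records 3 (not 6) class B species present at site 2.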

I'm stumped, so any help would be appreciated!


Cheers
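Here is a minimal base R sketch of the combining rule described in the question. It assumes each replicate is stored as a 0/1 vector with one entry per species of a class. The values are the class B and class C rows for site 2 in the sample data, and `combine_presence` is a hypothetical helper name. Taking the element-wise maximum (equivalent to a logical OR on 0/1 data) counts a species once if it appears in either replicate. Adding the replicates instead counts shared species twice:

```r
# Class C at site 2: absent from every 2A row, present in every 2B row
c_2A <- c(0, 0, 0)
c_2B <- c(1, 1, 1)

# Class B at site 2: the same three species are present in both replicates
b_2A <- c(1, 1, 1)
b_2B <- c(1, 1, 1)

# Hypothetical helper: a species is present at the site if it is present
# in at least one replicate (element-wise maximum of two 0/1 vectors)
combine_presence <- function(rep_a, rep_b) pmax(rep_a, rep_b)

n_c <- sum(combine_presence(c_2A, c_2B))  # class C species present at site 2
n_b <- sum(combine_presence(b_2A, b_2B))  # class B species present at site 2
naive_b <- sum(b_2A + b_2B)               # plain sum double-counts shared species
```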

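You just need to do a bit of formatting on your initial df first, because your column names actually carry more information than just a "name": each one combines the sample (site) number and the replicate letter.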
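As a base R sketch of that decomposition, the column names can be split with `sub()`. The pattern mirrors the `names_pattern` used below, but it is anchored and uses `+` so that both parts must be non-empty. The vector `cols` is just an illustrative subset of the sample data's column names:

```r
# Column names as in the sample data: site number followed by replicate letter
cols <- c("1A", "1B", "2A", "2B", "3A", "3B")

pattern <- "^(\\d+)([A-Z]+)$"
site      <- sub(pattern, "\\1", cols)  # site number of each column
replicate <- sub(pattern, "\\2", cols)  # replicate letter of each column

# Group the column names by site
by_site <- split(cols, site)
by_site
```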

You can try the following approach:

Code

library(dplyr)
library(tidyr)

df %>%
  # long format: one row per class, sample (site) and replicate letter
  pivot_longer(-class,
               names_pattern = "(\\d*)([A-Z]*)",
               names_to = c("sample", "replicate")) %>%
  # one column per replicate (A and B); rows of the same class are kept as list entries
  pivot_wider(id_cols = c(class, sample),
              names_from = replicate,
              values_from = value,
              values_fn = list) %>%
  # unnest A and B together so each row again pairs one species' two replicates
  unnest(c(A, B)) %>%
  # presence = 1 when the species is present in either replicate (column A or B)
  mutate(presence = ifelse(A == 1 | B == 1, 1, 0)) %>%
  # sum presence by sample and class
  group_by(sample, class) %>%
  summarise(Number = sum(presence))
Output

# A tibble: 24 x 3
# Groups:   sample [6]
   sample class Number
   <chr>  <chr>  <dbl>
 1 1      A          3
 2 1      B          2
 3 1      C          0
 4 1      D          1
 5 2      A          2
 6 2      B          3
 7 2      C          3
 8 2      D          3
 9 3      A          0
10 3      B          0
# ... with 14 more rows
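The pipeline above pairs the A and B values by position after `unnest()`, so it relies on the rows of each class staying in the same order. Below is a base R sketch that avoids reshaping altogether. It assumes every site has exactly two replicate columns named `<site>A` and `<site>B`, and that `class` is the only other column. The function name `count_present_by_site` is hypothetical. It is run here on the `class`, `1A`, `1B`, `2A` and `2B` columns of the sample data:

```r
# Hypothetical helper: one row per class, one column per site
count_present_by_site <- function(df) {
  rep_cols <- setdiff(names(df), "class")
  sites <- unique(sub("^(\\d+)[A-Z]+$", "\\1", rep_cols))
  # 1 if the species was found in either replicate of the site, else 0
  present <- sapply(sites, function(s) {
    pmax(df[[paste0(s, "A")]], df[[paste0(s, "B")]])
  })
  # Add up the species rows belonging to each class
  rowsum(present, df$class)
}

# First four replicate columns of the sample data
df <- data.frame(
  class = rep(c("A", "B", "C", "D"), 3),
  `1A` = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
  `1B` = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1),
  `2A` = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0),
  `2B` = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  check.names = FALSE
)

counts <- count_present_by_site(df)
counts
```

The counts for sites 1 and 2 agree with the tibble above (for example class B: 2 at site 1 and 3 at site 2). The result is a class-by-site matrix rather than a long tibble, which keeps the sketch short.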

Hi Jarrod, welcome to SO. Please always post the actual code or sample data in the body of the question. You can use
dput(your_data)
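To sketch what this looks like: `dput()` prints R code that rebuilds the object, so pasting its output into the question lets others load your data directly. The small data frame `toy` below is hypothetical. The second half captures the printed code and evaluates it again to show that it recreates the same data frame:

```r
# Hypothetical toy data in the same wide layout as the question
toy <- data.frame(class = c("A", "B", "C"), `1A` = c(0, 1, 0), check.names = FALSE)

# Prints R code that rebuilds the object; this is what to paste into a question
dput(toy)

# Re-evaluating the printed code gives back an equivalent data frame
code <- capture.output(dput(toy))
rebuilt <- eval(parse(text = code))
```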
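Adding the two replicates directly (instead of counting a species once if it occurs in either) double-counts species found in both, e.g. class B at site 1 gives 4 instead of 2:
# A tibble: 24 x 3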
# Groups:   class [4]
   class sample presence
   <chr> <chr>     <dbl>
 1 A     1             3
 2 B     1             4
 3 C     1             0
 4 D     1             1
 5 A     2             4
 6 B     2             6
 7 C     2             3
 8 D     2             3
 9 A     3             0
10 B     3             0

Sample data (output of dput):
structure(list(class = c("A", "B", "C", "D", "A", "B", "C", "D", 
"A", "B", "C", "D"), `1A` = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 
0), `1B` = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1), `2A` = c(0, 
1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0), `2B` = c(0, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1), `3A` = c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 
1), `3B` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `4A` = c(0, 
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1), `4B` = c(1, 1, 0, 0, 1, 1, 
0, 0, 1, 1, 0, 0), `5A` = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1), `5B` = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0), `6A` = c(1, 
0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0), `6B` = c(1, 1, 0, 0, 1, 1, 
0, 0, 1, 1, 0, 0)), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -12L), spec = structure(list(
    cols = list(class = structure(list(), class = c("collector_character", 
    "collector")), `1A` = structure(list(), class = c("collector_double", 
    "collector")), `1B` = structure(list(), class = c("collector_double", 
    "collector")), `2A` = structure(list(), class = c("collector_double", 
    "collector")), `2B` = structure(list(), class = c("collector_double", 
    "collector")), `3A` = structure(list(), class = c("collector_double", 
    "collector")), `3B` = structure(list(), class = c("collector_double", 
    "collector")), `4A` = structure(list(), class = c("collector_double", 
    "collector")), `4B` = structure(list(), class = c("collector_double", 
    "collector")), `5A` = structure(list(), class = c("collector_double", 
    "collector")), `5B` = structure(list(), class = c("collector_double", 
    "collector")), `6A` = structure(list(), class = c("collector_double", 
    "collector")), `6B` = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))
