Is there an R function for combining two replicate site columns in a table to show species presence and absence?
I have a df like the sample data below (my actual dataset has 96 columns). class is the phylogenetic class of the organism: each repeat of a letter is a different species, but all of them belong to the same class. 1A and 1B are samples from the same site. I want to combine the two presence/absence records (1 = present, 0 = absent) from the two samples at each site, and count how many species of each class are present at that site. The output I want looks like this:
Sample  Class  Number of Species Present
1       A      3
1       B      2
1       C      0
1       D      1
2       A      2
2       B      3
2       C      3
2       D      3
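For example, in the original df you can see that no class C species is present in sample 2A, but class C species are present in sample 2B. The output therefore records 3 class C species at site 2. Likewise, 3 different class B species are present in both 2A and 2B, but because those two samples are replicates of the same site, the output records 3 class B species at site 2 rather than counting each one twice.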
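The combining rule described above amounts to a logical OR across the two replicates, followed by a count per class. A minimal base-R sketch for site 2 only, using the class, 2A and 2B columns copied from the sample data (the variable names are just for illustration):

```r
# Columns copied from the sample data (class, 2A, 2B)
class_col <- c("A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D")
rep_2A <- c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0)
rep_2B <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)

# A species counts as present at site 2 if it was recorded in either replicate
present_site2 <- pmax(rep_2A, rep_2B)

# Number of present species per class at site 2: A = 2, B = 3, C = 3, D = 3
site2_counts <- tapply(present_site2, class_col, sum)
print(site2_counts)

# Naively adding the replicates double-counts species seen in both: A = 4, B = 6, C = 3, D = 3
naive_counts <- tapply(rep_2A + rep_2B, class_col, sum)
print(naive_counts)
```

pmax() works here because the data are 0/1, so the element-wise maximum of the two replicates is exactly "present in at least one".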
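I'm stumped, so any help would be appreciated.
Cheers
You just need to reformat the initial df a little, because your column names actually contain more information than just a name: each one combines the site number and the replicate letter.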
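To see what that formatting means, here is a small sketch (a two-row toy table, not the real data) of how pivot_longer() with names_pattern splits a column name such as 1B into a site number and a replicate letter:

```r
library(tidyr)
library(tibble)

# Toy table with the same column-name layout as the question (site digits + replicate letter)
toy <- tibble(class = c("A", "B"), `1A` = c(0, 1), `1B` = c(1, 1))

long <- pivot_longer(toy, -class,
                     names_pattern = "(\\d+)([A-Z]+)",  # group 1 = site, group 2 = replicate
                     names_to = c("sample", "replicate"))
print(long)
```

Each original cell becomes one row, with the site and replicate now in separate columns that can be grouped on.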
You can try the following:
Code
library(dplyr)
library(tidyr)

df %>%
  # long format: split a column name such as "2A" into sample "2" and replicate "A"
  pivot_longer(-class,
               names_pattern = "(\\d+)([A-Z]+)",
               names_to = c("sample", "replicate")) %>%
  # one column per replicate (A and B); each class has several species,
  # so the values are collected into list-columns
  pivot_wider(id_cols = c(class, sample),
              names_from = replicate,
              values_from = value,
              values_fn = list) %>%
  # expand the list-columns back to one row per species
  unnest(c(A, B)) %>%
  # a species is present at the site if it is present in either replicate
  mutate(presence = ifelse(A == 1 | B == 1, 1, 0)) %>%
  # count present species by sample (site) and class
  group_by(sample, class) %>%
  summarise(Number = sum(presence))
Output
# A tibble: 24 x 3
# Groups: sample [6]
sample class Number
<chr> <chr> <dbl>
1 1 A 3
2 1 B 2
3 1 C 0
4 1 D 1
5 2 A 2
6 2 B 3
7 2 C 3
8 2 D 3
9 3 A 0
10 3 B 0
# ... with 14 more rows
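An alternative sketch (not the answer's method) that avoids the list-columns and unnest() step: give each row, i.e. each species, an id, take the maximum of the two replicates for that species, then count per site and class. The df below is a trimmed copy of the sample data (sites 1 and 2 only) so the block runs on its own; species_id and result are illustrative names.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Trimmed copy of the question's sample data (sites 1 and 2 only)
df <- tibble(
  class = rep(c("A", "B", "C", "D"), times = 3),
  `1A` = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
  `1B` = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1),
  `2A` = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0),
  `2B` = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)

result <- df %>%
  # each row is one species; an id lets the replicates be matched up
  mutate(species_id = row_number()) %>%
  # long format: "2A" -> sample "2", replicate "A"
  pivot_longer(-c(class, species_id),
               names_pattern = "(\\d+)([A-Z]+)",
               names_to = c("sample", "replicate")) %>%
  # a species is present at the site if it is present in any replicate
  group_by(sample, class, species_id) %>%
  summarise(present = max(value), .groups = "drop") %>%
  # count present species per site and class
  group_by(sample, class) %>%
  summarise(Number = sum(present), .groups = "drop")

print(result)
```

For these two sites the Number column should match the first eight rows of the output above (3, 2, 0, 1, 2, 3, 3, 3). Using max() instead of a pairwise A/B comparison also means the code does not need to know the replicate letters, so it keeps working if some site has a third replicate.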
Hi Jarrod, welcome to SO. Please always post the actual code or sample data in the body of the question. You can use dput(your_data).
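For reference, dput() prints R code that recreates an object exactly, which is why it is the usual way to share sample data. A small sketch using a made-up toy data frame:

```r
# A toy data frame (made up for illustration), with a non-syntactic column name like the question's
x <- data.frame(class = c("A", "B"), `1A` = c(0, 1), check.names = FALSE)

# dput() writes R code that rebuilds the object; this printed text is what you paste into a question
dput(x)

# The printed code round-trips: evaluating it gives back an identical object
code <- paste(capture.output(dput(x)), collapse = "\n")
y <- eval(parse(text = code))
print(identical(x, y))
```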
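Result from the question (presences summed over both replicates, so a species found in both replicates is counted twice):
# A tibble: 24 x 3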
# Groups: class [4]
class sample presence
<chr> <chr> <dbl>
1 A 1 3
2 B 1 4
3 C 1 0
4 D 1 1
5 A 2 4
6 B 2 6
7 C 2 3
8 D 2 3
9 A 3 0
10 B 3 0
Sample data (dput() output):
structure(list(class = c("A", "B", "C", "D", "A", "B", "C", "D",
"A", "B", "C", "D"), `1A` = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0), `1B` = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1), `2A` = c(0,
1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0), `2B` = c(0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1), `3A` = c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1,
1), `3B` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `4A` = c(0,
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1), `4B` = c(1, 1, 0, 0, 1, 1,
0, 0, 1, 1, 0, 0), `5A` = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1), `5B` = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0), `6A` = c(1,
0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0), `6B` = c(1, 1, 0, 0, 1, 1,
0, 0, 1, 1, 0, 0)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -12L), spec = structure(list(
cols = list(class = structure(list(), class = c("collector_character",
"collector")), `1A` = structure(list(), class = c("collector_double",
"collector")), `1B` = structure(list(), class = c("collector_double",
"collector")), `2A` = structure(list(), class = c("collector_double",
"collector")), `2B` = structure(list(), class = c("collector_double",
"collector")), `3A` = structure(list(), class = c("collector_double",
"collector")), `3B` = structure(list(), class = c("collector_double",
"collector")), `4A` = structure(list(), class = c("collector_double",
"collector")), `4B` = structure(list(), class = c("collector_double",
"collector")), `5A` = structure(list(), class = c("collector_double",
"collector")), `5B` = structure(list(), class = c("collector_double",
"collector")), `6A` = structure(list(), class = c("collector_double",
"collector")), `6B` = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))