R: Aggregate columns using 2 data frames (r, data.table)

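I'm trying to aggregate multiple columns under a given condition using R (data.table?). I have a data frame df1 in which columns 12:262 contain the abundance of each species (one species per column) for each sample (one sample per row).

In another data frame, df2, I have the phylum, genus, etc. for each species (one species per row).

I want to aggregate all the columns of df1 whose species belong to the same phylum (as defined in df2). Does that make sense?

Thanks, everyone!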

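The first thing to do is reshape df1. If you convert the data from "wide" to "long" format, you get multiple rows per sample, one for each species. You can then merge it with the second data set on the species variable. Beyond that, you haven't provided enough detail to say exactly how the data should be aggregated, but I've included two simple examples (a total and a mean per phylum). You should be able to adapt the aggregation code easily to include whatever you need.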

library(tidyr)
library(dplyr)

# Abundance table: one row per sample, one column per species
df1 <- data.frame(
  sample = c("sample1", "sample2", "sample3"),
  species1 = c(1, 47, 8),
  species2 = c(21, 36, 32))

# Taxonomy table: one row per species
df2 <- data.frame(
  species = c("species1", "species2"),
  phylum = c("X", "Y"),
  genus = c("A", "B")
)

# Wide to long: one row per sample/species pair
df1_long <- tidyr::pivot_longer(df1, starts_with("species"),
                                names_to = "species", values_to = "abundance")

# Attach phylum and genus to every abundance value
df3 <- dplyr::left_join(df1_long, df2, by = "species")

# Aggregate by phylum
df3 %>% 
  group_by(phylum) %>% 
  summarize(total_abundance = sum(abundance), 
            avg_abundance = mean(abundance))
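
If what you want is one aggregated column per phylum for each sample (which is how I read the question), the same reshape-join-aggregate steps can be followed by a pivot back to wide format. This is only a sketch under that assumption: the toy data, the extra species3 column and the name per_sample are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: species1 and species3 share phylum "X"
df1 <- data.frame(
  sample = c("sample1", "sample2", "sample3"),
  species1 = c(1, 47, 8),
  species2 = c(21, 36, 32),
  species3 = c(5, 0, 2))

df2 <- data.frame(
  species = c("species1", "species2", "species3"),
  phylum = c("X", "Y", "X"))

per_sample <- df1 %>%
  # wide to long: every column except `sample` is a species
  pivot_longer(-sample, names_to = "species", values_to = "abundance") %>%
  # look up the phylum of each species
  left_join(df2, by = "species") %>%
  # sum abundances within each sample/phylum pair
  group_by(sample, phylum) %>%
  summarize(abundance = sum(abundance), .groups = "drop") %>%
  # back to wide: one column per phylum
  pivot_wider(names_from = phylum, values_from = abundance)

per_sample
```

With the real data, where the species sit in columns 12:262, the column selection would presumably become pivot_longer(cols = 12:262, ...), and the other columns would need to be kept or dropped as appropriate.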

A data.table version:

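library(data.table)

# Abundance table: one row per sample, one column per species
dt1 <- data.table(
  sample = c("sample1", "sample2", "sample3"),
  species1 = c(1, 47, 8),
  species2 = c(21, 36, 32))

# Taxonomy table: one row per species
dt2 <- data.table(
  species = c("species1", "species2"),
  phylum = c("X", "Y"),
  genus = c("A", "B")
)

# long format: one row per sample/species pair
dt1_long <-
  melt(
    dt1,
    id.vars = "sample",
    variable.name = "species",
    value.name = "abundance",
    variable.factor = FALSE  # keep species as character, like dt2$species
  )
# join on species, then aggregate by phylum
# (by = is ignored unless a j expression is supplied, so chain the aggregation)
dt1_long[dt2, on = "species"][
  , .(total_abundance = sum(abundance), avg_abundance = mean(abundance)),
  by = phylum]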
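
If you need the per-phylum totals for each sample rather than one overall figure per phylum, one option in data.table is to add the phylum with an update join and then cast back to wide format with dcast. This is a sketch only, assuming that is the desired output; the toy data, the species3 column and the names long and wide are invented for the example.

```r
library(data.table)

# Hypothetical toy data: species1 and species3 share phylum "X"
dt1 <- data.table(
  sample = c("sample1", "sample2", "sample3"),
  species1 = c(1, 47, 8),
  species2 = c(21, 36, 32),
  species3 = c(5, 0, 2))

dt2 <- data.table(
  species = c("species1", "species2", "species3"),
  phylum = c("X", "Y", "X"))

# wide to long: one row per sample/species pair
long <- melt(dt1, id.vars = "sample", variable.name = "species",
             value.name = "abundance", variable.factor = FALSE)

# update join: copy phylum from dt2 into long, matching on species
long[dt2, on = "species", phylum := i.phylum]

# one row per sample, one column per phylum, summing within each cell
wide <- dcast(long, sample ~ phylum, value.var = "abundance",
              fun.aggregate = sum)

wide
```

The update join modifies long by reference, so no copy of the long table is made before casting.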

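
Based on your example, could you clarify what your final output should look like? For example, do you want the total abundance per phylum across all species/samples? Also, you mentioned data.table with a question mark and tagged it. Would you prefer a data.table solution?

Thanks! It worked.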