R 如何基于行分组进行两两划分
我有一个数据框,其制作方式如下:R 如何基于行分组进行两两划分,r,aggregate,R,Aggregate,我有一个数据框,其制作方式如下: df <- structure(list(celltype = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L), .Label = c("Bcells", "DendriticCells", "Macrophages", "Monocytes", "NKCells", "Neutrophils", "StemCells
df <- structure(list(celltype = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L), .Label = c("Bcells",
"DendriticCells", "Macrophages", "Monocytes", "NKCells", "Neutrophils",
"StemCells", "StromalCells", "abTcells", "gdTCells"), class = "factor"),
sample = c("SP ID control", "SP ID treated", "SP ID control",
"SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
"SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
"SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
"SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
"SP ID treated"), `mean(score)` = c(0.160953535029424, 0.155743474395545,
0.104788051104575, 0.125247035158472, -0.159665650045289,
-0.134662049979712, 0.196249441751866, 0.212256889027029,
0.0532668251890109, 0.0738264693971133, 0.151828478029596,
0.159941552142933, -0.14128323638966, -0.120556640790534,
0.196518649474078, 0.185264282171863, 0.0654641151966543,
0.0837989059507186, 0.145111577618456, 0.145448549866796)), .Names = c("celltype",
"sample", "mean(score)"), row.names = c(7L, 8L, 17L, 18L, 27L,
28L, 37L, 38L, 47L, 48L, 57L, 58L, 67L, 68L, 77L, 78L, 87L, 88L,
97L, 98L), class = "data.frame")
我想做的是在单元格类型分组中,根据处理过的和控制过的样本计算分数的划分
下面的Excel图像演示了该示例。我们要找的是最右边的专栏。例如,在B单元中,0.155/0.161=0.967
在一天结束时,我希望得到如下所示的df:
> df
celltype sample mean(score)
7 Bcells SP ID control 0.16095354
8 Bcells SP ID treated 0.15574347
17 DendriticCells SP ID control 0.10478805
18 DendriticCells SP ID treated 0.12524704
27 Macrophages SP ID control -0.15966565
28 Macrophages SP ID treated -0.13466205
37 Monocytes SP ID control 0.19624944
38 Monocytes SP ID treated 0.21225689
47 NKCells SP ID control 0.05326683
48 NKCells SP ID treated 0.07382647
57 Neutrophils SP ID control 0.15182848
58 Neutrophils SP ID treated 0.15994155
67 StemCells SP ID control -0.14128324
68 StemCells SP ID treated -0.12055664
77 StromalCells SP ID control 0.19651865
78 StromalCells SP ID treated 0.18526428
87 abTcells SP ID control 0.06546412
88 abTcells SP ID treated 0.08379891
97 gdTCells SP ID control 0.14511158
98 gdTCells SP ID treated 0.14544855
celltype sample Pairwise division
Bcells SP ID treated 0.967630031
DendriticCells SP ID treated 1.195241574
Macrophages SP ID treated 0.843400255
Monocytes SP ID treated 1.081566841
NKCells SP ID treated 1.385974647
Neutrophils SP ID treated 1.053435786
StemCells SP ID treated 0.853297563
StromalCells SP ID treated 0.942731303
abTcells SP ID treated 1.280073915
gdTCells SP ID treated 1.002322158
我如何才能在R中实现这一点?如果你传播到更广泛的形式,这是非常琐碎的:
library(tidyr)
library(dplyr)
df %>% spread(sample, `mean(score)`) %>%
mutate(pairwise_division = `SP ID treated` / `SP ID control`)
## celltype SP ID control SP ID treated pairwise_division
## 1 Bcells 0.16095354 0.15574347 0.9676300
## 2 DendriticCells 0.10478805 0.12524704 1.1952416
## 3 Macrophages -0.15966565 -0.13466205 0.8434003
## 4 Monocytes 0.19624944 0.21225689 1.0815668
## 5 NKCells 0.05326683 0.07382647 1.3859746
## 6 Neutrophils 0.15182848 0.15994155 1.0534358
## 7 StemCells -0.14128324 -0.12055664 0.8532976
## 8 StromalCells 0.19651865 0.18526428 0.9427313
## 9 abTcells 0.06546412 0.08379891 1.2800739
## 10 gdTCells 0.14511158 0.14544855 1.0023222
请注意,您可能应该修复列名,这样就不必经常使用反勾号
要精确获得所需结果,请聚集回long,过滤到刚刚处理过的行,然后选择所需的列:
df %>% spread(sample, `mean(score)`) %>%
mutate(pairwise_division = `SP ID treated` / `SP ID control`) %>%
gather(sample, `mean(score)`, starts_with('SP')) %>%
filter(sample == 'SP ID treated') %>%
select(celltype, sample, pairwise_division)
## celltype sample pairwise_division
## 1 Bcells SP ID treated 0.9676300
## 2 DendriticCells SP ID treated 1.1952416
## 3 Macrophages SP ID treated 0.8434003
## 4 Monocytes SP ID treated 1.0815668
## 5 NKCells SP ID treated 1.3859746
## 6 Neutrophils SP ID treated 1.0534358
## 7 StemCells SP ID treated 0.8532976
## 8 StromalCells SP ID treated 0.9427313
## 9 abTcells SP ID treated 1.2800739
## 10 gdTCells SP ID treated 1.0023222
如果愿意,可以在base和data.table中使用等效版本。或者走直达路线:
aggregate(cbind(pairwise_division = `mean(score)`) ~ celltype,
df[order(df$celltype, df$sample), ],
FUN = function(x){x[2]/x[1]})
## celltype pairwise_division
## 1 Bcells 0.9676300
## 2 DendriticCells 1.1952416
## 3 Macrophages 0.8434003
## 4 Monocytes 1.0815668
## 5 NKCells 1.3859746
## 6 Neutrophils 1.0534358
## 7 StemCells 0.8532976
## 8 StromalCells 0.9427313
## 9 abTcells 1.2800739
## 10 gdTCells 1.0023222
如果你传播到广泛的形式,它是相当琐碎的:
library(tidyr)
library(dplyr)
df %>% spread(sample, `mean(score)`) %>%
mutate(pairwise_division = `SP ID treated` / `SP ID control`)
## celltype SP ID control SP ID treated pairwise_division
## 1 Bcells 0.16095354 0.15574347 0.9676300
## 2 DendriticCells 0.10478805 0.12524704 1.1952416
## 3 Macrophages -0.15966565 -0.13466205 0.8434003
## 4 Monocytes 0.19624944 0.21225689 1.0815668
## 5 NKCells 0.05326683 0.07382647 1.3859746
## 6 Neutrophils 0.15182848 0.15994155 1.0534358
## 7 StemCells -0.14128324 -0.12055664 0.8532976
## 8 StromalCells 0.19651865 0.18526428 0.9427313
## 9 abTcells 0.06546412 0.08379891 1.2800739
## 10 gdTCells 0.14511158 0.14544855 1.0023222
请注意,您可能应该修复列名,这样就不必经常使用反勾号
要精确获得所需结果,请聚集回long,过滤到刚刚处理过的行,然后选择所需的列:
df %>% spread(sample, `mean(score)`) %>%
mutate(pairwise_division = `SP ID treated` / `SP ID control`) %>%
gather(sample, `mean(score)`, starts_with('SP')) %>%
filter(sample == 'SP ID treated') %>%
select(celltype, sample, pairwise_division)
## celltype sample pairwise_division
## 1 Bcells SP ID treated 0.9676300
## 2 DendriticCells SP ID treated 1.1952416
## 3 Macrophages SP ID treated 0.8434003
## 4 Monocytes SP ID treated 1.0815668
## 5 NKCells SP ID treated 1.3859746
## 6 Neutrophils SP ID treated 1.0534358
## 7 StemCells SP ID treated 0.8532976
## 8 StromalCells SP ID treated 0.9427313
## 9 abTcells SP ID treated 1.2800739
## 10 gdTCells SP ID treated 1.0023222
如果愿意,可以在base和data.table中使用等效版本。或者走直达路线:
aggregate(cbind(pairwise_division = `mean(score)`) ~ celltype,
df[order(df$celltype, df$sample), ],
FUN = function(x){x[2]/x[1]})
## celltype pairwise_division
## 1 Bcells 0.9676300
## 2 DendriticCells 1.1952416
## 3 Macrophages 0.8434003
## 4 Monocytes 1.0815668
## 5 NKCells 1.3859746
## 6 Neutrophils 1.0534358
## 7 StemCells 0.8532976
## 8 StromalCells 0.9427313
## 9 abTcells 1.2800739
## 10 gdTCells 1.0023222
如果您的数据已排序并完全配对:
pair_index <- 1:(dim(df)[1]/2)*2
df[pair_index,'pairwise-division'] <- df[pair_index,3] / df[pair_index-1,3]
df[pair_index,c(1,2,4)]
如果您的数据已排序并完全配对:
pair_index <- 1:(dim(df)[1]/2)*2
df[pair_index,'pairwise-division'] <- df[pair_index,3] / df[pair_index-1,3]
df[pair_index,c(1,2,4)]
谢谢,但是为什么结果的第一行的值不是0.967630031?哎呀,向后分割,发布了错误的版本。修正了。谢谢,但是为什么结果的第一行的值不是0.967630031?哎呀,向后分割,发布了错误的版本。固定的