R: sort a data frame based on the intrinsic values in its columns
I want to capture the intrinsic values in a data frame and then sort the columns and rows in descending order, based on the number of events in each column and each row.

Sample data:
#A tibble: 26 x 9
sample_id Gene_A Gene_B Gene_C Gene_D Gene_E Gene_F Gene_G Gene_H
<fct> <int> <int> <int> <int> <int> <int> <int> <int>
1 A -1 0 0 0 -1 0 0 -1
2 B 1 0 -1 1 -1 -1 -1 0
3 C 1 0 -1 0 1 0 0 -1
4 D -1 0 0 -1 1 1 -1 1
5 E 1 1 1 1 -1 1 -1 0
6 F -1 -1 1 1 1 -1 0 0
7 G 0 0 -1 -1 0 -1 0 -1
8 H 1 1 1 0 1 -1 -1 0
9 I 0 -1 -1 -1 0 -1 0 1
10 J -1 0 0 1 -1 -1 0 1
# ... with 16 more rows
Code to generate the sample data:
library(dplyr)  # provides tibble(), mutate(), select() and %>%

# Random -1/0/1 event codes for 30 samples and 8 genes
dummy.tb <- tibble(sample_id = sample(1:30, 30),
                   Gene_A = sample(-1:1, 30, replace = TRUE),
                   Gene_B = sample(-1:1, 30, replace = TRUE))
dummy1.tb <- tibble(Gene_C = sample(-1:1, 30, replace = TRUE),
                    Gene_D = sample(-1:1, 30, replace = TRUE),
                    Gene_E = sample(-1:1, 30, replace = TRUE))
dummy2.tb <- tibble(Gene_F = sample(-1:1, 30, replace = TRUE),
                    Gene_G = sample(-1:1, 30, replace = TRUE),
                    Gene_H = sample(-1:1, 30, replace = TRUE))
dummy.tb <- cbind.data.frame(dummy.tb, dummy1.tb, dummy2.tb)
dummy.genes <- c("Gene_A", "Gene_B", "Gene_C", "Gene_D", "Gene_E",
                 "Gene_F", "Gene_G", "Gene_H")
dummy.total <- as.data.frame(dummy.tb)  # was as.data.frame(dummy.total), which is undefined here
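The final result I want is a table sorted according to the following hierarchy:
- first, the genes, from the gene with the most events to the gene with the fewest
- then, the samples (sample_id), from the sample with the most events to the sample with the fewest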
# A tibble: 26 x 9
sample_id Gene_B Gene_G Gene_H Gene_A Gene_C Gene_D Gene_F Gene_E
* <chr> <int> <int> <int> <int> <int> <int> <int> <int>
1 A 0 0 -1 -1 0 0 0 -1
2 U 0 -1 0 0 0 -1 0 1
3 C 0 0 -1 1 -1 0 0 1
4 G 0 0 -1 0 -1 -1 -1 0
5 W 0 -1 1 1 0 1 0 0
6 Y 0 0 1 1 0 1 1 0
7 I -1 0 1 0 -1 -1 -1 0
8 J 0 0 1 -1 0 1 -1 -1
9 O 0 1 0 0 1 -1 1 1
10 P 1 -1 -1 0 -1 0 0 -1
# ... with 16 more rows
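My first idea was to take the sum of absolute values for each sample and add it as a total column, take the sum of absolute values for each gene and add it as a total row, and then use order().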
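A minimal base-R sketch of that idea, assuming a data frame shaped like `dummy.tb` (a `sample_id` column plus gene columns coded -1/0/1). Because every non-zero code has absolute value 1, the absolute sum of a row or column equals its number of events. The helper name `sort_by_events` and the toy values are hypothetical; `decreasing = TRUE` gives the most-to-fewest order described in the hierarchy above.

```r
# Hypothetical helper: order gene columns, then samples, by number of events.
# Assumes gene columns hold -1/0/1 codes, so sum(abs(x)) equals the count of non-zero values.
sort_by_events <- function(df, genes) {
  gene_totals <- colSums(abs(df[genes]))              # "total row": events per gene
  gene_order  <- genes[order(gene_totals, decreasing = TRUE)]
  df <- df[c("sample_id", gene_order)]                 # genes with most events first
  row_total <- rowSums(abs(df[gene_order]))            # "total column": events per sample
  df <- df[order(row_total, decreasing = TRUE), ]      # samples with most events first
  rownames(df) <- NULL                                 # reset row names after reordering
  df
}

# Toy data (made-up values)
toy <- data.frame(
  sample_id = c("A", "B", "C"),
  Gene_A = c(0, 1, 0),
  Gene_B = c(1, 1, -1),
  Gene_C = c(0, -1, 1),
  stringsAsFactors = FALSE
)
sorted <- sort_by_events(toy, c("Gene_A", "Gene_B", "Gene_C"))
print(sorted)
```

Here the totals are never stored in the data frame, so there is no helper row or column to remove afterwards.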
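For this problem, a matrix is better suited, since the data are homogeneous (numeric). If you assign the gene names and the sample_id values to the matrix's dimnames, the column and row identifiers are preserved after sorting.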
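For the question's data, that conversion could be sketched as follows, assuming a data frame with a `sample_id` column and gene columns as in `dummy.tb`; the toy values are made up. The gene names become column names and `sample_id` becomes the row names, so both stay attached to their values when the matrix is reordered.

```r
# Toy data frame shaped like dummy.tb (values are made up)
df <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  Gene_A = c(1, 0, -1),
  Gene_B = c(0, 0, 1),
  stringsAsFactors = FALSE
)
genes <- c("Gene_A", "Gene_B")

# Numeric matrix of the gene codes; dimnames carry the identifiers
mat <- as.matrix(df[genes])
rownames(mat) <- df$sample_id

# Reordering rows and columns keeps each label with its values
reordered <- mat[order(rowSums(abs(mat))), order(colSums(abs(mat)))]
print(reordered)
```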
I suggest using set.seed so that your example is reproducible and answers can be checked against the desired output.

See below:
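set.seed(123)  # make the random example reproducible
n <- 30        # number of samples (rows)
m <- 9         # number of genes (columns)
mat <- matrix(
  sample(-1:1, n * m, replace = TRUE),
  nrow = n,
  dimnames = list(1:n, paste("Gene", LETTERS[1:m], sep = "_"))
)
# Reorder columns by their number of non-zero entries (ascending)
foo <- mat[, order(colSums(abs(mat)))]
# Then reorder rows by their number of non-zero entries (ascending)
bar <- foo[order(rowSums(abs(foo))), ]
head(bar)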
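The `order()` calls above sort ascending (fewest events first). If the most-to-fewest order stated in the question is wanted, `decreasing = TRUE` can be passed; the result can then be turned back into a data frame with a `sample_id` column. A sketch reusing the same simulated matrix (the names `foo_desc`, `bar_desc` and `result` are illustrative):

```r
set.seed(123)
n <- 30
m <- 9
mat <- matrix(
  sample(-1:1, n * m, replace = TRUE),
  nrow = n,
  dimnames = list(1:n, paste("Gene", LETTERS[1:m], sep = "_"))
)

# Most events first: genes (columns), then samples (rows)
foo_desc <- mat[, order(colSums(abs(mat)), decreasing = TRUE)]
bar_desc <- foo_desc[order(rowSums(abs(foo_desc)), decreasing = TRUE), ]

# Back to a data frame; the matrix row names become a sample_id column
result <- data.frame(sample_id = rownames(bar_desc), bar_desc,
                     row.names = NULL, stringsAsFactors = FALSE)
head(result)
```

Passing `row.names = NULL` gives the data frame plain integer row names instead of copying them from the matrix.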
Please fix the quotes; the code gives errors.
Changed the quotes in dummy.genes.
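library(dplyr)

# Add a per-sample total of absolute values (the number of events for -1/0/1 codes)
dummy.total <- dummy.total %>%
  mutate(Row_Total = rowSums(abs(select(., one_of(dummy.genes)))))
dummy.total <- as.data.frame(dummy.total)
# Sort samples by that total (ascending: fewest events first)
dummy.total <- dummy.total[order(dummy.total[, ncol(dummy.total)], decreasing = FALSE), ]
# Drop the helper column and put sample_id first
dummy.total <- dummy.total %>% select(-Row_Total)
dummy.total <- dummy.total %>% select(sample_id, everything())
dummy.total <- as_tibble(dummy.total)  # as.tibble() is deprecated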
Output of head(bar) from the matrix-based answer:
Gene_F Gene_D Gene_I Gene_G Gene_C Gene_A Gene_H Gene_B Gene_E
18 -1 0 0 0 0 -1 0 0 1
15 0 0 0 1 0 -1 -1 -1 0
27 0 0 0 0 1 0 -1 -1 -1
1 1 -1 0 1 0 -1 0 1 0
3 0 0 -1 1 0 0 -1 1 -1
6 0 -1 1 0 0 -1 1 0 1