R: Sorting a data frame based on intrinsic values in columns


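I want to capture the intrinsic values in a data frame and then order the columns and rows, from descending to ascending, based on the number of events in each column and each row.

Sample data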

 #A tibble: 26 x 9
   sample_id Gene_A Gene_B Gene_C Gene_D Gene_E Gene_F Gene_G Gene_H
   <fct>      <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
 1 A             -1      0      0      0     -1      0      0     -1
 2 B              1      0     -1      1     -1     -1     -1      0
 3 C              1      0     -1      0      1      0      0     -1
 4 D             -1      0      0     -1      1      1     -1      1
 5 E              1      1      1      1     -1      1     -1      0
 6 F             -1     -1      1      1      1     -1      0      0
 7 G              0      0     -1     -1      0     -1      0     -1
 8 H              1      1      1      0      1     -1     -1      0
 9 I              0     -1     -1     -1      0     -1      0      1
10 J             -1      0      0      1     -1     -1      0      1
# ... with 16 more rows

Code to generate the sample data:

library(tibble)

dummy.tb <- tibble(sample_id = sample(1:30, 30),
                   Gene_A = sample(-1:1, 30, replace = TRUE),
                   Gene_B = sample(-1:1, 30, replace = TRUE))

dummy1.tb <- tibble(Gene_C = sample(-1:1, 30, replace = TRUE),
                    Gene_D = sample(-1:1, 30, replace = TRUE),
                    Gene_E = sample(-1:1, 30, replace = TRUE))

dummy2.tb <- tibble(Gene_F = sample(-1:1, 30, replace = TRUE),
                    Gene_G = sample(-1:1, 30, replace = TRUE),
                    Gene_H = sample(-1:1, 30, replace = TRUE))

dummy.tb <- cbind.data.frame(dummy.tb, dummy1.tb, dummy2.tb)

dummy.genes <- c("Gene_A", "Gene_B", "Gene_C", "Gene_D", "Gene_E",
                 "Gene_F", "Gene_G", "Gene_H")

# dummy.total was not defined before this line in the original;
# assuming it is meant to be a data frame copy of dummy.tb
dummy.total <- as.data.frame(dummy.tb)
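The end result I want is a table sorted by the following hierarchy:

  • by gene, from the genes with the most events to those with the fewest
    • then by the number of events per sample_id, from most events to fewest

Here is an example of the output: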

# A tibble: 26 x 9
   sample_id Gene_B Gene_G Gene_H Gene_A Gene_C Gene_D Gene_F Gene_E
 * <chr>      <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
 1 A              0      0     -1     -1      0      0      0     -1
 2 U              0     -1      0      0      0     -1      0      1
 3 C              0      0     -1      1     -1      0      0      1
 4 G              0      0     -1      0     -1     -1     -1      0
 5 W              0     -1      1      1      0      1      0      0
 6 Y              0      0      1      1      0      1      1      0
 7 I             -1      0      1      0     -1     -1     -1      0
 8 J              0      0      1     -1      0      1     -1     -1
 9 O              0      1      0      0      1     -1      1      1
10 P              1     -1     -1      0     -1      0      0     -1
# ... with 16 more rows
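My first thought was to take the absolute sum and add a column with the total for each sample, take the absolute sum and add a row with the total for each column, and then use order.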
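
One way this idea could be sketched in base R, without actually adding a total row or column, is to keep the totals in separate vectors and index with order(). The toy data frame below and the names df, genes, col_events and row_events are made up for illustration; an "event" is assumed to mean any non-zero entry.

```r
# Toy data (hypothetical values, not the data from the question)
df <- data.frame(
  sample_id = c("A", "B", "C"),
  Gene_A = c(-1L, 1L, 1L),
  Gene_B = c(0L, 0L, 1L),
  Gene_C = c(1L, 0L, -1L),
  stringsAsFactors = FALSE
)
genes <- setdiff(names(df), "sample_id")

# Count events as absolute sums, so that -1 and 1 both count once
col_events <- colSums(abs(df[genes]))  # events per gene
row_events <- rowSums(abs(df[genes]))  # events per sample

# Most events first, for columns and for rows; sample_id stays in front
sorted <- df[order(row_events, decreasing = TRUE),
             c("sample_id", genes[order(col_events, decreasing = TRUE)])]
sorted
```

Keeping the totals outside the data frame avoids having to drop a helper row and column again afterwards.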


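For this problem, a matrix is better suited to handling homogeneous (numeric) data. If you assign the column names and sample_id to the matrix dimnames, the column and row identifiers will be preserved after sorting.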
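
As a rough illustration of the dimnames point, here is a sketch that moves a small data frame into a matrix, reorders it, and converts it back. The data and the names df, mat, reordered and back are hypothetical.

```r
# Toy data frame with an id column plus numeric gene columns (made-up values)
df <- data.frame(
  sample_id = c("A", "B", "C"),
  Gene_A = c(-1L, 1L, 1L),
  Gene_B = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

mat <- as.matrix(df[, -1])     # keep only the numeric columns
rownames(mat) <- df$sample_id  # sample ids become the row names

# Reordering rows by index carries the row names along
reordered <- mat[c(3, 1, 2), ]
rownames(reordered)

# If a data frame is needed again, the ids can be restored as a column
back <- data.frame(sample_id = rownames(reordered), reordered,
                   row.names = NULL)
```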

I suggest you use set.seed so that your example is reproducible and answers can be verified against the desired output.
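
A minimal illustration of what set.seed buys you: resetting the seed to the same value before each call makes sample() return the same draw. The variable names first and second are arbitrary. The exact values drawn can depend on the R version (the default sampling method changed in R 3.6.0), so only the equality is checked here.

```r
set.seed(123)
first <- sample(-1:1, 10, replace = TRUE)

set.seed(123)  # same seed again, so the same random stream
second <- sample(-1:1, 10, replace = TRUE)

identical(first, second)  # TRUE
```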

See below:

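set.seed(123)
n <- 30  # number of samples (rows)
m <- 9   # number of genes (columns)
mat <- matrix(
  sample(-1:1, n * m, replace = TRUE), 
  nrow = n,   
  dimnames = list(1:n, paste("Gene", LETTERS[1:m], sep = "_"))
)
# order() is ascending by default: fewest events first
foo <- mat[, order(colSums(abs(mat)))]   # reorder columns by events per gene
bar <- foo[order(rowSums(abs(foo))), ]   # reorder rows by events per sample
head(bar)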

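Comment: Please correct the quotes. It gave errors.
Comment: Changed the quotes for dummy.genes.

Code from the question (row totals via dplyr):

library(dplyr)

# Add a per-sample total of absolute values (one per non-zero event)
dummy.total <- dummy.total %>%
  mutate(Row_Total = rowSums(abs(select(., all_of(dummy.genes)))))

# Sort rows by that total (ascending), then drop the helper column
dummy.total <- as.data.frame(dummy.total)
dummy.total <- dummy.total[order(dummy.total[, ncol(dummy.total)], decreasing = FALSE), ]
dummy.total <- dummy.total %>% select(-Row_Total)
dummy.total <- dummy.total %>% select(sample_id, everything())

dummy.total <- as_tibble(dummy.total)
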
Output of head(bar) from the matrix code:

   Gene_F Gene_D Gene_I Gene_G Gene_C Gene_A Gene_H Gene_B Gene_E
18     -1      0      0      0      0     -1      0      0      1
15      0      0      0      1      0     -1     -1     -1      0
27      0      0      0      0      1      0     -1     -1     -1
1       1     -1      0      1      0     -1      0      1      0
3       0      0     -1      1      0      0     -1      1     -1
6       0     -1      1      0      0     -1      1      0      1
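
Note that order() sorts ascending by default, so the matrix code puts the genes and samples with the fewest events first. If the most-to-fewest order described in the question is what is wanted, one option is to pass decreasing = TRUE. This is a sketch on a smaller toy matrix, not the answer's exact code; the names by_col and by_both are made up.

```r
set.seed(123)
mat <- matrix(
  sample(-1:1, 5 * 4, replace = TRUE),
  nrow = 5,
  dimnames = list(LETTERS[1:5], paste("Gene", LETTERS[1:4], sep = "_"))
)

# Columns: genes with the most events first
by_col <- mat[, order(colSums(abs(mat)), decreasing = TRUE), drop = FALSE]

# Rows: samples with the most events first
by_both <- by_col[order(rowSums(abs(by_col)), decreasing = TRUE), , drop = FALSE]

colSums(abs(by_both))  # non-increasing from left to right
rowSums(abs(by_both))  # non-increasing from top to bottom
```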