Sorting a factor in R using sort() or order()

r sorting

I am trying to sort a data frame by one of its columns. The structure of my data frame is:

'data.frame':    9194 obs. of  7 variables:
 $ taxonomy_y: Factor w/ 51 levels "Alistipes","Alphaproteobacteria",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ otu1id    : Factor w/ 51 levels "_1","_10","_102",..: 12 12 12 12 12 12 12 12 12 12 ...
 $ taxonomy_x: Factor w/ 51 levels "Alistipes","Alphaproteobacteria",..: 45 50 42 24 17 14 2 7 39 44 ...
 $ otu2id    : Factor w/ 51 levels "_1","_10","_102",..: 23 41 26 51 2 10 25 35 42 5 ...
 $ otu2      : chr  "333" "241" "14" "56" ...
 $ otu1      : chr  "16" "119" "90" "16" ...
 $ CONTROL1  : num  0.0897 0.0864 0.2444 0.1818 0.5976 ...

My data frame looks like this:

     taxonomy_y otu1id  taxonomy_x   otu2id otu2 otu1   
 1  Alistipes    _14    Roseburia      _29  333  16   
 2  Alistipes    _14    Turicibacter   _63  241  119 
 3  Alistipes    _14    Parasutterella _37  14   90 
 4  Alistipes    _14    Dorea          _98  56   16 
 5  Alistipes    _14    Clostridium    _10  178  16 
 6  Alistipes    _14    Clostridium S  _12  155  16

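I tried the sort() and order() functions on the otu1id column, but the rows do not come out in the order I expect (please focus on the otu1id column).

Why do I get 10 before 2? I need a sort order like _1, _2, _3, _4, .... How can I do this? I am using Ubuntu.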

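Because the otu1id column is a factor, you cannot order it directly: sort() and order() follow the factor's levels, and by default the levels are created in text (alphabetical) order, so "_10" comes before "_2".

For example, look at the levels here: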

factor(as.character(1:10))
# [1] 1  2  3  4  5  6  7  8  9  10
#Levels: 1 10 2 3 4 5 6 7 8 9
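
To make the effect on sorting concrete, here is a minimal sketch using a small made-up vector (the name ids and its values are illustrative assumptions, not the asker's data). It shows that sort() on a factor follows the level order rather than the numeric value of the labels.

```r
# Hypothetical toy factor, only for illustration
ids <- factor(c("_2", "_10", "_1", "_100"))

# By default the levels are the sorted unique strings, i.e. text order
lv <- levels(ids)

# A factor is stored as integer codes that index into the levels
codes <- as.integer(ids)

# sort() on a factor arranges values by those codes, so the result
# follows the text order of the levels, not the numbers in the labels
by_factor <- as.character(sort(ids))
print(lv)
print(by_factor)
```

Printing lv and by_factor side by side shows the same text ordering in both, which is why "_10" lands before "_2".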

We can remove the "_" at the start of each string, convert the result to numeric, and order by that:

df[order(as.numeric(sub("_", "", df$otu1id))), ]
#OR
#df[order(as.numeric(sub("\\D", "", df$otu1id))), ]

#   taxonomy_y otu1id     taxonomy_x otu2id otu2 otu1
#1   Alistipes     _1      Roseburia    _29  333   16
#2   Alistipes     _1   Turicibacter    _63  241  119
#3   Alistipes     _1 Parasutterella    _37   14   90
#9   Alistipes     _2    Clostridium    _12  155   16
#4   Alistipes    _10          Dorea    _98   56   16
#5   Alistipes    _10    Clostridium    _10  178   16
#6   Alistipes    _10    Clostridium    _12  155   16
#10  Alistipes    _23   ClostridiumS    _12  155   16
#7   Alistipes   _100    Clostridium    _12  155   16
#8   Alistipes  _1008    Clostridium    _12  155   16
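
A related option, which is not part of the answer above, is to rebuild the factor with its levels in numeric order so that sort() and order() then behave as wanted without further conversion. The sketch below assumes every label is "_" followed by digits; the vector ids and the helper variable names are hypothetical.

```r
# Hypothetical toy factor, only for illustration
ids <- factor(c("_2", "_10", "_1", "_100"))

# Current levels are in text order
lv <- levels(ids)

# Reorder the level labels by their numeric part
# (assumes each label is "_" followed by digits only)
lv_numeric <- lv[order(as.numeric(sub("_", "", lv)))]

# Rebuild the factor with the reordered levels; the values are unchanged
ids_releveled <- factor(ids, levels = lv_numeric)

# sort() now follows the numeric order of the labels
sorted_ids <- as.character(sort(ids_releveled))
print(sorted_ids)
```

Applied to the data frame, the same idea would be to assign the releveled factor back to df$otu1id and then use df[order(df$otu1id), ]. This changes the stored level order, which may or may not be desirable for later plots or summaries.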

If you convert otu1id to character, you can also use mixedorder from gtools directly:

df[gtools::mixedorder(as.character(df$otu1id)), ]
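
Since gtools is an add-on package that may not be installed, the following sketch guards the call and falls back to the base R approach shown earlier. The fallback and the toy vector ids are assumptions added for illustration; the fallback only handles labels of the form "_" followed by digits.

```r
# Hypothetical toy character vector, only for illustration
ids <- c("_10", "_2", "_1", "_100")

if (requireNamespace("gtools", quietly = TRUE)) {
  # mixedorder() returns an ordering that treats embedded numbers as numbers
  idx <- gtools::mixedorder(ids)
} else {
  # Base R fallback: strip the leading "_" and order numerically
  idx <- order(as.numeric(sub("_", "", ids)))
}

natural <- ids[idx]
print(natural)
```

The gtools route is more convenient when labels mix text and numbers in less regular ways, whereas the base R fallback avoids the extra dependency.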

Data

df <- structure(list(taxonomy_y = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = "Alistipes", class = "factor"), otu1id = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 6L), .Label = c("_1", "_10", 
"_100", "_1008", "_2", "_23"), class = "factor"), taxonomy_x = structure(c(5L, 
6L, 4L, 3L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("Clostridium", 
"ClostridiumS", "Dorea", "Parasutterella", "Roseburia", "Turicibacter"
), class = "factor"), otu2id = structure(c(3L, 5L, 4L, 6L, 1L, 
2L, 2L, 2L, 2L, 2L), .Label = c("_10", "_12", "_29", "_37", "_63", 
"_98"), class = "factor"), otu2 = c(333L, 241L, 14L, 56L, 178L, 
155L, 155L, 155L, 155L, 155L), otu1 = c(16L, 119L, 90L, 16L, 
16L, 16L, 16L, 16L, 16L, 16L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

df