Sorting a factor in R using sort() or order()
I am trying to sort a data frame by one column. The structure of my data frame is:
'data.frame': 9194 obs. of 7 variables:
$ taxonomy_y: Factor w/ 51 levels "Alistipes","Alphaproteobacteria",..: 1 1 1 1 1 1 1 1 1 1 ...
$ otu1id : Factor w/ 51 levels "_1","_10","_102",..: 12 12 12 12 12 12 12 12 12 12 ...
$ taxonomy_x: Factor w/ 51 levels "Alistipes","Alphaproteobacteria",..: 45 50 42 24 17 14 2 7 39 44 ...
$ otu2id : Factor w/ 51 levels "_1","_10","_102",..: 23 41 26 51 2 10 25 35 42 5 ...
$ otu2 : chr "333" "241" "14" "56" ...
$ otu1 : chr "16" "119" "90" "16" ...
$ CONTROL1 : num 0.0897 0.0864 0.2444 0.1818 0.5976 ...
My data frame looks like this:
taxonomy_y otu1id taxonomy_x otu2id otu2 otu1
1 Alistipes _14 Roseburia _29 333 16
2 Alistipes _14 Turicibacter _63 241 119
3 Alistipes _14 Parasutterella _37 14 90
4 Alistipes _14 Dorea _98 56 16
5 Alistipes _14 Clostridium _10 178 16
6 Alistipes _14 Clostridium S _12 155 16
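I tried the sort() and order() functions on the otu1id column, but the rows were not sorted correctly (note the otu1id column).
Why do I get 10 before 2? I need a sort order like _1, _2, _3, _4, ... How can I do this? I am using Ubuntu.
Because the otu1id column is a factor, ordering it directly follows the order of the factor levels, and those levels were created by sorting the labels as strings.
For example, look at the levels of this factor: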
factor(as.character(1:10))
# [1] 1 2 3 4 5 6 7 8 9 10
#Levels: 1 10 2 3 4 5 6 7 8 9
We can remove the "_" at the start of each string, convert the result to numeric, and order by that:
df[order(as.numeric(sub("_", "", df$otu1id))), ]
#OR
#df[order(as.numeric(sub("\\D", "", df$otu1id))), ]
# taxonomy_y otu1id taxonomy_x otu2id otu2 otu1
#1 Alistipes _1 Roseburia _29 333 16
#2 Alistipes _1 Turicibacter _63 241 119
#3 Alistipes _1 Parasutterella _37 14 90
#9 Alistipes _2 Clostridium _12 155 16
#4 Alistipes _10 Dorea _98 56 16
#5 Alistipes _10 Clostridium _10 178 16
#6 Alistipes _10 Clostridium _12 155 16
#10 Alistipes _23 ClostridiumS _12 155 16
#7 Alistipes _100 Clostridium _12 155 16
#8 Alistipes _1008 Clostridium _12 155 16
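If sort() and order() should work on the column itself, one option is to reorder the factor levels once by their numeric part. The following is a minimal sketch on a small hypothetical data frame (toy, with made-up values), not the question's real data; it assumes every ID has the form "_" followed by digits.

```r
# Toy data standing in for the question's data frame (hypothetical values).
toy <- data.frame(
  otu1id = factor(c("_10", "_2", "_1", "_100", "_23")),
  value  = c(5, 2, 1, 4, 3)
)

# Reorder the levels by their numeric part instead of alphabetically.
lv <- levels(toy$otu1id)
toy$otu1id <- factor(toy$otu1id,
                     levels = lv[order(as.numeric(sub("_", "", lv)))])

# order() and sort() now follow the numeric order of the IDs.
toy_sorted <- toy[order(toy$otu1id), ]
print(toy_sorted)
```

The column stays a factor, so later calls to sort(), order(), or plotting functions that respect level order pick up the numeric ordering without repeating the conversion.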
If otu1id is converted to character, mixedorder() from the gtools package can be used directly:
df[gtools::mixedorder(as.character(df$otu1id)), ]
Data
df <- structure(list(taxonomy_y = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "Alistipes", class = "factor"), otu1id = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 6L), .Label = c("_1", "_10",
"_100", "_1008", "_2", "_23"), class = "factor"), taxonomy_x = structure(c(5L,
6L, 4L, 3L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("Clostridium",
"ClostridiumS", "Dorea", "Parasutterella", "Roseburia", "Turicibacter"
), class = "factor"), otu2id = structure(c(3L, 5L, 4L, 6L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("_10", "_12", "_29", "_37", "_63",
"_98"), class = "factor"), otu2 = c(333L, 241L, 14L, 56L, 178L,
155L, 155L, 155L, 155L, 155L), otu1 = c(16L, 119L, 90L, 16L,
16L, 16L, 16L, 16L, 16L, 16L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
df