R sort data frame by colname

I have a data frame like this:

           G2_ref G10_ref G12_ref G2_alt G10_alt G12_alt
20011953      3      6      0      5       1     5    
12677336      0      0      0      1       3     6  
20076754      0      3      0     12      16     8 
2089670       0      4      0      1      11     9
9456633       0      2      0      3      10     0 
468487        0      0      0      0       0     0

I am trying to sort the columns so that I end up with the following column order:

G2_ref G2_alt G10_ref G10_alt G12_ref G12_alt

I tried:

df[, order(colnames(df))]

but I get this order instead:

G10_alt G10_ref G12_alt G12_ref G2_alt G2_ref

If anyone knows how to do this, that would be great.
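Why order(colnames(df)) gives that result: sorting character strings compares them character by character, so "G10..." and "G12..." come before "G2..." because "1" sorts before "2". A minimal base-R check using only the column names from the question (method = "radix" is used here so the comparison is done in the C locale and does not depend on the system locale):

```r
# Column names from the question, in their original order
cols <- c("G2_ref", "G10_ref", "G12_ref", "G2_alt", "G10_alt", "G12_alt")

# Character sorting is lexicographic, not numeric: "G1..." < "G2..."
sorted <- cols[order(cols, method = "radix")]
print(sorted)
```

This reproduces the G10_alt, G10_ref, G12_alt, G12_ref, G2_alt, G2_ref order from the question, which is why the numeric part has to be extracted and sorted as a number.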

One option is to extract the numeric part and the substring at the end (the suffix), and then pass both to order():

df[order(as.numeric(gsub("\\D+", "", names(df))), 
            factor(sub(".*_", "", names(df)), levels = c('ref', 'alt')))]
#          G2_ref G2_alt G10_ref G10_alt G12_ref G12_alt
#20011953      3      5       6       1       0       5
#12677336      0      1       0       3       0       6
#20076754      0     12       3      16       0       8
#2089670       0      1       4      11       0       9
#9456633       0      3       2      10       0       0
#468487        0      0       0       0       0       0
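To make the two sort keys in that call visible, here is the same computation broken into steps on the column names alone (cols, num_key and suffix_key are illustrative names, not part of the answer):

```r
# Column names from the question
cols <- c("G2_ref", "G10_ref", "G12_ref", "G2_alt", "G10_alt", "G12_alt")

# Key 1: remove every run of non-digits, leaving the numeric sample ID
num_key <- as.numeric(gsub("\\D+", "", cols))   # 2 10 12 2 10 12

# Key 2: the suffix after "_", as a factor so that "ref" ranks before "alt"
suffix_key <- factor(sub(".*_", "", cols), levels = c("ref", "alt"))

# order() sorts by num_key first and breaks ties with suffix_key
ix <- order(num_key, suffix_key)
print(ix)        # 1 4 2 5 3 6
print(cols[ix])
```

Because the numeric key is built from all digits in a name, this assumes the sample ID is the only number in each column name.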

Data
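A sketch that rebuilds the example df from the table printed in the question with read.table(). Because the header line has one fewer field than the data lines, read.table() uses the first column (the IDs) as row names:

```r
# Rebuild the question's example data from its printed table.
# Header: 6 fields; data lines: 7 fields, so column 1 becomes the row names.
df <- read.table(text = "
         G2_ref G10_ref G12_ref G2_alt G10_alt G12_alt
20011953      3      6      0      5       1     5
12677336      0      0      0      1       3     6
20076754      0      3      0     12      16     8
2089670       0      4      0      1      11     9
9456633       0      2      0      3      10     0
468487        0      0      0      0       0     0
", header = TRUE)

str(df)
```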

A simple solution using dplyr:
library(dplyr)
df <- df %>%
      select(G2_ref, G2_alt, G10_ref, G10_alt, G12_ref, G12_alt)


This may be simpler (but less general) than @akrun's answer; it is only practical when there are just a few columns to reorder.
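My guess is that your data come from genetics. The layout looks fairly standard: first the ref allele columns for all variants, then the alt allele columns for all variants.
This means we can split the data frame in half and interleave the two halves, i.e. we build the index c(1, 4, 2, 5, 3, 6) and then subset: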
ix <- c(rbind(seq(1, ncol(df1)/2), seq(ncol(df1)/2 + 1, ncol(df1))))
ix
# [1] 1 4 2 5 3 6

df1[, ix]
#          G2_ref G2_alt G10_ref G10_alt G12_ref G12_alt
# 20011953      3      5       6       1       0       5
# 12677336      0      1       0       3       0       6
# 20076754      0     12       3      16       0       8
# 2089670       0      1       4      11       0       9
# 9456633       0      3       2      10       0       0
# 468487        0      0       0       0       0       0

# or all in one line
df1[, c(rbind(seq(1, ncol(df1)/2), seq(ncol(df1)/2 + 1, ncol(df1))))]
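How the rbind() trick produces the interleaved index: rbind() stacks the two halves as the rows of a 2 x n matrix, and c() flattens a matrix column by column. The helper interleave_halves() below is hypothetical (not from the answer) and, like the answer, assumes all ref columns come first and the alt columns follow in the same sample order:

```r
# rbind() stacks the two halves as rows of a 2 x 3 matrix:
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
m <- rbind(1:3, 4:6)

# c() reads the matrix column by column, interleaving the halves
c(m)   # 1 4 2 5 3 6

# Hypothetical helper: the same idea for any even number of columns
interleave_halves <- function(n_cols) {
  stopifnot(n_cols %% 2 == 0)        # needs matching ref/alt halves
  half <- as.integer(n_cols) %/% 2L
  c(rbind(seq_len(half), half + seq_len(half)))
}

interleave_halves(6)    # 1 4 2 5 3 6
interleave_halves(100)  # 1 51 2 52 ... 50 100
```

With this, df1[, interleave_halves(ncol(df1))] is equivalent to the one-liner above, and it also works for 100+ columns as long as the ref/alt layout assumption holds.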

This is what it returned when I ran the command: head(rc[order(as.numeric(gsub("\\D+", "", names(rc))), factor(sub(".*_", "", names(rc)), levels = c('ref', 'alt')))])
integer(0)
@Erika I assumed your dataset is a data.frame (I have updated the answer with the dataset I used). If it is a matrix, then use a comma, i.e. rc[, order(…
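The integer(0) result is consistent with rc being a matrix: names() of a matrix is NULL, so both sort keys are empty and order() returns integer(0). A small sketch with a made-up matrix (arbitrary values, same column names as the question) suggests that, besides the comma, colnames() is needed in place of names():

```r
# Made-up integer matrix with the question's column names
rc <- matrix(1:12, nrow = 2,
             dimnames = list(NULL, c("G2_ref", "G10_ref", "G12_ref",
                                     "G2_alt", "G10_alt", "G12_alt")))

names(rc)    # NULL: a matrix has colnames(), not names()

# With names(), both keys are zero-length, so order() returns integer(0)
bad <- order(as.numeric(gsub("\\D+", "", names(rc))),
             factor(sub(".*_", "", names(rc)), levels = c("ref", "alt")))
rc[bad]      # integer(0)

# With colnames() and the [, j] form, the columns are reordered as intended
good <- order(as.numeric(gsub("\\D+", "", colnames(rc))),
              factor(sub(".*_", "", colnames(rc)), levels = c("ref", "alt")))
rc[, good]
```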
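Thank you, but this is actually a small example; I can have more than 100 columns.
If you want something compact and simple: df[c(1,4,2,5,3,6)]. You don't really need a package for this.
Fair enough, but there is something to be said for using column names rather than column numbers, especially for reducing sources of error.
I agree your answer is the better approach. How was this data frame created? Possibly with dcast() or gather()?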
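Building on the point about column names: a base-R sketch (not from the thread) that constructs the target order by name, so it scales to 100+ columns and fails loudly with "undefined columns selected" if an expected column is missing. It assumes every column is named <ID>_ref or <ID>_alt with one number in the ID, and that every ID has both columns; the toy df here is hypothetical:

```r
# Column names in the question's layout; any number of ID pairs works
cols <- c("G2_ref", "G10_ref", "G12_ref", "G2_alt", "G10_alt", "G12_alt")

# Hypothetical one-row data frame with those columns (values 1..6)
df <- as.data.frame(matrix(seq_along(cols), nrow = 1,
                           dimnames = list(NULL, cols)))

# Sample IDs in first-appearance order: "G2" "G10" "G12"
ids <- unique(sub("_.*", "", names(df)))

# Sort IDs by their numeric part so that G2 < G10 < G12
ids <- ids[order(as.numeric(sub("\\D+", "", ids)))]

# Target order by name: each ID followed by its ref and then its alt column
wanted <- paste(rep(ids, each = 2), c("ref", "alt"), sep = "_")

# Reorder by name; a missing column would raise an error instead of
# silently selecting the wrong position
df <- df[, wanted]
names(df)
```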