Regex 如何复制列名,在分隔符处拆分列名'/';,转换为多个列名,在R?
我有一个矩阵(它的大小很大)“mymat”。我需要复制列名中与“/”匹配的列,并生成一个“resmatrix”。我怎样才能在R中完成这项工作 麦马特 resmatrixRegex 如何复制列名,在分隔符处拆分列名'/';,转换为多个列名,在R?,regex,r,matrix,string-parsing,columnname,Regex,R,Matrix,String Parsing,Columnname,我有一个矩阵(它的大小很大)“mymat”。我需要复制列名中与“/”匹配的列,并生成一个“resmatrix”。我怎样才能在R中完成这项工作 麦马特 resmatrix a b IID:WE:G12D IID:WE:G12V GH:SQ:p.R172W GH:SQ:p.R172G c 1 3 4 4 2 2 4 22 4
a b IID:WE:G12D IID:WE:G12V GH:SQ:p.R172W GH:SQ:p.R172G c
1 3 4 4 2 2 4
22 4 2 2 2 2 4
2 3 2 2 2 2 4
找出哪些列具有“/”并复制它们,然后重命名。要计算新名称,只需在
/
上拆分,并将最后一个字母替换为第二个名称
# which columns have '/' in them?
which.slash <- grep('/', names(mymat), value=T)
new.names <- unlist(lapply(strsplit(which.slash, '/'),
function (bits) {
# bits[1] is e.g. IID:WE:G12D and bits[2] is the V
# take bits[1] and replace the last letter for the second colname
c(bits[1], sub('.$', bits[2], bits[1]))
}))
# make resmat by copying the appropriate columns
resmat <- cbind(mymat, mymat[, which.slash])
# order the columns to make sure the names replace properly
resmat <- resmat[, order(names(resmat))]
# put the new names in
names(resmat)[grep('/', names(resmat))] <- sort(new.names)
找出哪些列具有“/”并复制它们,然后重命名。要计算新名称,只需在
/
上拆分,并将最后一个字母替换为第二个名称
# which columns have '/' in them?
which.slash <- grep('/', names(mymat), value=T)
new.names <- unlist(lapply(strsplit(which.slash, '/'),
function (bits) {
# bits[1] is e.g. IID:WE:G12D and bits[2] is the V
# take bits[1] and replace the last letter for the second colname
c(bits[1], sub('.$', bits[2], bits[1]))
}))
# make resmat by copying the appropriate columns
resmat <- cbind(mymat, mymat[, which.slash])
# order the columns to make sure the names replace properly
resmat <- resmat[, order(names(resmat))]
# put the new names in
names(resmat)[grep('/', names(resmat))] <- sort(new.names)
您可以使用
grep
获取具有/
('nm1')的列名索引,通过使用sub/scan
复制'nm1'中的列名以创建'nm2'。然后,cbind
将非“nm1”的列与复制列(“nm1”)一起更改为“nm2”的列名,并在需要时对列进行排序
#get the column index with grep
nm1 <- grepl('/', names(df1))
#used regex to rearrange the substrings in the nm1 column names
#removed the `/` and use `scan` to split at the space delimiter
nm2 <- scan(text=gsub('([^/]+)(.)/(.*)', '\\1\\2 \\1\\3',
names(df1)[nm1]), what='', quiet=TRUE)
#cbind the columns that are not in nm1, with the replicate nm1 columns
df2 <- cbind(df1[!nm1], setNames(df1[rep(which(nm1), each= 2)], nm2))
#create another index to find the starting position of nm1 columns
nm3 <- names(df1)[1:(which(nm1)[1L]-1)]
#we concatenate the nm3, nm2, and the rest of the columns to match
#the expected output order
df2N <- df2[c(nm3, nm2, setdiff(names(df1)[!nm1], nm3))]
df2N
# a b IID:WE:G12D IID:WE:G12V GH:SQ:p.R172W GH:SQ:p.R172G c
#1 1 3 4 4 2 2 4
#2 22 4 2 2 2 2 4
#3 2 3 2 2 2 2 4
#使用grep获取列索引
nm1您可以使用grep
获取具有/
('nm1')的列名索引,使用sub/scan
复制'nm1'中的列名以创建'nm2'。然后,cbind
将非“nm1”的列与复制列(“nm1”)一起更改为“nm2”的列名,并在需要时对列进行排序
#get the column index with grep
nm1 <- grepl('/', names(df1))
#used regex to rearrange the substrings in the nm1 column names
#removed the `/` and use `scan` to split at the space delimiter
nm2 <- scan(text=gsub('([^/]+)(.)/(.*)', '\\1\\2 \\1\\3',
names(df1)[nm1]), what='', quiet=TRUE)
#cbind the columns that are not in nm1, with the replicate nm1 columns
df2 <- cbind(df1[!nm1], setNames(df1[rep(which(nm1), each= 2)], nm2))
#create another index to find the starting position of nm1 columns
nm3 <- names(df1)[1:(which(nm1)[1L]-1)]
#we concatenate the nm3, nm2, and the rest of the columns to match
#the expected output order
df2N <- df2[c(nm3, nm2, setdiff(names(df1)[!nm1], nm3))]
df2N
# a b IID:WE:G12D IID:WE:G12V GH:SQ:p.R172W GH:SQ:p.R172G c
#1 1 3 4 4 2 2 4
#2 22 4 2 2 2 2 4
#3 2 3 2 2 2 2 4
#使用grep获取列索引
nm1
df1 <- structure(list(a = c(1L, 22L, 2L), b = c(3L, 4L, 3L),
`IID:WE:G12D/V` = c(4L,
2L, 2L), `GH:SQ:p.R172W/G` = c(2L, 2L, 2L), c = c(4L, 4L, 4L)),
.Names = c("a", "b", "IID:WE:G12D/V", "GH:SQ:p.R172W/G", "c"),
class = "data.frame", row.names = c(NA, -3L))