R: How to merge strings based on key data
I have data like this:
df<- structure(list(position = structure(c(6L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 7L, 1L, 2L, 3L, 4L, 8L, 1L, 2L, 3L, 4L), .Label = c("1,2,3,4,5,6,7,8,9,10,11,12,13,14,15",
"2,3,4,5,6,7,8,9,10,11,12,13,14,15,16", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17",
"4,5,6,7,8,9,10,11,12,13,14,15,16,17,18", "TP<AMB88", "TP<AMT55",
"TP<ELANE", "TP<RACK1"), class = "factor"), col = structure(c(15L,
6L, 3L, 11L, 5L, 14L, 9L, 18L, 16L, 8L, 13L, 4L, 2L, 17L, 7L,
12L, 1L, 10L), .Label = c("EQMTLRGTLKGHNGW", "GRRLACLFLACVLPA",
"GSLSNYALLQLTLTA", "LGRRLACLFLACVLP", "LSNYALLQLTLTAFL", "MGSLSNYALLQLTLT",
"MTEQMTLRGTLKGHN", "MTLGRRLACLFLACV", "MVKETTYYDVLGVKP", "QMTLRGTLKGHNGWV",
"SLSNYALLQLTLTAF", "TEQMTLRGTLKGHNG", "TLGRRLACLFLACVL", "TP<AMB88",
"TP<AMT55", "TP<ELANE", "TP<RACK1", "VKETTYYDVLGVKPN"), class = "factor")), class = "data.frame", row.names = c(NA,
-18L))
Basically, I am trying to merge all the fragments within each group and get an output like the `out` data frame given further below, which has an extra `join` column holding the merged string.
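Position 1 corresponds to M in the first fragment.
Position 2 corresponds to G (in the first and second fragments).
Position 3 corresponds to S (in the first and second fragments), and so on.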
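As far as I understand, within each group (each group is defined by a TP row) you need to paste the last letter of every later entry onto the second entry, which is the first fragment of the group.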
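To make the idea concrete, here is a minimal sketch on a single group typed in by hand. The vector `grp` and the intermediate names are illustrative only and are not part of the original answer. It assumes every fragment after the first overlaps the previous one and contributes exactly one new letter at its end.

```r
# One group, typed in by hand: a TP header row followed by overlapping fragments
grp <- c("TP<AMT55", "MGSLSNYALLQLTLT", "GSLSNYALLQLTLTA",
         "SLSNYALLQLTLTAF", "LSNYALLQLTLTAFL")

first_frag <- grp[2]        # the first fragment is kept whole
later      <- grp[-(1:2)]   # each later fragment contributes one new letter

# last letter of each later fragment (substr() and nchar() are vectorised)
last_letters <- substr(later, nchar(later), nchar(later))

merged <- paste0(first_frag, paste(last_letters, collapse = ""))
merged
#[1] "MGSLSNYALLQLTLTAFL"
```

The same three steps are what the grouped `lapply()` call in the answer applies to every TP group.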
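Can you describe what you are trying to achieve? I know nothing about amino acids. All I see in your question is a before and an after data frame, with no explanation in between... How did you get your output?
@Sotos What do you think now?
Oh, OK. Now I understand what you need! Let me give it a try.
I get an error in nchar(j) saying "'nchar()' requires a character vector".
Please make sure your columns are as.character. They are probably factors, which is why you get the error. I edited my answer.
Is it possible to change the structure so that it looks like the output? I tried as.data.frame, but when the data is large I can't tell what is what, because it ignores the header name.
@Learner Please have a look, and if it works for you please accept the answer so we can consider this Q closed. Thanks.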
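The nchar() error discussed in the comments can be reproduced on a tiny factor. This is only a sketch: the vector `f` is made up for illustration, and the call is wrapped in tryCatch() so that the script keeps running and the error message can be inspected.

```r
# A small factor standing in for a factor column of the data frame
f <- factor(c("MGSLSNYALLQLTLT", "GSLSNYALLQLTLTA"))

# nchar() refuses factors; capture the error message instead of stopping
msg <- tryCatch(nchar(f), error = function(e) conditionMessage(e))
msg

# Converting to character first makes nchar() work
x <- as.character(f)
nchar(x)
#[1] 15 15
```

For a whole data frame, `df[] <- lapply(df, as.character)` does the same conversion for every column while keeping the data frame structure.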
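For example, the merged result for the first group (TP<AMT55) is:
TP<AMT55 TP<AMT55
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18 MGSLSNYALLQLTLTAFL
It is built from overlapping fragments such as:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 MGSLSNYALLQLTLT
2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 GSLSNYALLQLTLTA
The full desired output as a data frame: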
out<- structure(list(position = structure(c(6L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 7L, 1L, 2L, 3L, 4L, 8L, 1L, 2L, 3L, 4L), .Label = c("1,2,3,4,5,6,7,8,9,10,11,12,13,14,15",
"2,3,4,5,6,7,8,9,10,11,12,13,14,15,16", "3,4,5,6,7,8,9,10,11,12,13,14,15,16,17",
"4,5,6,7,8,9,10,11,12,13,14,15,16,17,18", "TP<AMB88", "TP<AMT55",
"TP<ELANE", "TP<RACK1"), class = "factor"), col = structure(c(15L,
6L, 3L, 11L, 5L, 14L, 9L, 18L, 16L, 8L, 13L, 4L, 2L, 17L, 7L,
12L, 1L, 10L), .Label = c("EQMTLRGTLKGHNGW", "GRRLACLFLACVLPA",
"GSLSNYALLQLTLTA", "LGRRLACLFLACVLP", "LSNYALLQLTLTAFL", "MGSLSNYALLQLTLT",
"MTEQMTLRGTLKGHN", "MTLGRRLACLFLACV", "MVKETTYYDVLGVKP", "QMTLRGTLKGHNGWV",
"SLSNYALLQLTLTAF", "TEQMTLRGTLKGHNG", "TLGRRLACLFLACVL", "TP<AMB88",
"TP<AMT55", "TP<ELANE", "TP<RACK1", "VKETTYYDVLGVKPN"), class = "factor"),
join = structure(c(7L, 2L, 1L, 1L, 1L, 6L, 5L, 1L, 8L, 4L,
1L, 1L, 1L, 9L, 3L, 1L, 1L, 1L), .Label = c("", "MGSLSNYALLQLTLTAFL",
"MTEQMTLRGTLKGHNGWV", "MTLGRRLACLFLACVLPA", "MVKETTYYDVLGVKPN",
"TP<AMB88", "TP<AMT55", "TP<ELANE", "TP<RACK1"), class = "factor")), class = "data.frame", row.names = c(NA,
-18L))
#make sure columns are character, not factors
df[] <- lapply(df, as.character)
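# start a new group at every row whose position contains 'TP'
l1 <- split(df, cumsum(grepl('TP', df$position)))
# per group: first fragment (row 2) plus the last letter of each later fragment
# (3:nrow(i) assumes every group has at least three rows)
lapply(l1, function(i) paste0(i$col[2], paste(sapply(i$col[3:nrow(i)],
function(j) substr(j, nchar(j), nchar(j))), collapse = '')))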
#$`1`
#[1] "MGSLSNYALLQLTLTAFL"
#$`2`
#[1] "MVKETTYYDVLGVKPN"
#$`3`
#[1] "MTLGRRLACLFLACVLPA"
#$`4`
#[1] "MTEQMTLRGTLKGHNGWV"
do.call(rbind, lapply(l1, function(i) {
  dd <- cbind.data.frame(i, join = paste0(i$col[2], paste(sapply(i$col[3:nrow(i)],
    function(j) substr(j, nchar(j), nchar(j))), collapse = '')))
  dd$join <- as.character(dd$join)
  dd$join[1] <- dd$col[1]
  dd$join[3:nrow(dd)] <- ''
  dd
}))
      position                                col              join
1.1   TP<AMT55                                TP<AMT55         TP<AMT55
1.2   1,2,3,4,5,6,7,8,9,10,11,12,13,14,15     MGSLSNYALLQLTLT  MGSLSNYALLQLTLTAFL
1.3   2,3,4,5,6,7,8,9,10,11,12,13,14,15,16    GSLSNYALLQLTLTA
1.4   3,4,5,6,7,8,9,10,11,12,13,14,15,16,17   SLSNYALLQLTLTAF
1.5   4,5,6,7,8,9,10,11,12,13,14,15,16,17,18  LSNYALLQLTLTAFL
2.6   TP<AMB88                                TP<AMB88         TP<AMB88
2.7   1,2,3,4,5,6,7,8,9,10,11,12,13,14,15     MVKETTYYDVLGVKP  MVKETTYYDVLGVKPN
2.8   2,3,4,5,6,7,8,9,10,11,12,13,14,15,16    VKETTYYDVLGVKPN
3.9   TP<ELANE                                TP<ELANE         TP<ELANE
3.10  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15     MTLGRRLACLFLACV  MTLGRRLACLFLACVLPA
3.11  2,3,4,5,6,7,8,9,10,11,12,13,14,15,16    TLGRRLACLFLACVL
3.12  3,4,5,6,7,8,9,10,11,12,13,14,15,16,17   LGRRLACLFLACVLP
3.13  4,5,6,7,8,9,10,11,12,13,14,15,16,17,18  GRRLACLFLACVLPA
4.14  TP<RACK1                                TP<RACK1         TP<RACK1
4.15  1,2,3,4,5,6,7,8,9,10,11,12,13,14,15     MTEQMTLRGTLKGHN  MTEQMTLRGTLKGHNGWV
4.16  2,3,4,5,6,7,8,9,10,11,12,13,14,15,16    TEQMTLRGTLKGHNG
4.17  3,4,5,6,7,8,9,10,11,12,13,14,15,16,17   EQMTLRGTLKGHNGW
4.18  4,5,6,7,8,9,10,11,12,13,14,15,16,17,18  QMTLRGTLKGHNGWV
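The approach above relies on row order and on each fragment adding exactly one letter. As an alternative, the position column itself could be used as the key. The following is a sketch only: `merge_by_position` is a hypothetical helper, not part of the original answer, and it assumes every position string has exactly as many comma-separated numbers as its fragment has letters.

```r
# Hypothetical helper: rebuild one group's sequence from its position keys
merge_by_position <- function(position, col) {
  pos   <- as.integer(unlist(strsplit(position, ",", fixed = TRUE)))
  chars <- unlist(strsplit(col, ""))   # split every fragment into letters
  stopifnot(length(pos) == length(chars))
  keep <- !duplicated(pos)             # keep the first letter seen per position
  paste(chars[keep][order(pos[keep])], collapse = "")
}

# The two fragment rows of the TP<AMB88 group (header row left out)
position <- c("1,2,3,4,5,6,7,8,9,10,11,12,13,14,15",
              "2,3,4,5,6,7,8,9,10,11,12,13,14,15,16")
col <- c("MVKETTYYDVLGVKP", "VKETTYYDVLGVKPN")

merge_by_position(position, col)
#[1] "MVKETTYYDVLGVKPN"
```

To use it on the full data, the helper could be called inside the same `lapply(l1, ...)` loop on `i$position[-1]` and `i$col[-1]`, i.e. each group without its TP header row. Both columns must be character, not factor.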