Apply a function to an R data.table object that updates multiple columns and creates multiple rows
My starting table looks like this:
   CHROM     POS                       REF                                 ALT  GT
1:     1   58211                         A                                   G 1/1
2:     1 6464767 CAAATAAATAAATAAATAAATAAAT C,CAAATAAATAAATAAATAAATAAATAAATAAAT 1/2
3:    12   83011                         T                                   C 0/1
4:    18 1541042                         C                                 T,A 1/2
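I want to apply a function, "ap2", that splits the long REF and ALT entries on row 2 into two shorter entries, updates the data on row 2 (changing REF, ALT and GT), and inserts a new row (#3, with a new POS, ALT and GT). The result should look like this: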
   CHROM     POS                       REF       ALT  GT
1:     1   58211                         A         G 1/1
2:     1 6464767 CAAATAAATAAATAAATAAATAAAT         C 1/2
3:     1 6464791                         T TAAATAAAT 1/2
4:    12   83011                         T         C 0/1
5:    18 1541042                         C       T,A 1/2
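If I run the ap2 function on its own, it shows the expected result (in columns V1-V4):
However, if I try to update the original columns, I get warnings and the extra row is discarded: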
tmp[, c("POS","REF","ALT","GT") := ap2(POS,REF,ALT,GT), by=c("CHROM","POS","REF","ALT","GT")]
Warning messages:
1: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS, :
RHS 1 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
2: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS, :
RHS 2 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
3: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS, :
RHS 3 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
4: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS, :
RHS 4 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
Here is the code to create the tmp data.table:
library(data.table)

tmp <- data.table(
  CHROM = as.character(c("1","1","12","18")),
  POS = as.integer(c(58211,6464767,83011,1541042)),
  REF = c("A","CAAATAAATAAATAAATAAATAAAT","T","C"),
  ALT = c("G","C,CAAATAAATAAATAAATAAATAAATAAATAAAT","C","T,A"),
  GT = c("1/1","1/2","0/1","1/2")
)
This is the function I am trying to apply:
ap2 <- function(pos, ref, alt, gt) {
  # NOTE: when gt is not "1/2" the function falls through and returns NULL
  if (gt == "1/2") {
    alt.split <- unlist(strsplit(alt, ","))
    # match.length is -1 for every ALT allele in which ref does not occur
    matching <- attr(regexpr(ref, alt.split), "match.length")
    if (max(matching) == -1) {
      # no ALT allele contains REF: return the row as-is
      list(pos, ref, alt, gt)
    } else {
      alt.new <- NULL
      ref.new <- NULL
      pos.new <- NULL
      gt.new <- NULL
      for (i in seq_along(matching)) {
        stopPos <- matching[i]
        if (stopPos == -1) {
          pos.new <- c(pos.new, as.integer(pos))
          ref.new <- c(ref.new, ref)
          alt.new <- c(alt.new, alt.split[i])
        } else {
          # keep REF and ALT from the end of the match onward and move POS there
          pos.new <- c(pos.new, as.integer(pos + matching[i] - 1))
          ref.new <- c(ref.new, substring(ref, stopPos))
          alt.new <- c(alt.new, substring(alt.split[i], stopPos))
        }
        gt.new <- c(gt.new, "0/1")
      }
      list(pos.new, ref.new, alt.new, gt.new)
    }
  }
}
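If you are changing the number of rows, I'm pretty sure you can't use :=. Just modify ap2 so that it returns the current row unchanged when no split is needed. You'll get what you want, although unfortunately it requires copying the table.

@BrodieG According to this thread, it is possible. With the example above, try: "tmp[, list(ALTSPLIT=unlist(strsplit(ALT,","))), by=eval(colnames(tmp))]". I just can't work out how to do this for multiple columns and replace the original columns.

In that example, Arun is not modifying the table by reference. He is making a copy. Note that there is not a single := in that answer.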
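
The suggestion above (return the row unchanged when no split is needed, then build a new table instead of using :=) could be sketched roughly as follows. This is only an illustration under some assumptions: ap2_keep and the helper grouping column row are hypothetical names that do not appear in the question, the matching is done with fixed = TRUE rather than as a regular expression, and the GT values are whatever ap2 itself emits ("0/1" for split rows).

```r
library(data.table)

tmp <- data.table(
  CHROM = c("1", "1", "12", "18"),
  POS   = c(58211L, 6464767L, 83011L, 1541042L),
  REF   = c("A", "CAAATAAATAAATAAATAAATAAAT", "T", "C"),
  ALT   = c("G", "C,CAAATAAATAAATAAATAAATAAATAAATAAAT", "C", "T,A"),
  GT    = c("1/1", "1/2", "0/1", "1/2")
)

# Hypothetical variant of ap2: it always returns a named list, so rows that
# need no split come back unchanged instead of producing NULL.
ap2_keep <- function(pos, ref, alt, gt) {
  unchanged <- list(POS = pos, REF = ref, ALT = alt, GT = gt)
  if (gt != "1/2") return(unchanged)

  alt.split <- strsplit(alt, ",", fixed = TRUE)[[1]]
  # match.length is -1 where ref does not occur in an ALT allele
  matching <- attr(regexpr(ref, alt.split, fixed = TRUE), "match.length")
  if (max(matching) == -1) return(unchanged)

  hit <- matching != -1
  list(
    POS = ifelse(hit, as.integer(pos + matching - 1L), pos),
    REF = ifelse(hit, substring(ref, matching), ref),
    ALT = ifelse(hit, substring(alt.split, matching), alt.split),
    GT  = rep("0/1", length(alt.split))
  )
}

# Group by a per-row index so each call sees exactly one row. A group may
# return more rows than it received because the result is a new table,
# not an assignment by reference with :=.
res <- tmp[, ap2_keep(POS, REF, ALT, GT),
           by = .(CHROM, row = seq_len(nrow(tmp)))]
res[, row := NULL]  # drop the helper column

tmp <- res          # replace the original table with the expanded copy
```

Grouping by a row index rather than by all five columns avoids duplicate column names in the result and keeps identical input rows from being merged into one group. The cost, as noted above, is that the table is copied rather than updated in place.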