Applying a function to an R data.table that updates multiple columns and creates multiple rows


My starting table looks like this:

   CHROM     POS                       REF                                 ALT  GT
1:     1   58211                         A                                   G 1/1
2:     1 6464767 CAAATAAATAAATAAATAAATAAAT C,CAAATAAATAAATAAATAAATAAATAAATAAAT 1/2
3:    12   83011                         T                                   C 0/1
4:    18 1541042                         C                                 T,A 1/2
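I want to apply a function, "ap2", that splits the long REF and ALT entries on row 2 into two shorter entries, updates the data on row 2 (changing REF, ALT and GT), and inserts a new row (#3, with a new POS, ALT and GT). The result should look like this: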

   CHROM     POS                       REF                                 ALT  GT
1:     1   58211                         A                                   G 1/1
2:     1 6464767 CAAATAAATAAATAAATAAATAAAT                                   C 1/2
3:     1 6464791                         T                           TAAATAAAT 1/2
4:    12   83011                         T                                   C 0/1
5:    18 1541042                         C                                 T,A 1/2
If I run the ap2 function, it gives the expected result (in columns V1-V4):

However, if I try to update the original columns, I get warnings instead and the extra row is discarded:

tmp[, c("POS","REF","ALT","GT") := ap2(POS,REF,ALT,GT), by=c("CHROM","POS","REF","ALT","GT")]
Warning messages:
1: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS,  :
  RHS 1 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
2: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS,  :
  RHS 2 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
3: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS,  :
  RHS 3 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
4: In `[.data.table`(tmp, , `:=`(c("POS", "REF", "ALT", "GT"), ap2(POS,  :
  RHS 4 is length 2 (greater than the size (1) of group 2). The last 1 element(s) will be discarded.
Here is the code to create the tmp data.table:

library(data.table)
tmp <- data.table(
  CHROM = as.character(c("1","1","12","18")) ,
  POS = as.integer(c(58211,6464767,83011,1541042)) ,
  REF = c("A","CAAATAAATAAATAAATAAATAAAT","T","C") ,
  ALT = c("G","C,CAAATAAATAAATAAATAAATAAATAAATAAAT","C","T,A") ,
  GT = c("1/1","1/2","0/1","1/2")
)
This is the function I am trying to apply:

ap2 <- function(pos,ref,alt,gt) {
  if(gt=="1/2") {
    alt.split <- unlist(strsplit(alt,","))
    matching <- attr(regexpr(ref,alt.split), "match.length")
    if(max(matching) == -1) {
      list(pos,ref,alt,gt)
    } else {
      alt.new <- NULL
      ref.new <- NULL
      pos.new <- NULL
      gt.new <- NULL
      for(i in 1:length(matching)) {
        stopPos <- matching[i]
        if(stopPos == -1) {
          pos.new <- c(pos.new,as.integer(pos))
          ref.new <- c(ref.new,ref)
          alt.new <- c(alt.new,alt.split[i])
        } else {
          pos.new <- c(pos.new, as.integer(pos+matching[i]-1))
          ref.new <- c(ref.new, substring(ref,stopPos))
          alt.new <- c(alt.new, substring(alt.split[i],stopPos))
        }
        gt.new <- c(gt.new, "0/1")
      }
      list(pos.new, ref.new, alt.new, gt.new)
    }
  }
}

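I'm pretty sure you can't use := if you are going to change the number of rows. Just modify ap2 so that it returns the row unchanged when no split is needed, and keep doing what you are currently doing. You'll get what you want, though it unfortunately requires copying the table.

@BrodieG According to this thread, it is possible. With the example above, try: tmp[, list(ALTSPLIT = unlist(strsplit(ALT, ","))), by = eval(colnames(tmp))]. I just can't work out how to do this for multiple columns and replace the original columns.

In that example, Arun isn't modifying the table by reference. He's making a copy. Note that there isn't a single := in the answer.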
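
For what it's worth, here is a minimal sketch of what the first comment seems to suggest: let the function always return a list of rows (unchanged when no split is needed) and build a new table instead of assigning with :=. The names ap2_rows and row_id are hypothetical, and using fixed = TRUE in regexpr is my own assumption (the question matches REF as a regular expression). The result is a copy, not an update by reference.

```r
library(data.table)

tmp <- data.table(
  CHROM = c("1", "1", "12", "18"),
  POS   = c(58211L, 6464767L, 83011L, 1541042L),
  REF   = c("A", "CAAATAAATAAATAAATAAATAAAT", "T", "C"),
  ALT   = c("G", "C,CAAATAAATAAATAAATAAATAAATAAATAAAT", "C", "T,A"),
  GT    = c("1/1", "1/2", "0/1", "1/2")
)

# Hypothetical variant of ap2: it always returns a named list, so rows that
# need no split pass through unchanged instead of yielding NULL.
ap2_rows <- function(pos, ref, alt, gt) {
  unchanged <- list(POS = pos, REF = ref, ALT = alt, GT = gt)
  if (gt != "1/2") return(unchanged)

  alt.split <- strsplit(alt, ",", fixed = TRUE)[[1]]
  # Length of the REF match inside each ALT allele, -1 where REF is absent.
  matching <- attr(regexpr(ref, alt.split, fixed = TRUE), "match.length")
  if (max(matching) == -1) return(unchanged)

  hit <- matching != -1
  list(
    POS = as.integer(ifelse(hit, pos + matching - 1L, pos)),
    REF = ifelse(hit, substring(ref, matching), ref),
    ALT = ifelse(hit, substring(alt.split, matching), alt.split),
    GT  = rep("0/1", length(alt.split))  # same GT the question's ap2 assigns
  )
}

# Group by a row id so each call sees exactly one row; a group may return
# more rows than it received because nothing is assigned with := here.
res <- tmp[, ap2_rows(POS, REF, ALT, GT),
           by = .(row_id = seq_len(nrow(tmp)), CHROM)]
res[, row_id := NULL]  # drop the helper column from the new table
print(res)
```

Grouping by a row id rather than by POS, REF, ALT and GT themselves keeps those names free for the returned columns, so the output carries the original column names instead of V1-V4. The split rows get GT "0/1" because that is what the question's ap2 assigns. If the original name is needed, tmp <- res rebinds it to the new table.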