Counting values per row in R is very slow


I am trying to count how often each value occurs in every row of a data frame, going from this:

     a  b  c  d  e
  1  1  1  0 -1 NA
  2  0 -1 -1  1 NA
  3  1  0 NA NA  1
to this:

     a  b  c  d  e count.-1 count.0 count.1 count.NA
  1  1  1  0 -1 NA        1       1       2        1
  2  0 -1 -1  1 NA        2       1       1        1
  3  1  0 NA NA  1        0       1       2        2
This is how I am doing it at the moment:

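    library(dplyr)
    library(purrrlyr)  # by_row() was moved out of purrr into purrrlyr

    # Each by_row() call runs an R function once per row
    # and appends the result as a new column.
    df <- df %>%
      by_row(
        ..f = function(x) {
          sum(is.na(x[1:8]))                 # number of NAs in columns 1 to 8
        },
        .to = "count_na",
        .collate = "cols"
      ) %>%
      by_row(
        ..f = function(x) {
          sum(x[1:8] == 1, na.rm = TRUE)     # number of 1s
        },
        .to = "count_positive",
        .collate = "cols"
      ) %>%
      by_row(
        ..f = function(x) {
          sum(x[1:8] == -1, na.rm = TRUE)    # number of -1s
        },
        .to = "count_negative",
        .collate = "cols"
      ) %>%
      by_row(
        ..f = function(x) {
          sum(x[1:8] == 0, na.rm = TRUE)     # number of 0s
        },
        .to = "count_neutral",
        .collate = "cols"
      )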

The problem is that with 5 million rows this takes forever to finish (more than 3 hours). Is there a better way to do this?

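You can use data.table for fast processing. First melt the data into long format, then tabulate by row number and value, and finally cast back to wide format and join the counts onto the original table to get the desired output.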

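    library(data.table)

    # DT is assumed to hold the data as a data.table
    # Add a row id, melt to long format, count each (row, value) pair,
    # then cast the counts back to one column per value
    agg <- dcast(melt(DT[, rn := .I], id.vars = "rn")[, .N, by = .(rn, value)],
        rn ~ value, sum, value.var = "N")
    # Join the counts back onto the original rows
    DT[agg, on = .(rn)]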
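
To make the melt, count and cast steps easier to follow, here is a minimal sketch on the three-row sample from the question. It assumes the data.table package is installed. The intermediate names `long` and `counts` are only illustrative, and row 3 uses the values shown in the expected-output table.

```r
library(data.table)

# Sample data from the question (row 3 as in the expected-output table)
DT <- data.table(
  a = c(1, 0, 1),
  b = c(1, -1, 0),
  c = c(0, -1, NA),
  d = c(-1, 1, NA),
  e = c(NA, NA, 1)
)

# Step 1: add a row id and melt to long format (one row per cell)
DT[, rn := .I]
long <- melt(DT, id.vars = "rn")

# Step 2: count how often each value occurs within each original row
counts <- long[, .N, by = .(rn, value)]

# Step 3: cast back to wide format, one column per distinct value;
# sum() fills (row, value) pairs that never occur with 0
agg <- dcast(counts, rn ~ value, fun.aggregate = sum, value.var = "N")

# Step 4: join the counts back onto the original rows
res <- DT[agg, on = .(rn)]
print(res)
```

The count columns are named after the values themselves (`-1`, `0`, `1`, `NA`), so they can be renamed afterwards if the `count.` prefix from the question is wanted.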
Timing:

Unit: seconds
    expr      min       lq     mean  median       uq      max neval
 dtmtd() 10.07663 10.14351 10.17387 10.2104 10.22249 10.23458     3
Possible duplicate. This is not very efficient, but it should be faster than your current version. Try

    cbind(df1, t(apply(df1, 1, table, exclude = NULL)))
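
One caveat with the `apply`/`table` one-liner: `table` only reports the values that actually occur in a given row, so rows with different sets of values give results of different lengths, and `apply` then returns a list instead of a matrix. A hedged sketch of one workaround follows, assuming the only possible values are -1, 0, 1 and NA. The helper name `count_row` is hypothetical.

```r
# Sample data from the question (row 3 as in the expected-output table)
df1 <- data.frame(
  a = c(1, 0, 1),
  b = c(1, -1, 0),
  c = c(0, -1, NA),
  d = c(-1, 1, NA),
  e = c(NA, NA, 1)
)

# Hypothetical helper: tabulate one row against a fixed set of levels,
# so that every row yields exactly four counts in the same order
count_row <- function(x) {
  tab <- table(factor(x, levels = c(-1, 0, 1)), useNA = "always")
  setNames(as.vector(tab), paste0("count.", c(-1, 0, 1, NA)))
}

# apply() returns one column per input row, hence the transpose
res <- cbind(df1, t(apply(df1, 1, count_row)))
print(res)
```

This still calls an R function once per row, so it is meant as a fix for correctness rather than as the fastest option.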
Benchmark code for the data.table timing:

    dtmtd <- function() {
        agg <- dcast(melt(DT[, rn := .I], id.vars = "rn")[, .N, by = .(rn, value)],
            rn ~ value, sum, value.var = "N")
        DT[agg, on = .(rn)]
    }
    microbenchmark::microbenchmark(dtmtd(), times = 3L)
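
Because the question only needs counts for four fixed values, another option is to compare the whole table at once and sum across rows with base R's `rowSums`. This is a sketch of a different technique, not taken from the answers above, and it has not been timed on the 5 million row data. The column names follow the question's code, and `m` is an illustrative name.

```r
# Sample data from the question (row 3 as in the expected-output table)
df <- data.frame(
  a = c(1, 0, 1),
  b = c(1, -1, 0),
  c = c(0, -1, NA),
  d = c(-1, 1, NA),
  e = c(NA, NA, 1)
)

# Work on a numeric matrix of the columns to be counted
# (in the question's data this would be columns 1 to 8)
m <- as.matrix(df)

# Each comparison is evaluated for all cells at once;
# rowSums() then counts the TRUE values per row
counts <- data.frame(
  count_negative = rowSums(m == -1, na.rm = TRUE),
  count_neutral  = rowSums(m == 0, na.rm = TRUE),
  count_positive = rowSums(m == 1, na.rm = TRUE),
  count_na       = rowSums(is.na(m))
)

res <- cbind(df, counts)
print(res)
```

The design choice here is to replace one R function call per row with four whole-matrix comparisons, which avoids per-row overhead at the cost of only handling a known set of values.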