Table-driven evaluation using R data.table

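What is the best way to build and evaluate a table of conditions that are checked against a dataset?

For example, suppose I want to identify invalid rows in a dataset like the following: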

library("data.table")

# notional example -- some observations are wrong, some missing
set.seed(1)
n = 100 # Number of customers.
        # Also included are "non-customers" where values except cust_id should be NA.
cust <- data.table( cust_id = sample.int(n+1),
                    first_purch_dt =
                      c(sample(as.Date(c(1:n, NA), origin="2000-01-01"), n), NA),
                    last_purch_dt = 
                      c(sample(as.Date(c(1:n, NA), origin="2000-04-01"), n), NA),
                    largest_purch_amt = 
                      c(sample(c(50:100, NA), n, replace=TRUE), NA),
                    last_purch_amt = 
                      c(sample(c(1:65,NA), n, replace=TRUE), NA)
                    )
setkey(cust, cust_id)
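The snippets in this post refer to a checks table with the columns cond_id, cond_txt and cond_msg, but its definition does not appear here. The following is only a minimal sketch of what such a table might look like; the three rules and their messages are assumptions made for illustration, not the original conditions.

```r
library(data.table)

# Hypothetical conditions table: the rules and messages are illustrative assumptions.
# cond_txt holds R expressions (as text) written against the columns of cust.
checks <- data.table(
  cond_id  = 1:3,
  cond_txt = c(
    "is.na(first_purch_dt) & !is.na(last_purch_dt)",
    "last_purch_dt < first_purch_dt",
    "last_purch_amt > largest_purch_amt"
  ),
  cond_msg = c(
    "last purchase date present but first purchase date missing",
    "last purchase precedes first purchase",
    "last purchase amount exceeds largest purchase amount"
  )
)
```

Keeping each rule as text in one row means new checks can be added by editing the table rather than the evaluation code.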

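My current approach (the err_obs code shown further down) seems to work, and it handles NA correctly during evaluation.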

When I ask "what is the best way", I mean:

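Is this the best way, or is there a more efficient or more idiomatic alternative to rbindlist(lapply(...))?
Are there flaws in my current approach?
Can this be written as a merge or join, something like cust inner join checks on eval(checks.condition(cust.values)) == TRUE?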

checks[, cust[eval(parse(text = cond_txt), .SD)][, err_msg := cond_msg], by = cond_id]

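Here checks[, j, by = cond_id] evaluates j once for each cond_id, and j = cust[…] is the subset of cust whose rows satisfy that condition. The condition text is parsed and evaluated against cust, with .SD passed as the second argument to eval. The chained [, err_msg := cond_msg] adds the matching cond_msg as a column, and checks returns the combined result as a single data.table.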
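To see the grouped one-liner on something small enough to inspect by eye, here is a self-contained sketch. The toy and rules tables and their contents are hypothetical stand-ins for cust and checks, chosen so that each rule matches exactly one row.

```r
library(data.table)

# Hypothetical toy data, much smaller than the cust table
toy <- data.table(id = 1:4, amt = c(10, NA, 200, 50))

# Hypothetical rules: each cond_txt is an R expression over toy's columns
rules <- data.table(
  cond_id  = 1:2,
  cond_txt = c("is.na(amt)", "!is.na(amt) & amt > 100"),
  cond_msg = c("amount missing", "amount above 100")
)

# One group per rule: parse the rule text, filter toy with it,
# then tag the matching rows with that rule's message
hits <- rules[, toy[eval(parse(text = cond_txt), .SD)][, err_msg := cond_msg],
              by = cond_id]
print(hits)
```

Because the grouping column is cond_id, each matched row in hits carries the id of the rule that flagged it alongside the message.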

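That is the power of data.table. I am not at all sure I could do this with dplyr::inner_join or any other approach. I think I would have to build a Cartesian product and then filter on eval().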
err_obs <- 
  rbindlist(
    lapply(1:nrow(checks), function(i) {
      err_set <- cust[eval( parse(text= checks[i,cond_txt]) ) ,  ]
      cbind(err_set, 
            checks[i, .(err_id = rep.int(cond_id, times = nrow(err_set)),
                        err_msg = rep.int(cond_msg, times = nrow(err_set))
                        )]
            )                
    } )
  )
print(err_obs) # returns desired result