Using an ifelse statement in an R data.table column calculation that depends on the value of the first row


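I have to run some regular expressions (quite a lot of them, actually) on large data.tables (30+ million rows). One column is just a repeated string (the same on every row, or missing), while the other column holds a different string on each row. If the value in the first column is missing, or passes some other regex, I do not want to run the regex at all and just want FALSE returned; if it is not missing, I want to check whether the columns match. I need to do this for thousands of data.tables, and since each regex takes a few seconds, I would like an ifelse statement such that the regex is not even attempted when the condition is FALSE.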

This is what I tried, but none of it worked (I also tried fifelse and if_else):

library(data.table)
set.seed(10)
data_table_test <-
  data.table(col  = rep("c", 1e6),
             col2 =  paste(
               sample(letters, 1e6,
                      replace = T),
               sample(letters, 1e6,
                      replace = T),
               sep = ""
             ))

data_table_test2 <-
  data.table(col  = rep(NA, 1e6),
             col2 =  paste(
               sample(letters, 1e6,
                      replace = T),
               sample(letters, 1e6,
                      replace = T),
               sep = ""
             ))


data_table_test[, ':='(matching_letter_1   = stringi::stri_detect_fixed(col2, col),
                       matching_letter_2   = ifelse(is.na(data_table_test[1, col ]), F, stringi::stri_detect_fixed(col2, col))),]

data_table_test2[, ':='(matching_letter_1   = stringi::stri_detect_fixed(col2, col),
                       matching_letter_2   = ifelse(is.na(data_table_test2[1, col ]), F, stringi::stri_detect_fixed(col2, col))),]

Edit: the expected output is that

data_table_test[matching_letter_1 == TRUE]

is identical to

data_table_test[matching_letter_2 == TRUE]

and that

data_table_test2[matching_letter_1 == TRUE]

is identical to (both being empty data.tables)

data_table_test2[matching_letter_2 == TRUE]

A slow but working tidyverse solution is:

data_table_test %>%
  as_tibble() %>%
  rowwise() %>%
  mutate(matching_letter = ifelse(is.na(data_table_test$col[1]), F, stringi::stri_detect_fixed(col2, col))) %>%
  filter(matching_letter)


# A tibble: 75,772 x 3
# Rowwise: 
   col   col2  matching_letter
   <chr> <chr> <lgl>          
 1 c     cb    TRUE           
 2 c     ce    TRUE           
 3 c     yc    TRUE           
 4 c     ch    TRUE           
 5 c     ic    TRUE           
 6 c     gc    TRUE           
 7 c     cg    TRUE           
 8 c     lc    TRUE           
 9 c     ci    TRUE           
10 c     zc    TRUE           
# ... with 75,762 more rows

I don't have tidyverse available to compare against the expected output; please include the expected output without such a heavy dependency.

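setmatchingletter = function(x) {
  # require at least one row and both input columns
  stopifnot(nrow(x) > 0L, c("col", "col2") %in% names(x))
  # scalar `if`: the match is only evaluated when the first value of col is not NA
  v = if (is.na(x$col[1L])) FALSE else {
    stringi::stri_detect_fixed(x$col2, x$col)
  }
  # add the column by reference
  set(x, j = "matching_letter", value = v)
}
setmatchingletter(data_table_test)
data_table_test[matching_letter == TRUE]
setmatchingletter(data_table_test2)
data_table_test2[matching_letter == TRUE]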

This solution assumes that stringi::stri_detect_fixed is "vectorised", unlike the way it is used in the question.
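The same scalar check can also be written directly inside the data.table `j` expression. The sketch below assumes that the difficulty in the question comes from `ifelse()` returning a result shaped like its length-1 test, which `:=` then recycles over the whole column. A plain `if ... else` returns the full vector and only evaluates the branch it takes. The tables `dt_ok` and `dt_na` are small hypothetical stand-ins for the question's data.

```r
library(data.table)

# Hypothetical stand-ins for data_table_test and data_table_test2.
dt_ok <- data.table(col = rep("c", 4L), col2 = c("cb", "xy", "yc", "zz"))
dt_na <- data.table(col = rep(NA_character_, 4L), col2 = c("cb", "xy", "yc", "zz"))

# ifelse() with a length-1 test returns a length-1 result.
short <- ifelse(is.na(dt_ok$col[1L]), FALSE,
                stringi::stri_detect_fixed(dt_ok$col2, dt_ok$col))
length(short)  # 1

# Scalar `if` inside j: the match is only evaluated when col[1L] is not NA,
# and the non-NA branch returns one value per row.
dt_ok[, matching_letter := if (is.na(col[1L])) FALSE else
  stringi::stri_detect_fixed(col2, col)]
dt_na[, matching_letter := if (is.na(col[1L])) FALSE else
  stringi::stri_detect_fixed(col2, col)]

dt_ok[matching_letter == TRUE]  # rows "cb" and "yc"
dt_na[matching_letter == TRUE]  # empty data.table
```

When the first value is NA, the scalar FALSE is recycled to every row by `:=`, so the column still has the right length.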

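This is basically if (is.na(data_table_test[1, col])) {data_table_test[, matching_letter := F, ]} else {data_table_test[, matching_letter_1 := stringi::stri_detect_fixed(col2, col), ]}. That works for the MRE but not for my actual code: I want to run the test on combinations of 30+ columns, so the if statement should happen inside the data.table operation rather than outside it.

@LSmeets Then please extend your example to cover "I need to test combinations of many columns".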
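One possible way to cover many column combinations, offered only as a sketch since the real column layout is not shown in the question, is to wrap the scalar check in a helper and loop over pairs of column names with `set()`. The helper `detect_if_present`, the column names, and the `pairs` list below are all hypothetical.

```r
library(data.table)

# Hypothetical helper: skip the match entirely when the pattern column
# starts with NA, otherwise run the vectorised fixed-string match.
detect_if_present <- function(text, pattern) {
  if (is.na(pattern[1L])) return(rep(FALSE, length(text)))
  stringi::stri_detect_fixed(text, pattern)
}

# Hypothetical table with two pattern columns and two text columns.
dt <- data.table(
  pat_a = rep("c", 3L),
  pat_b = rep(NA_character_, 3L),
  txt_a = c("cb", "xy", "yc"),
  txt_b = c("aa", "bb", "cc")
)

# Hypothetical list of (pattern column, text column) pairs to test.
pairs <- list(c("pat_a", "txt_a"), c("pat_b", "txt_b"))

for (p in pairs) {
  out_col <- paste0("match_", p[1L], "_", p[2L])
  # set() adds the result column by reference, one pair at a time.
  set(dt, j = out_col, value = detect_if_present(dt[[p[2L]]], dt[[p[1L]]]))
}

dt
```

Keeping the NA check inside the helper means each pair decides independently whether the match is run at all.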

data_table_test2 %>%
  as_tibble() %>%
  rowwise() %>%
  mutate(matching_letter = ifelse(is.na(data_table_test2$col[1]), F, stringi::stri_detect_fixed(col2, col))) %>%
  filter(matching_letter)



# A tibble: 0 x 3
# Rowwise: 
# ... with 3 variables: col <lgl>, col2 <chr>, matching_letter <lgl>

if(is.na(data_table_test[1, col ])){
  data_table_test[, matching_letter := F, ]
}else{
  data_table_test[, matching_letter_1 := stringi::stri_detect_fixed(col2, col),]
}