R data.table - fast comparison of strings

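I am looking for a fast solution to the following problem. The example is very small, but the real data is large, so speed is an important factor.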

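I have two string vectors, currently stored in data.tables, although that does not matter. For each string in the first vector, I need to count how many strings in the second vector contain it, and keep these counts.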

Example

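library(data.table)

# dt1: the strings to look for (V1) and a count column to fill in (V2)
dt1<-data.table(c("first","second","third"),c(0,0,0))
# dt2: the strings to search in (V1)
dt2<-data.table(c("first and second","third and fifth","second and no else","first and second and third"))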
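To make the required output concrete, here is a minimal base-R sketch of what the counts should be for this example, assuming "contains" means plain substring containment. It builds a texts-by-patterns logical matrix and sums each column; the names patterns, texts, hits and counts are illustrative only and not part of the original question.

```r
# Plain copies of dt1$V1 (patterns) and dt2$V1 (texts) from the example
patterns <- c("first", "second", "third")
texts <- c("first and second", "third and fifth",
           "second and no else", "first and second and third")

# hits[i, j] is TRUE when texts[i] contains patterns[j] as a literal substring;
# sapply() passes each pattern as grepl()'s first (pattern) argument
hits <- sapply(patterns, grepl, x = texts, fixed = TRUE)

# Column sums give, for each pattern, the number of texts containing it
counts <- colSums(hits)
counts  # expected: first = 2, second = 3, third = 2
```

The matrix view also shows which text matched which pattern (the last row is all TRUE), but it holds length(texts) × length(patterns) logicals, so for large inputs only the per-pattern sums are worth keeping.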

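library(data.table)
dt1[, V2 := sapply(V1, function(x) sum(grepl(x, dt2$V1)))]

In addition, you can probably gain speed by using fixed-string matching instead of regular expressions. In that case you can use stri_detect_fixed from the stringi package:

library(stringi)
dt1[, V2 := sapply(V1, function(x) sum(stri_detect_fixed(dt2$V1, x)))]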
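Note that grepl() treats each pattern as a regular expression by default, so switching to fixed matching can change results, not only speed, when a pattern contains regex metacharacters such as ".". A minimal base-R sketch, assuming the patterns are meant to be matched literally: grepl(..., fixed = TRUE) is the base-R counterpart of stri_detect_fixed. The vectors below are hypothetical stand-ins for dt1$V1 and dt2$V1.

```r
# Hypothetical patterns and texts; "a.c" contains the regex metacharacter "."
patterns <- c("first", "a.c")
texts <- c("first and second", "abc", "a.c and more")

# Default grepl(): each pattern is a regular expression, so "." matches any character
regex_counts <- vapply(patterns, function(p) sum(grepl(p, texts)), integer(1))

# fixed = TRUE: each pattern is matched as a literal substring
fixed_counts <- vapply(patterns, function(p) sum(grepl(p, texts, fixed = TRUE)), integer(1))

regex_counts  # first = 1, a.c = 2 ("abc" and "a.c and more")
fixed_counts  # first = 1, a.c = 1 (only "a.c and more")
```

In the data.table form above, the same option gives dt1[, V2 := sapply(V1, function(x) sum(grepl(x, dt2$V1, fixed = TRUE)))] without the stringi dependency.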

How big is your actual data?
1000+ in both tables, and some of the strings are quite long. Thanks.
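# Scaled-up test data: 30 strings to look for and 40 strings to search in
dt1<-data.table(rep(c("first","second","third"),10),rep(c(0,0,0),10))
dt2<-data.table(rep(c("first and second","third and fifth","second and no else","first and second and third"),10))

# Naive approach: one grepl() call per (string, pattern) pair, updating V2 in place with set()
pm<-proc.time()
for (l in seq_len(nrow(dt2))) {
    for (k in seq_len(nrow(dt1))) set(dt1,k,2L,dt1[k,V2]+as.integer(grepl(dt1[k,V1],dt2[l,V1])))
}
proc.time() - pm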

   user  system elapsed 
   1.93    0.06    2.06 
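The nested loop makes one grepl() call per (text, pattern) pair, here 40 × 30 = 1200 calls, whereas the sapply() answer makes one call per pattern (30), each scanning the whole text vector. A base-R sketch for timing the two shapes side by side, assuming both use the same default regex matching; naive_counts and vectorized_counts are hypothetical helper names. Part of the original loop's cost likely also comes from the per-iteration data.table subsetting (dt1[k,V2], dt2[l,V1]), which this sketch avoids by working on plain vectors. Timings will vary by machine.

```r
patterns <- rep(c("first", "second", "third"), 10)
texts <- rep(c("first and second", "third and fifth",
               "second and no else", "first and second and third"), 10)

# Naive: one grepl() call per (text, pattern) pair, mirroring the nested loop
naive_counts <- function(patterns, texts) {
  counts <- integer(length(patterns))
  for (l in seq_along(texts)) {
    for (k in seq_along(patterns)) {
      counts[k] <- counts[k] + as.integer(grepl(patterns[k], texts[l]))
    }
  }
  counts
}

# Vectorized over texts: one grepl() call per pattern (same regex semantics)
vectorized_counts <- function(patterns, texts) {
  vapply(patterns, function(p) sum(grepl(p, texts)), integer(1),
         USE.NAMES = FALSE)
}

print(system.time(a <- naive_counts(patterns, texts)))
print(system.time(b <- vectorized_counts(patterns, texts)))

# Both approaches must agree on the counts
stopifnot(identical(a, b))
```

Back in data.table form, the vectorized result can be assigned with dt1[, V2 := vectorized_counts(V1, dt2$V1)].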

dt1[, V2 := sapply(V1, function(x) sum(grepl(x, dt2$V1)))]