R: join on pattern match


I have the following data tables:

> measures
    source     measure
1:   my123  0.08130182
2:   123my -1.45285168
3: your123 -0.30460771
4: 123your  0.94670380
5: 12your3 -0.54728546
> sources
          name pattern
1:   My Source      my
2: Your Source    your
They were created with:

measures <- data.table(source=c('my123', '123my', 'your123', '123your', '12your3'), measure=rnorm(5))
sources <- data.table(name=c('My Source', 'Your Source'), pattern=c('my', 'your'))
I want to join the two tables wherever sources$pattern occurs in measures$source. The desired result is:

source  |   measure   |    name     | pattern
--------+-------------+-------------+---------
my123   |  0.08130182 | My Source   | my
123my   | -1.45285168 | My Source   | my
your123 | -0.30460771 | Your Source | your
123your |  0.94670380 | Your Source | your
12your3 | -0.54728546 | Your Source | your
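
The join condition implied by this table is a substring test: a row of measures pairs with a row of sources when the pattern occurs somewhere in the source string. A minimal base R sketch of that condition follows, assuming the patterns are literal strings rather than regular expressions. The names source_ids, patterns and hits are illustrative only and do not appear in the question.

```r
# Values copied from the question's tables.
source_ids <- c("my123", "123my", "your123", "123your", "12your3")
patterns   <- c("my", "your")

# hits[i, j] is TRUE when patterns[j] occurs literally in source_ids[i].
# fixed = TRUE is an assumption: it treats each pattern as a plain substring.
hits <- vapply(
  patterns,
  function(p) grepl(p, source_ids, fixed = TRUE),
  logical(length(source_ids))
)
rownames(hits) <- source_ids
print(hits)
```

Each TRUE cell corresponds to one row of the desired result, so any join strategy only has to enumerate those cells.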

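I'm not sure whether this counts as "impractical", but it gets the job done. For more complex pattern matching, I would adjust the matching step.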

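> library(stringi); library(jsonlite)  # stri_detect_regex() and rbind.pages()
> rbind.pages(lapply(1:nrow(measures), function(i){
       # indices of the patterns that occur in this row's source string
       matched_slice <- which(stri_detect_regex(measures$source[i], sources$pattern))
       data.frame(measures[i,], sources[matched_slice, ])
  }))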
   source     measure        name pattern
1   my123  0.75119183   My Source      my
2   123my  0.55344334   My Source      my
3 your123 -0.03498414 Your Source    your
4 123your  0.09364795 Your Source    your
5 12your3  0.47537732 Your Source    your

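I think you mean measures.source rather than measures.name.

FWIW, there is an open feature request for this, tagged "High Priority".

@G5W Thanks, I fixed it.

@Frank Thanks for the reference.

Thanks. Unfortunately this is still not feasible for me, because it takes too long on the dataset I'm working with.
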
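# data.table variant of the same loop: rbindlist() + cbind() instead of rbind.pages() + data.frame()
library(stringi)
rbindlist(lapply(1:nrow(measures), function(i){
    # indices of the patterns that occur in this row's source string
    matched_slice <- which(stri_detect_regex(measures$source[i], sources$pattern))
    cbind(measures[i,], sources[matched_slice, ])
}))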
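
The loops above call the matcher once per row of measures, which the last comment reports as too slow on a large dataset. One possible alternative, sketched here under the assumption that sources has far fewer rows than measures and that the patterns are literal substrings, is to loop over the patterns instead and let grepl() scan the whole source column in one vectorized call. The measure values are copied from the question, and the name join_by_pattern is hypothetical.

```r
library(data.table)

# Data from the question, with the printed measure values instead of rnorm().
measures <- data.table(
  source  = c("my123", "123my", "your123", "123your", "12your3"),
  measure = c(0.08130182, -1.45285168, -0.30460771, 0.94670380, -0.54728546)
)
sources <- data.table(name = c("My Source", "Your Source"),
                      pattern = c("my", "your"))

# Hypothetical helper: one vectorized grepl() call per pattern.
join_by_pattern <- function(measures, sources) {
  rbindlist(lapply(seq_len(nrow(sources)), function(j) {
    # fixed = TRUE is an assumption; drop it to treat patterns as regexes.
    hit <- grepl(sources$pattern[j], measures$source, fixed = TRUE)
    if (!any(hit)) return(NULL)  # rbindlist() skips NULL elements
    # repeat the matching sources row once per matched measures row
    cbind(measures[hit], sources[rep(j, sum(hit))])
  }))
}

result <- join_by_pattern(measures, sources)
print(result)
```

The result is grouped by pattern rather than kept in the original row order of measures, and a source string that matches several patterns appears once per match. Whether this is fast enough depends on the data, so it is worth timing on a sample first.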