Counting unique string patterns per row in R


Here is an example:

library(stringi)

dat <- read.table(text="index  string
1      'I have first and second'
2      'I have first, first'
3      'I have second and first and thirdeen'", header=TRUE, stringsAsFactors=FALSE)


toMatch <-  c('first', 'second', 'third')

dat$count <- stri_count_regex(dat$string, paste0('\\b',toMatch,'\\b', collapse="|"))

dat

index                               string count
1     1              I have first and second     2
2     2                  I have first, first     2
3     3 I have second and first and thirdeen     2

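Could you tell me how to modify the original formula so that each pattern is counted only once per row (row 2 should give 1, not 2)? Thanks a lot.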

With base R, you can do the following:

# For each row, test every pattern separately and count how many of them occur
sapply(dat$string, function(x) {
    sum(sapply(toMatch, function(y) grepl(paste0('\\b', y, '\\b'), x)))
}, USE.NAMES = FALSE)

[1] 2 1 2

Hope this helps.
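The nested sapply above can also be written as one vectorised grepl call per pattern. This is only a sketch of the same idea, not a different method; the names patterns, hits and unique_count are made up for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Same data as in the question.
dat <- data.frame(
  index  = 1:3,
  string = c("I have first and second",
             "I have first, first",
             "I have second and first and thirdeen"),
  stringsAsFactors = FALSE
)
toMatch <- c("first", "second", "third")

# One whole-word regex per pattern.
patterns <- paste0("\\b", toMatch, "\\b")

# hits[i, j] is TRUE when pattern j occurs at least once in row i.
hits <- vapply(patterns, function(p) grepl(p, dat$string),
               logical(nrow(dat)))
hits <- matrix(hits, nrow = nrow(dat))  # keep a matrix even for a single row

# Number of distinct patterns found in each row.
unique_count <- rowSums(hits)
unique_count
```

Because grepl only reports whether a pattern occurs, repeated hits such as "first, first" in row 2 are counted once.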

We can use stri_match_all instead, which gives us the exact matches, and then count the distinct values with n_distinct from dplyr, or with length(unique(x)) in base R.

library(stringi)
library(dplyr)
sapply(stri_match_all(dat$string, regex = paste0('\\b',toMatch,'\\b',
                    collapse="|")), n_distinct)

#[1] 2 1 2

Or similarly in base R:

sapply(stri_match_all(dat$string, regex = paste0('\\b',toMatch,'\\b',
         collapse="|")), function(x) length(unique(x)))

#[1] 2 1 2
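One caveat, as far as I can tell: for a string with no match at all, stri_match_all returns a single NA by default, so counting unique values would report 1 rather than 0. A minimal sketch of one way around this, assuming stri_extract_all_regex with omit_no_match = TRUE is acceptable here; the fourth string is a made-up input that is not in the question's data.

```r
library(stringi)

toMatch <- c("first", "second", "third")
pattern <- paste0("\\b", toMatch, "\\b", collapse = "|")

# Hypothetical input: the question's three strings plus one with no match.
x <- c("I have first and second",
       "I have first, first",
       "I have second and first and thirdeen",
       "nothing to see here")

# omit_no_match = TRUE yields character(0) instead of NA when nothing matches.
matches <- stri_extract_all_regex(x, pattern, omit_no_match = TRUE)

# Count the distinct matched patterns per string.
unique_count <- vapply(matches, function(m) length(unique(m)), integer(1))
unique_count

#[1] 2 1 2 0
```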

Only one loop is needed:

sapply(stri_extract_all_regex(dat$string, paste0('\\b', toMatch, '\\b', collapse="|")), function(x) length(unique(x)))

This is also a nice solution, and closer to the OP's own attempt. I think you could/should add it as an answer :)
