Using stringr and str_count to return the number of unique words in a string

Is there a way to use str_count to count the unique words in a string? I would like the simple code below to return 2 instead of 6.

library(tidyverse)

string <- "Z AD Banana EW Z AD Z AD X" 

str_count(string, "Z|AD")

Returns: 6

One approach is to extract every value that matches the pattern and then count the distinct values:
library(dplyr)
library(stringr)

n_distinct(str_extract_all(string, "Z|AD")[[1]])
#[1] 2

The same thing can be written in base R:
length(unique(regmatches(string, gregexpr("Z|AD", string))[[1]]))

Alternatively, we can check each pattern separately with str_detect and sum the results:
library(stringr)
library(purrr)
map_lgl(c("Z", "AD"), ~ str_detect(string, .x)) %>% sum
#[1] 2
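
One caveat that applies to both approaches: Z|AD is matched as a substring, so a word such as BAD or ZOO would also count. If whole words are what is wanted, a minimal sketch is to wrap the pattern in \\b word boundaries. The input string2 below is a hypothetical example, not taken from the question.

```r
library(stringr)

# Hypothetical input: contains neither "Z" nor "AD" as standalone words
string2 <- "BAD ZOO"

# Substring matching: finds "AD" inside "BAD" and "Z" inside "ZOO"
loose <- length(unique(str_extract_all(string2, "Z|AD")[[1]]))

# Whole-word matching: \\b requires a word boundary on both sides
strict <- length(unique(str_extract_all(string2, "\\b(Z|AD)\\b")[[1]]))

loose
#[1] 2
strict
#[1] 0
```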

Comments:

This works well with the string above. However, when I try to apply it to a dataset (mutate = n_distinct(str_extract_all(string_var, pattern))[[1]]), it returns the same value for every row.

If you are working with a column of values, you don't need [[1]]. Try:
df %>% mutate(temp = str_extract_all(string, 'Z|AD'), n = map_dbl(temp, n_distinct))

Brilliant! Why do I have to throw map in there? Is it because str_extract_all returns a list?

Yes. In the answer I used [[1]] because there was only a single string, but with multiple strings you need map or lapply.
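
The suggestion from the comments can be sketched end to end as follows. The data frame df and its column string_var are hypothetical stand-ins for the asker's data, and map_int is used here in place of map_dbl simply because n_distinct returns an integer; either should work.

```r
library(dplyr)
library(stringr)
library(purrr)

# Hypothetical data: one string per row
df <- tibble(string_var = c("Z AD Banana EW Z AD Z AD X",
                            "Z Z Z",
                            "Banana EW X"))

result <- df %>%
  mutate(
    # str_extract_all returns a list with one character vector per row
    matches = str_extract_all(string_var, "Z|AD"),
    # count the distinct matches within each row's vector
    n = map_int(matches, n_distinct)
  )

result$n
#[1] 2 1 0
```

Calling n_distinct directly on the list column pools the whole column into one count, which is why every row came back with the same value; mapping over the list counts each row on its own.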