
R: Count the common words in two strings

I have two strings:

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"

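How can I count the words they have in common?

Perhaps use intersect together with str_extract_all (from the stringr package). For multiple strings, you can pass them either as a list or as a vector: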

 library(stringr)  # provides str_extract_all
 vec1 <- c(a,b)
 Reduce(`intersect`,str_extract_all(vec1, "\\w+"))
 #[1] "Roy"     "travels" "Africa" 
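To get the count (here with stri_extract_all_regex from the stringi package):

 library(stringi)  # provides stri_extract_all_regex
 length(Reduce(`intersect`,stri_extract_all_regex(vec1,"\\w+")))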
 #[1] 3
Or using base R:

  Reduce(`intersect`,regmatches(vec1,gregexpr("\\w+", vec1)))
  #[1] "Roy"     "travels" "Africa" 
You can use strsplit and intersect from the base library:

> a <- "Roy lives in Japan and travels to Africa"
> b <- "Roy travels Africa with this wife"
> a_split <- unlist(strsplit(a, split=" "))
> b_split <- unlist(strsplit(b, split=" "))
> length(intersect(a_split, b_split))
[1] 3
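
One caveat with splitting on a single space is that punctuation stays attached to the word, so "Roy." and "Roy" would not count as the same word. A minimal base R sketch that tokenizes with a regular expression instead is shown here; the helper name common_words and its ignore_case option are illustrative assumptions, not something taken from the answers in this thread.

```r
# Hypothetical helper: words shared by every string in a character vector.
common_words <- function(strings, ignore_case = FALSE) {
  # Optionally fold case so that "Roy" and "roy" are treated as one word.
  if (ignore_case) strings <- tolower(strings)
  # Extract runs of word characters, which drops trailing punctuation.
  tokens <- regmatches(strings, gregexpr("\\w+", strings))
  # Intersect the token sets across all strings.
  Reduce(intersect, tokens)
}

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"

shared <- common_words(c(a, b))
print(shared)          # the common words
print(length(shared))  # how many there are
```

Whether to fold case depends on the data; for proper nouns such as those in the example it may be preferable to leave it off.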

This approach generalizes to n vectors:

a <- "Roy lives in Japan and travels to Africa"
b <- "Roy travels Africa with this wife"
c <- "Bob also travels Africa for trips but lives in the US unlike Roy."

library(stringi);library(qdapTools)
X <- stri_extract_all_words(list(a, b, c))
X <- mtabulate(X) > 0
Y <- colSums(X) == nrow(X); names(Y)[Y]

[1] "Africa"  "Roy"     "travels"
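
The same "present in every string" idea can probably be expressed without stringi or qdapTools. The following is a rough base R sketch under that assumption; the names strs, tokens, doc_freq and in_all are made up for illustration, and the tokenizer is the \\w+ pattern used in the earlier answer rather than stri_extract_all_words.

```r
strs <- c(
  "Roy lives in Japan and travels to Africa",
  "Roy travels Africa with this wife",
  "Bob also travels Africa for trips but lives in the US unlike Roy."
)

# Tokenize each string and keep every word at most once per string.
tokens <- lapply(regmatches(strs, gregexpr("\\w+", strs)), unique)

# Count in how many strings each word occurs.
doc_freq <- table(unlist(tokens))

# Keep the words that occur in all of the strings.
in_all <- names(doc_freq)[doc_freq == length(strs)]
print(in_all)
```

Relaxing the comparison (for example doc_freq >= 2) would give the words shared by at least two of the strings, which the Reduce/intersect form cannot express directly.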

I wouldn't actually recommend this, but with "stra" and "strb" you could probably just do merge(stra, strb).

Note that the relevant argument of strsplit is named split, not sep.