R: Convert a string to a vector, tokenizing on " "
I have a string:
string1 <- "This is my string"
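How do I turn it into a vector of its words? I know I could use the tm package to convert it to a TermDocumentMatrix and then to a matrix, but that sorts the words alphabetically, and I need them to stay in their original order.

You can use strsplit to do this: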
string1 <- "This is my string"
strsplit(string1, " ")[[1]]
#[1] "This" "is" "my" "string"
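strsplit() is vectorised: given a character vector, it returns a list with one element per input string, which is why the answer indexes the result with [[1]]. A small sketch (the second string is made up for illustration); fixed = TRUE treats the separator as a literal string instead of a regular expression:

```r
# Two example strings; strsplit() returns a list with one element per string
strings <- c("This is my string", "and another one")
tokens <- strsplit(strings, " ", fixed = TRUE)

tokens[[1]]
#[1] "This" "is" "my" "string"
tokens[[2]]
#[1] "and" "another" "one"
```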
Slightly different from Dason's answer, but this splits on any run of whitespace, including newlines:
string1 <- "This is my
string"
strsplit(string1, "\\s+")[[1]]
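To see why "\\s+" helps, compare it with splitting on a single " " when words are separated by repeated spaces, a tab or a newline. One caveat remains: leading whitespace still produces an empty first element, so trimming first with trimws() avoids it. The example string below is hypothetical:

```r
messy <- "  This  is\tmy\nstring"

# Single space: empty strings appear, and the tab/newline are not split on
strsplit(messy, " ")[[1]]
#[1] "" "" "This" "" "is\tmy\nstring"

# Any run of whitespace: only the leading whitespace leaves an empty element
strsplit(messy, "\\s+")[[1]]
#[1] "" "This" "is" "my" "string"

# Trim leading/trailing whitespace first to drop that empty element
strsplit(trimws(messy), "\\s+")[[1]]
#[1] "This" "is" "my" "string"
```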
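Try:

This is an over-engineered solution for your problem; strsplit with Sacha's approach is usually just fine.

As a complement, we can also use unlist() to turn the list returned by strsplit into a vector: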
string1 <- "This is my string"
unlist(strsplit(string1, "\\s+")) # strsplit() returns a list; unlist() flattens it to a vector
#[1] "This" "is" "my" "string"
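For a single string, unlist() and [[1]] give the same vector. With several strings they differ: unlist() concatenates all tokens into one vector, so you lose track of which string each word came from. A sketch with a made-up second string; lengths() records how many words each string contributed if you need to map back:

```r
strings <- c("This is my string", "second string")
tokens <- strsplit(strings, "\\s+")

# Single string: both forms give the same vector
identical(unlist(strsplit(strings[1], "\\s+")), tokens[[1]])
#[1] TRUE

# Several strings: unlist() flattens everything into one vector
unlist(tokens)
#[1] "This" "is" "my" "string" "second" "string"

# Number of words per original string
lengths(tokens)
#[1] 4 2
```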
If you are simply extracting words by splitting on spaces, here are a few good options:
string1 <- "This is my string"
scan(text = string1, what = "")
# [1] "This" "is" "my" "string"
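scan() reads whitespace-separated fields and, by default, reports how many it read (e.g. "Read 4 items"); quiet = TRUE suppresses that. By default it also treats ' and " as quote characters, which may give surprising results on text containing apostrophes; quote = "" switches quote handling off. The sample sentence below is hypothetical:

```r
# quiet = TRUE suppresses the "Read N items" report;
# quote = "" disables quote handling so the apostrophe is kept as text
scan(text = "It's my string", what = "", quote = "", quiet = TRUE)
#[1] "It's" "my" "string"
```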
library(stringi)
stri_split_fixed(string1, " ")[[1]]
# [1] "This" "is" "my" "string"
stri_extract_all_words(string1, simplify = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] "This" "is" "my" "string"
stri_split_boundaries(string1, simplify = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] "This " "is " "my " "string"
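Splitting on whitespace keeps punctuation attached to the words, whereas stri_extract_all_words() uses ICU word boundaries and drops it. If you would rather stay in base R, one rough approximation is to extract runs of letters, digits and apostrophes with gregexpr() and regmatches(). That definition of a "word" is an assumption and will not match ICU's rules in every case. The sentence below is hypothetical:

```r
sentence <- "Hello, world! This is my string."

# Whitespace splitting keeps the punctuation attached
strsplit(sentence, "\\s+")[[1]]
#[1] "Hello," "world!" "This" "is" "my" "string."

# Assumed word definition: runs of letters, digits or apostrophes
regmatches(sentence, gregexpr("[[:alnum:]']+", sentence))[[1]]
#[1] "Hello" "world" "This" "is" "my" "string"
```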
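Dason provided a nice solution, but if your text is more complex than this (punctuation, etc.), you will need a more robust approach.

Possible duplicate:

GSee, this asks for something different. screechOwl wants to split a single character vector into words, whereas the link you provided shows the poster wanting unquoted words as input, to be converted to character.

@TylerRinker, yes, I suppose this question is not a duplicate, but some of the answers there do answer it, such as … or …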