R: How do I split text into a vector where each entry corresponds to the index value assigned to each unique word?
Tags: r, dplyr, word, stringi

Suppose I have a document containing some text, such as the following:
doc <- 'Questions with similar titles have frequently been downvoted and/or closed. Consider using a title that more accurately describes your question.'
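A dplyr approach like this does not work.
Is there a more efficient way?
To give a simpler example showing the expected output, I want a data frame like the following: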
words id
1 to 1
2 row 2
3 zip 3
4 zip 3
where my starting word vector is:

Answer: a cheap way using sapply
Data
doc <- 'Questions with with titles have frequently been downvoted and/or closed. Consider using a title that more accurately describes your question.'
Comments:
If possible, please add your expected output here. I'm not even sure whether your question is a duplicate, but if you show what you are trying to do, someone can edit your title.
@TimBiegeleisen I have added it.
Just dfall$id
@DavidArenburg That works, and it is much faster than sapply. If you post it as an answer, I will accept it.
It looks like there is a repeated word? Did you do that for demonstration?
Yes, I did it on purpose, because your example did not have any duplicates.
I'm an idiot. Thank you very much.
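# dfall holds one word per row (column "words"); uniquedf holds the distinct words.
# For each row, look up the position of its word among the unique words.
alldf <- cbind(dfall, sapply(seq_len(nrow(dfall)), function(x) which(uniquedf$words == dfall$words[x])))
colnames(alldf) <- c("words", "id")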
> alldf
words id
1 questions 1
2 with 2
3 with 2
4 titles 3
5 have 4
6 frequently 5
7 been 6
8 downvoted 7
9 and 8
10 or 9
11 closed 10
12 consider 11
13 using 12
14 a 13
15 title 14
16 that 15
17 more 16
18 accurately 17
19 describes 18
20 your 19
21 question 20
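
A vectorized lookup is one alternative to the sapply loop above. The following is a minimal sketch in base R, not the answerer's or any commenter's exact code: the tokenization step (lowercase, then split on runs of non-letter characters) is an assumption, since the original does not show how dfall was built, and match() against unique() is swapped in for the sapply/which lookup.

```r
# The document from the answer (note the deliberately repeated "with").
doc <- 'Questions with with titles have frequently been downvoted and/or closed. Consider using a title that more accurately describes your question.'

# Assumed tokenization: lowercase, then split on runs of non-letter characters.
words <- strsplit(tolower(doc), "[^a-z]+")[[1]]
words <- words[nzchar(words)]  # drop empty tokens, if any

# One row per word, in document order.
dfall <- data.frame(words = words, stringsAsFactors = FALSE)

# match() gives, for each word, the position of its first occurrence
# in unique(words); that position serves as the id of the unique word.
dfall$id <- match(dfall$words, unique(dfall$words))

print(dfall)
```

Because match() does the whole lookup in one call instead of one which() per row, it should scale better on long documents. Ids are assigned in order of first appearance, so the repeated "with" gets id 2 in both rows.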