R: Generating a network graph from the correlations between words in a document


Tags: r, ggplot2, igraph, quanteda, ggnetwork

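I'm interested in creating a network graph similar to the one shown on this personal website (the first one on the page).

I would like the nodes of this graph to be the words in a .txt document (after removing stopwords and other pre-processing). I would also like the edges between nodes to represent how strongly each word is associated with the other words in the document (for example, the word "word" often appears next to the word "up"), showing only the stronger associations. My idea is that node size = the word's frequency in the document, and distance between nodes = the strength of the relationship between the two words.

I'm currently using a combination of R, quanteda, and ggplot2, plus a few other dependencies.

If anyone has suggestions on how I can generate word associations in R (preferably with quanteda) and plot them as a graph, I would be forever grateful.

And of course, if there is any way I can improve this question, please let me know. Here is my attempt so far: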

library(quanteda)
library(quanteda.textstats)  # textstat_frequency() lives here in quanteda >= 3
library(readtext)
library(ggplot2)
library(stringi)

## Load the .txt doc (readtext() returns a data frame with a `text` column)
document <- readtext("file1.txt")

## Make everything lowercase... store in a separate variable
documentlower <- char_tolower(document$text)

## Tokenize the lower-case document
doctokens <- tokens(documentlower, remove_punct = TRUE)
documenttokens <- as.character(doctokens)
(total_length <- length(documenttokens))

## Create the document-feature matrix - here we also remove stopwords and stem
## (recent quanteda versions expect tokens, not raw text, as input to dfm())
doctokens_clean <- tokens_wordstem(tokens_remove(doctokens, stopwords("english")))
docudfm <- dfm(doctokens_clean)

## Inspect the top 10 words by count
textstat_frequency(docudfm, n = 10)

## Create a sorted list of tokens by frequency count
sorted_document <- topfeatures(docudfm, n = nfeat(docudfm))

## Normalize the data points to find their percentage of occurrence in the document
sorted_document <- sorted_document / sum(sorted_document) * 100

## Also normalize the data points in the DFM
docudfm_pct <- dfm_weight(docudfm, scheme = "prop") * 100
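
One possible way to get from here to a graph, offered only as a rough sketch: count how often words co-occur within a small window using quanteda's fcm(), turn that matrix into an igraph graph, and size the nodes by word frequency. Note that this uses raw co-occurrence counts as a stand-in for "correlation", which is a simplification. The toy text, the window size of 2, and the `min_cooc` threshold are all assumptions chosen for illustration, not values taken from the question.

```r
library(quanteda)
library(igraph)

# Toy text standing in for the contents of file1.txt (assumption for illustration)
txt <- "The cat sat on the mat. The cat chased the dog. The dog sat on the mat with the cat."

# Tokenize, lower-case, and drop punctuation and stopwords
toks <- tokens(txt, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, stopwords("english"))

# Word frequencies (named numeric vector), used later for node size
freq <- topfeatures(dfm(toks), n = 20)

# Co-occurrence counts within a 2-token window; tri = FALSE keeps the matrix symmetric
cooc <- fcm(toks, context = "window", window = 2, tri = FALSE)
cooc <- fcm_select(cooc, pattern = names(freq))

# Build an undirected, weighted graph from the co-occurrence matrix
g <- graph_from_adjacency_matrix(as.matrix(cooc), mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Attach each word's frequency to its node
V(g)$freq <- as.numeric(freq[V(g)$name])

# Keep only the stronger associations (hypothetical threshold)
min_cooc <- 2
g <- delete_edges(g, which(E(g)$weight < min_cooc))

# Fruchterman-Reingold layout uses edge weights, so strongly linked words sit closer
set.seed(1)
plot(g,
     vertex.size = 10 + 5 * V(g)$freq,
     edge.width  = E(g)$weight,
     layout      = layout_with_fr(g))
```

The force-directed layout only approximates "distance = strength of relationship"; on a real document the window size and threshold would need tuning. The same graph object could presumably be handed to ggnetwork for a ggplot2-style plot instead of base plot().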

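Have you seen this page?

Thank you so much for pointing that out. I definitely hope to move most of my text analysis over to the quanteda package in the future.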