
Performing text analysis on a text column of a data frame in R

Tags: r, dataframe, text-analysis

I have imported a CSV file into a data frame in R, and one of the columns contains text.

I want to run text analysis on that text. How should I go about it?

I tried creating a new data frame that contains only the text column:

library(dplyr)  # provides %>% and select()

OnlyTXT <- Txtanalytics1 %>%
  select(problem_note_text)
View(OnlyTXT)
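For completeness, the CSV import mentioned above might look something like the following sketch; the file name is hypothetical, and only problem_note_text comes from the question:

Txtanalytics1 <- read.csv("problem_notes.csv", stringsAsFactors = FALSE)  # hypothetical file name
str(Txtanalytics1$problem_note_text)  # confirm the text column was read as character, not factor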

This should get you started:

install.packages("gtools", dependencies = T)
library(gtools) # if problems calling library, install.packages("gtools", dependencies = T)
library(qdap) # qualitative data analysis package (it masks %>%)
library(tm) # framework for text mining; it loads NLP package
library(Rgraphviz) # depict the terms within the tm package framework
library(SnowballC); library(RWeka); library(rJava); library(RWekajars)  # wordStem is masked from SnowballC
library(Rstem) # stemming terms as a link from R to Snowball C stemmer
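One note on the packages above: Rgraphviz is distributed through Bioconductor rather than CRAN, so if install.packages("Rgraphviz") cannot find it, installing it via BiocManager is the usual route (a sketch, assuming a reasonably recent R):

install.packages("BiocManager")    # CRAN helper for installing Bioconductor packages
BiocManager::install("Rgraphviz")  # Rgraphviz lives on Bioconductor, not CRAN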
The following assumes the text variable (OnlyTXT) is stored in a data frame "df", in a column labeled "text".


A comment asked: what do you want out of the text column? You could, for example, count the characters in the table with unlist(strsplit(OnlyTXT[,1], "")).

The reply: I am trying to extract each row of text from the column and parse it for word frequencies, while cleaning the text data by removing stop words and stemming. Another comment suggested looking at the tm and SnowballC packages, to which the reply was: I use "DataframeSource(OnlyTXT)" to get each row of text in the data frame as a separate document, and I want to analyze those words (a sketch of that approach follows the code below). A further note: qdap should not mask %>%, because it imports it from dplyr via `%>%`.

The answer's code then continues:
library(stringr)  # str_replace_all() below comes from stringr

df$text <- as.character(df$text)  # make sure the column really is character, not factor

# Prepare the text: lower-case it, then remove numbers, extra whitespace, punctuation
# and unimportant words. The tm:: prefix is just being cautious about masked functions.
df$text <- tolower(df$text)
df$text <- tm::removeNumbers(df$text)
df$text <- str_replace_all(df$text, "  ", " ")                     # collapse double spaces to a single space
df$text <- str_replace_all(df$text, pattern = "[[:punct:]]", " ")  # replace punctuation with spaces

df$text <- tm::removeWords(x = df$text, stopwords(kind = "SMART")) # drop SMART stop words

corpus <- Corpus(VectorSource(df$text))   # turn the cleaned vector into a corpus

tdm <- TermDocumentMatrix(corpus)         # create a term-document matrix from the corpus

freq_terms(text.var = df$text, top = 25)  # qdap: the 25 most frequent words
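Since the comments mention DataframeSource, stemming and stop-word removal, here is a hedged sketch of the same pipeline done entirely inside tm, treating each row as its own document. OnlyTXT and problem_note_text come from the question; everything else is illustrative and assumes tm >= 0.7, where DataframeSource() expects doc_id and text columns:

library(tm)
library(SnowballC)  # supplies the Snowball stemmer used by stemDocument()

# tm >= 0.7 expects a data frame with 'doc_id' and 'text' columns for DataframeSource()
docs <- data.frame(doc_id = as.character(seq_len(nrow(OnlyTXT))),
                   text   = as.character(OnlyTXT$problem_note_text),
                   stringsAsFactors = FALSE)

corpus <- VCorpus(DataframeSource(docs))  # one document per row of the data frame

# the same cleaning steps as above, expressed as corpus transformations
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords(kind = "SMART"))
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, stemDocument)    # Porter stemming via SnowballC

tdm <- TermDocumentMatrix(corpus)         # terms x documents

# word frequencies straight from the term-document matrix
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
head(freq, 25)                            # 25 most frequent (stemmed) terms

For a large corpus, removeSparseTerms(tdm, 0.99) can shrink the matrix before calling as.matrix() on it.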