
How can I find the most frequent words per observation in R?

Tags: r, nlp, text-mining

I'm new to NLP, so please don't judge me too harshly.

I have a very large data frame of customer feedback, and my goal is to analyze it. I tokenize the words in the feedback and remove stop words (the SMART list). Now I need to get a list of the most and least frequently used words.

The code looks like this:

library(tokenizers)
library(stopwords)
words_as_tokens <- 
     tokenize_words(dat$description, 
                    stopwords = stopwords(language = "en", source = "smart"))
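tokenize_words() returns a list with one character vector of tokens per observation, so the missing counting step is just table() on each vector. A minimal base R sketch of that step, using a small hypothetical two-row data frame standing in for dat (simple whitespace splitting here; tokenize_words additionally strips punctuation and applies the stop-word list):

```r
# hypothetical stand-in for dat; the real data frame is much larger
dat <- data.frame(
  name = c("John", "Alex"),
  description = c("great service great prices", "slow delivery slow support"),
  stringsAsFactors = FALSE
)

# naive whitespace tokenization, one character vector per observation
tokens <- strsplit(tolower(dat$description), "\\s+")

# per-observation frequency tables, most frequent words first
freqs <- lapply(tokens, function(x) sort(table(x), decreasing = TRUE))
names(freqs) <- dat$name

freqs$John
```

For John this yields "great" with a count of 2 ahead of the single-occurrence words, which is exactly the most-frequent/least-frequent ordering the question asks for.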
Try this:

library(tokenizers)
library(stopwords)
library(tidyverse)

# count freq of words
words_as_tokens <- setNames(lapply(sapply(dat$description, 
                                 tokenize_words, 
                                 stopwords = stopwords(language = "en", source = "smart")), 
                          function(x) as.data.frame(sort(table(x), TRUE), stringsAsFactors = F)), dat$name)

# tidyverse's job
df <- words_as_tokens %>%
  bind_rows(.id = "name") %>%
  rename(word = x)

# output
df

#    name          word Freq
# 1  John    experience    2
# 2  John          word    2
# 3  John    absolutely    1
# 4  John        action    1
# 5  John        amazon    1
# 6  John     amazon.ae    1
# 7  John     answering    1
# ....
# 42 Alex         break    2
# 43 Alex          nice    2
# 44 Alex         times    2
# 45 Alex             8    1
# 46 Alex        accent    1
# 47 Alex        africa    1
# 48 Alex        agents    1
# ....
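If you only need, say, the top few words per person rather than the full table, dplyr's group_by() plus slice_max() will trim the combined data frame (slice_max() assumes dplyr >= 1.0; on older versions top_n() is the equivalent). A sketch using a small hypothetical df in the shape produced above:

```r
library(dplyr)

# hypothetical df in the name / word / Freq shape produced above
df <- data.frame(
  name = c("John", "John", "John", "Alex", "Alex", "Alex"),
  word = c("experience", "word", "absolutely", "break", "nice", "8"),
  Freq = c(2, 2, 1, 2, 2, 1),
  stringsAsFactors = FALSE
)

# keep the two most frequent words per person
top_words <- df %>%
  group_by(name) %>%
  slice_max(Freq, n = 2, with_ties = FALSE) %>%
  ungroup()

top_words
```

Swapping slice_max() for slice_min() gives the least-used words instead.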
You can try quanteda, as follows:

library(quanteda)
# define a corpus object to store your initial documents
mycorpus = corpus(dat$description)
# convert the corpus to a Document-Feature Matrix
mydfm = dfm( mycorpus, 
             tolower = TRUE, 
             remove = stopwords(),  # this removes English stopwords
             remove_punct = TRUE,   # this removes punctuation
             remove_numbers = TRUE, # this removes digits
             remove_symbols = TRUE, # this removes symbols 
             remove_url = TRUE )     # this removes urls

# calculate word frequencies and return a data.frame
word_frequencies = textstat_frequency( mydfm )
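Since the question asks for counts per observation, note that each row of the dfm already holds one document's counts, so subsetting the dfm by row before summarizing gives per-observation frequencies. A sketch with a hypothetical two-document corpus standing in for dat$description (passing cleaning options directly to dfm() follows the answer's quanteda 1.x style; newer releases prefer tokens() first):

```r
library(quanteda)

# hypothetical stand-in for dat$description
mycorpus <- corpus(c("great service great prices",
                     "slow delivery slow support"))
mydfm <- dfm(mycorpus, tolower = TRUE, remove_punct = TRUE)

# ten most frequent words across all documents
topfeatures(mydfm, n = 10)

# restricting to a single row gives one observation's top words
topfeatures(mydfm[1, ], n = 10)
```

topfeatures() returns a named numeric vector sorted by count, which is often all that is needed when the full textstat_frequency() data frame is overkill.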

Sorry, what is "x"? What function is that? – k1rgas

@k1rgas It is the argument of the anonymous (lambda) function, commonly used inside the apply family. For example, if you want to know how many values are missing in each column of a data.frame, you can do `apply(df, 2, function(x) sum(is.na(x)))`. In that case, `x` is each column of the data.frame `df`.
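That comment can be run as-is; a self-contained base R illustration with a small hypothetical data.frame:

```r
# hypothetical data.frame with some missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

# the anonymous function receives each column in turn as `x`
missing_per_column <- apply(df, 2, function(x) sum(is.na(x)))

missing_per_column
# column a has one NA, column b has two
```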