Converting Twitter data to a tidy format

r nlp

I am trying to convert tweets into a tidy text format using the following format and code:

    ## Convert tweets into a tidy text format where the unit of analysis is
    ## `tweet_id-handle-time_stamp-word`
    tidy_format = trump_clinton_tweets %>% mutate(tweet_id = row_number())
    tidy_format = tidy_format %>% group_by(tweet_id) %>% unnest_tokens(word, text, token = "tweets")
    glimpse(tidy_format)

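I keep getting this error:

"Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1."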

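Print the tweets as they are being cleaned and you will see which tweet produces the error. Most likely an empty string in one of the tweets is causing it.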
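One way to do that inspection is sketched below. The `tweets` data frame is a hypothetical stand-in for `trump_clinton_tweets`, whose contents are not shown in the question. Since the error message asks for a character vector, checking the class of the `text` column is a reasonable first step before looking for empty or missing tweets.

```r
library(dplyr)

# Hypothetical stand-in for trump_clinton_tweets (the real data is not shown).
tweets <- data.frame(
  handle = c("a", "b", "c"),
  text   = c("first tweet", "", NA),
  stringsAsFactors = FALSE
)

# The error asks for a character vector, so check the column type first.
class(tweets$text)

# List the rows whose text is missing or empty, keeping their row position.
suspect <- tweets %>%
  mutate(tweet_id = row_number()) %>%
  filter(is.na(text) | text == "")

print(suspect)
```

If `class()` reports something other than `"character"` (for example a factor or a list column), converting the column with `as.character()` before tokenizing is worth trying.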

You don't need group_by() before unnest_tokens(); it keeps the tweet_id column and does not collapse across tweets.
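A minimal sketch of the pipeline without group_by() follows. The sample data and its column names (`handle`, `time_stamp`, `text`) are assumptions based on the comment in the question, not the asker's real data. It also uses the default word tokenizer rather than `token = "tweets"`, because to my knowledge recent tidytext releases no longer accept that option.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for trump_clinton_tweets; column names are assumptions.
tweets <- data.frame(
  handle     = c("realDonaldTrump", "HillaryClinton"),
  time_stamp = c("2016-10-01", "2016-10-02"),
  text       = c("Make tweets tidy", "One word per row"),
  stringsAsFactors = FALSE
)

tidy_format <- tweets %>%
  mutate(tweet_id = row_number(),
         text = as.character(text)) %>%  # guard against factor columns
  unnest_tokens(word, text)              # default tokenizer: lowercased words

glimpse(tidy_format)
```

Each output row carries the `tweet_id`, `handle`, and `time_stamp` of the tweet it came from, so words stay attached to their tweet without any grouping.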