How do I use unnest_tokens on Twitter text data?


I am trying to run the following code, and it gives an error message:

data <- c("Who said we cant have a lil dance party while were stuck in Quarantine? Happy Friday Cousins!! We got through another week of Quarantine. Lets continue to stay safe, healthy and make the best of the situation.  . . Video:  . . -  #blackgirlstraveltoo #everydayafrica #travelnoire #blacktraveljourney #essencetravels #africanculture #blacktravelfeed #blacktravel #melanintravel #ethiopia #representationmatters #blackcommunity #Moyoafrika #browngirlbloggers #travelafrica #blackgirlskillingit #passportstamps #blacktravelista #blackisbeautiful #weworktotravel #blackgirlsrock #mytravelcrush #blackandabroad #blackgirlstravel #blacktravel #africanamerican #africangirlskillingit #africanmusic #blacktravelmovement #blacktravelgram",
      "#Copingwiththelockdown... Festac town, Lagos.  #covid19 #streetphotography #urbanphotography #copingwiththelockdown #documentaryphotography #hustlingandbustling #cityscape #coronavirus #busyroad #everydaypeople #everydaylife #commute #lagosroad #lagosmycity #nigeria #africa #westafrica #lagos #hustle #people #strength #faith #nopoverty #everydayeverywhere #everydayafrica #everydaylagos #nohunger #chroniclesofonyinye",
      "Peace Everywhere. Amani Kila Pahali. Photo by Adan Galma  . * * * * * * #matharestories #mathare #adangalma #everydaymathare #everydayeverywhere #everydayafrica #peace #amani #knowmathare #streets #spi_street #mathareslums")
data_df <- as.data.frame(data)
remove_reg <- "&amp;|&lt;|&gt;"
tidy_data <- data_df %>% 
mutate(text = str_remove_all(text, remove_reg)) %>%
unnest_tokens(word, text, token = "data_df") %>%
filter(!word %in% stop_words$word,
     !word %in% str_remove_all(stop_words$word, "'"),
     str_detect(word, "[a-z]"))

The main problem is that you gave the text column the name data but later refer to it as text. Try something like the following:

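library(tidyverse)
library(tidytext)

# `data` is the character vector of tweets defined in the question;
# store it under the name `text` so the column name matches the later calls
text <- data
data_df <- tibble(text)
remove_reg <- "&amp;|&lt;|&gt;"

data_df %>%
  mutate(text = str_remove_all(text, remove_reg)) %>%
  unnest_tokens(word, text) %>%
  anti_join(get_stopwords()) %>%
  filter(str_detect(word, "[a-z]"))
#> Joining, by = "word"
#> # A tibble: 105 x 1
#>    word
#>    <chr>
#>  1 said
#>  2 cant
#>  3 lil
#>  4 dance
#>  5 party
#>  6 stuck
#>  7 quarantine
#>  8 happy
#>  9 friday
#> 10 cousins
#> # … with 95 more rows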
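
To make the column-name mismatch described above concrete, here is a minimal sketch in base R. The two-element vector is a hypothetical stand-in for the tweets in the question, and fixed_df is a name introduced only for this illustration.

```r
# Hypothetical short vector standing in for the tweets in the question
data <- c("first tweet #one", "second tweet #two")

# as.data.frame() names the single column after the variable, here "data"
data_df <- as.data.frame(data)
names(data_df)

# No column called "text" exists, so later references to `text` cannot work
"text" %in% names(data_df)

# Naming the column explicitly avoids the mismatch
fixed_df <- data.frame(text = data, stringsAsFactors = FALSE)
names(fixed_df)
```

Checking names() on the data frame before piping it into mutate() or unnest_tokens() is a quick way to catch this kind of error.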

If you are specifically interested in Twitter data, consider using token = "tweets":

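data_df %>%
  unnest_tokens(word, text, token = "tweets")
#> Using `to_lower = TRUE` with `token = 'tweets'` may not preserve URLs.
#> # A tibble: 121 x 1
#>    word
#>    <chr>
#>  1 who
#>  2 said
#>  3 we
#>  4 cant
#>  5 have
#>  6 a
#>  7 lil
#>  8 dance
#>  9 party
#> 10 while
#> # … with 111 more rows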

Created on 2020-04-12 by the reprex package (v0.3.0)

This option handles hashtags and usernames nicely.
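
To illustrate what keeping hashtags and usernames intact means, here is a rough base R sketch. It is not how tidytext implements the tweet tokenizer. The sample string, the @someuser handle, and the simplified pattern are all assumptions made for this example.

```r
# Hypothetical sample text, loosely based on the tweets above
x <- "Peace Everywhere. #everydayafrica #peace by @someuser"

# Simplified pattern (an assumption): "#" or "@" followed by
# letters, digits, or underscores
pattern <- "[#@][A-Za-z0-9_]+"

# gregexpr() finds every match; regmatches() extracts the matched text
tags <- regmatches(x, gregexpr(pattern, x))[[1]]
tags
```

A plain word tokenizer would typically treat "#" and "@" as punctuation and drop them, so the tags would come back as ordinary words.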