How can we use R to remove the tweets of specific users (users with a very large number of tweets) for sentiment analysis?


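Objective: run a sentiment analysis on the US court's historic ruling on same-sex marriage. Because some users have posted a very large number of tweets, they may introduce bias. How can we remove them? Also, why do `usafull` and `total` contain different numbers of unique tweets?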

    rm(list=ls())
    library(twitteR)
    library(wordcloud)
    library(tm)

    download.file(url="http://curl.haxx.se/ca/cacert.pem",   destfile="cacert.pem")

    consumer_key <- 'key'
    consumer_secret <- 'secret'
    access_token <- 'key'
    access_secret <- 'secret'
    setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)


    usa <- searchTwitter("#LoveWins", n=1500 , lang="en")

    usa2 <- searchTwitter("#LGBT", n=1500 , lang="en")

    usa3 <- searchTwitter("#gay", n=1500 , lang="en")

    # get the text
    tusa <- sapply(usa, function(x) x$getText())
    tusa2 <- sapply(usa2, function(x) x$getText())
    tusa3 <- sapply(usa3, function(x) x$getText())

    # join the texts
    total <- c(tusa, tusa2, tusa3)

    # remove the duplicated tweets
    total <- total[!duplicated(total)]

    # number of unique tweets
    uni <- length(total)

    # combine the three sets of tweets into one list
    usafull <- c(usa, usa2, usa3)

    # convert the tweets into a data frame
    usafull <- twListToDF(usafull)
    usafull <- unique(usafull)

    # get the date of each tweet (date formatting)
    usafull$date <- format(usafull$created, format = "%Y-%m-%d")
    table(usafull$date)

    # table of the number of tweets per user, in decreasing order
    tdata <- as.data.frame(table(usafull$screenName))
    tdata <- tdata[order(tdata$Freq, decreasing = TRUE), ]
    names(tdata) <- c("User", "Tweets")
    head(tdata)


    # plot the frequency of tweets over time in one-hour bins
    # (binwidth is in seconds for a date-time axis: 60 * 60 = 3600)
    library(ggplot2)
    minutes <- 60
    ggplot(data = usafull, aes(x = created)) +
      geom_histogram(aes(fill = ..count..), binwidth = 60 * minutes) +
      scale_x_datetime("Date") +
      scale_y_continuous("Frequency")


    # plot the table above for the top 30 users to spot any unusual patterns
    par(mar = c(5, 10, 2, 2))
    with(tdata[rev(1:30), ], barplot(Tweets, names.arg = User, horiz = TRUE, las = 1, main = "Top 30: Tweets per user", col = 1))

    # Twitter users with more than 20 tweets, to be removed to reduce bias
    userid <- tdata[(tdata$Tweets > 20), ]
    userid <- userid[, 1]

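From your code, I understand that you want to remove the tweets of the users listed in `userid`. One way to do that is the following (after `userid <- userid[,1]`, `userid` is already a vector of screen names, so it is used directly):

    usafull_nobias <- subset(usafull, !(screenName %in% userid))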
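The filtering step can be tried without a Twitter connection. The following is a minimal sketch on a made-up data frame that stands in for the output of `twListToDF()`; the screen names, the texts, and the `max_tweets` cutoff are assumptions chosen for illustration (the question uses a cutoff of 20).

```r
# Toy stand-in for the data frame returned by twListToDF(); all values are made up.
usafull <- data.frame(
  screenName = c("alice", "alice", "alice", "bob", "carol", "carol"),
  text = c("t1", "t2", "t3", "t4", "t5", "t6"),
  stringsAsFactors = FALSE
)

# Tweets per user, most active first (same idea as tdata in the question).
tdata <- as.data.frame(table(usafull$screenName), stringsAsFactors = FALSE)
names(tdata) <- c("User", "Tweets")
tdata <- tdata[order(tdata$Tweets, decreasing = TRUE), ]

# Hypothetical cutoff for this toy data; the question uses 20.
max_tweets <- 2
heavy_users <- tdata$User[tdata$Tweets > max_tweets]

# Keep only the rows whose author is not a heavy user.
usafull_nobias <- usafull[!(usafull$screenName %in% heavy_users), ]

# The result is a data frame, so the tweet text is simply a column.
tweet_text <- usafull_nobias$text
print(tweet_text)
```

Because the filtered object is a data frame rather than a list of status objects, `x$getText()` inside `sapply` no longer applies; the `text` column already holds the character vector needed for cleaning.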

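As far as I can tell, each user's screen name is in `usafull$screenName`. So you should be able to remove users with specific screen names, say screenname1 and screenname2, by calling `usafull[!usafull$screenName %in% c("screenname1", "screenname2"), ]`. You can also list the duplicated rows by calling `usafull[duplicated(usafull), ]` to check why the unique row count and `total` differ.

Why not normalize the data after the sentiment classification?

Yes, your idea works, but after filtering I can't use `sapply` to get the text from the remaining users. This is the output: `>new`

Could you elaborate on the idea of normalizing the data, @hrbrmstr?

Hi! Yes, that helped! But I ran into another problem: `usafull_nobias` has changed, and now I can't use `sapply` to get the text for further cleaning and analysis. Can I continue like this: `text`

What do you mean by "`usafull_nobias` has changed"? In any case, you should be able to access the text with `usafull_nobias$text`.

Yes, you can use `text`

And what about the different number of tweets in `total` and `usafull`? Did you figure it out?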
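One likely explanation for the differing counts, offered as an assumption rather than something confirmed in the thread: `total` is deduplicated on the tweet text alone, while `unique(usafull)` compares entire rows, so the same text posted by two different users (different `id` and `screenName`) counts once in `total` but twice in `usafull`. A minimal sketch with made-up rows:

```r
# Made-up rows shaped like twListToDF() output:
# one tweet, a verbatim copy of it by another user, and an unrelated tweet.
usafull <- data.frame(
  id = c("1", "2", "3"),
  screenName = c("alice", "bob", "carol"),
  text = c("Love wins!", "Love wins!", "A different tweet"),
  stringsAsFactors = FALSE
)

# Deduplicating the text vector, as the question does for `total`.
n_unique_text <- length(usafull$text[!duplicated(usafull$text)])

# Deduplicating whole rows, as the question does for `usafull`.
n_unique_rows <- nrow(unique(usafull))

# Rows whose text repeats an earlier row, useful for inspecting the gap.
repeated_text <- usafull[duplicated(usafull$text), ]

print(c(text = n_unique_text, rows = n_unique_rows))
```

If the goal is one row per distinct text, deduplicating on the column with `usafull[!duplicated(usafull$text), ]` should bring the two counts into agreement.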