Warning: file_get_contents(/data/phpspider/zhask/data//catemap/3/wix/2.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
文本挖掘、tm、Twitter API提要_Twitter_Text Mining_Tm - Fatal编程技术网

文本挖掘、tm、Twitter API提要

文本挖掘、tm、Twitter API提要,twitter,text-mining,tm,Twitter,Text Mining,Tm,早上好 我希望你能帮我做一个文本挖掘练习。我对“AAPL”推文感兴趣,并且能够从API中获取500条推文。我能够独自跨越几个障碍,但最后一部分需要帮助。出于某种原因,tm软件包没有删除停止字。请你看一下,看看可能是什么问题 在绘制术语频率后,最常见的术语是“AAPL”、“苹果”、“iPhone”、“价格”、“股票” 提前谢谢 芒克酒店 transform into dataframe tweets.df <- twListToDF(tweets) #Isolate text from t

早上好

我希望你能帮我做一个文本挖掘练习。我对“AAPL”推文感兴趣,并且能够从API中获取500条推文。我能够独自跨越几个障碍,但最后一部分需要帮助。出于某种原因,tm软件包没有删除停止字。请你看一下,看看可能是什么问题

在绘制术语频率后,最常见的术语是“AAPL”、“苹果”、“iPhone”、“价格”、“股票”

提前谢谢

芒克酒店

transform into dataframe
tweets.df <- twListToDF(tweets)

#Isolate text from tweets
aapl_tweets <- tweets.df$text

#Deal with emoticons
tweets2 <- data.frame(text = iconv(aapl_tweets, "latin1", "ASCII", "bye"), stringsAsFactors = FALSE)

#Make a vector source:
aapl_source <- VectorSource(tweets2)

#make a volatile corpus
aapl_corpus <- VCorpus(aapl_source)
aapl_cleaned <- clean_corpus(aapl_source)

#create my list to remove words
myList <- c("aapl", "apple", "stock", "stocks", stopwords("en"))

#clean corpus function 

clean_corpus <- function(corpus){
  corpus <- tm_map(corpus, stripWhitespace, mc.cores = 1)
  corpus <- tm_map(corpus, removePunctuation, mc.cores = 1)
  corpus <- tm_map(corpus, removeWords, myList, mc.cores = 1)
  return(corpus)
}

#clean aapl corpus
aapl_cleaned <- clean_corpus(aapl_corpus)

#convert to TDM
aapl.tdm <- TermDocumentMatrix(aapl_cleaned)

aapl.tdm

#Convert as Matrix
aapl_m <- as.matrix(aapl.tdm)

#Create Frequency tables
term_frequency <- rowSums(aapl_m)
term_frequency <- sort(term_frequency, decreasing = TRUE)
term_frequency[1:10]

barplot(term_frequency[1:10])
转换为数据帧
tweets.df可能重复的可能重复的