R: merging tweets by date
I hope this is not too basic a question. I have a data frame of tweets (in R), and my goal is to calculate sentiment by date.
I would be grateful if anyone could advise me on how to concatenate the tweets in tweet$text by date, so that each observation becomes one merged tweet/text string.
For example, if I have:
Created_Date Tweet
2014-01-04 "the iphone is magnificent"
2014-01-04 "the iphone's screen is poor"
2014-01-04 "I will always use Apple products"
2014-01-03 "iphone is overpriced, but I love it"
2014-01-03 "Siri is very sluggish"
2014-01-03 "iphone's maps app is poor compared to Android"
I would like a loop/function that merges the tweets by Created_Date,
so that the result looks like this:
Created_Date Tweet
2014-01-04 "the iphone is magnificent", "the iphone's screen is poor", "I will always use Apple products"
2014-01-03 "iphone is overpriced, but I love it", "Siri is very sluggish", "iphone's maps app is poor compared to Android"
Here is my data:
dat <- structure(list(Created_Date = structure(c(1388793600, 1388793600,
1388793600, 1388707200, 1388707200, 1388707200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Tweet = c("the iphone is magnificent",
"the iphone's screen is poor", "I will always use Apple products",
"iphone is overpriced, but I love it", "Siri is very sluggish",
"iphone's maps app is poor compared to Android")), .Names = c("Created_Date",
"Tweet"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
dat
Here is a simple implementation using loops. It is probably not the fastest solution, but it is easy to understand.
# construction of a sample data.frame
text = c("Some random text.",
"Yet another line.",
"Will this ever stop.",
"This may be the last one.",
"It was not the last.")
date = c("9-11-2017",
"11-11-2017",
"10-11-2017",
"11-11-2017",
"10-11-2017")
tweet = data.frame(text, date)
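# unique dates in the data.frame; as.character() keeps this working whether
# the column is a factor or a character vector (levels() returns NULL for a
# character column, which is the data.frame() default since R 4.0)
dates = sort(unique(as.character(tweet$date)))
# initialise results with empty strings
resultString = rep.int("", length(dates))
for (i in seq_along(dates)) # loop over different dates
{
  for (j in seq_along(tweet$text)) # loop over tweets
  {
    if (tweet$date[j] == dates[i]) # concatenate to resultString if dates match
    {
      resultString[i] = paste0(resultString[i], tweet$text[j])
    }
  }
}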
# combine concatenated strings with dates in new data.frame
result = data.frame(date=dates, tweetsByDate=resultString)
result
# output:
#         date                               tweetsByDate
# 1 10-11-2017   Will this ever stop.It was not the last.
# 2 11-11-2017 Yet another line.This may be the last one.
# 3  9-11-2017                          Some random text.
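The nested loops compare every tweet against every date. As a possible alternative, the same grouping can be sketched without explicit loops using base R's tapply. This is only an illustration on the same sample data; the space separator passed to collapse is my own choice (the loop version joins the tweets with no separator).

```r
# Same sample data as in the loop example
tweet <- data.frame(
  text = c("Some random text.",
           "Yet another line.",
           "Will this ever stop.",
           "This may be the last one.",
           "It was not the last."),
  date = c("9-11-2017", "11-11-2017", "10-11-2017", "11-11-2017", "10-11-2017"),
  stringsAsFactors = FALSE
)

# tapply() splits `text` by `date` and calls paste() once per group;
# collapse = " " is forwarded to paste() and joins the tweets with a space
merged <- tapply(tweet$text, tweet$date, paste, collapse = " ")

# Turn the named result into a two-column data.frame
result <- data.frame(date = names(merged),
                     tweetsByDate = as.vector(merged),
                     stringsAsFactors = FALSE)
result
```

Because each tweet is visited once, this should scale better than the double loop when there are many tweets and many dates.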
An example using data.table
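A minimal sketch of how this could look with data.table, assuming the dat data frame from the question (rebuilt by hand here so the block runs on its own). The ", " separator and the name merged are choices made for illustration.

```r
library(data.table)

# The question's data, rebuilt by hand (dates are midnight UTC)
dat <- data.frame(
  Created_Date = as.POSIXct(rep(c("2014-01-04", "2014-01-03"), each = 3), tz = "UTC"),
  Tweet = c("the iphone is magnificent",
            "the iphone's screen is poor",
            "I will always use Apple products",
            "iphone is overpriced, but I love it",
            "Siri is very sluggish",
            "iphone's maps app is poor compared to Android"),
  stringsAsFactors = FALSE
)

dt <- as.data.table(dat)

# For each Created_Date, paste all tweets of that date into one string
merged <- dt[, .(Tweet = paste(Tweet, collapse = ", ")), by = Created_Date]
merged
```

With by =, the groups come back in order of first appearance, so 2014-01-04 is listed before 2014-01-03 here.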
If you use the corpus package, you can pass the group argument to term_counts or term_matrix to aggregate (sum) by date.
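In your case, if you are interested in counting the number of positive, negative, and neutral words, you can first create a "stemmer" that maps words to these categories: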
library(corpus)
# map terms in the AFINN dictionary to Positive/Negative; others to Neutral
stem_sent <- new_stemmer(sentiment_afinn$term,
ifelse(sentiment_afinn$score > 0, "Positive", "Negative"),
default = "Neutral")
You can then get a matrix of counts by date:
term_matrix(dat$Tweet, group = dat$Created_Date, stemmer = stem_sent)
## 2 x 3 sparse Matrix of class "dgCMatrix"
##            Negative Neutral Positive
## 2014-01-03        2      17        1
## 2014-01-04        1      14        .
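To make the idea behind the stemmer concrete, here is a rough base R sketch of the same steps: map each word to a category, then count categories per date. It does not reproduce how corpus tokenizes text, and the five-word lexicon is a hypothetical stand-in for AFINN, so the counts it gives will not match the matrix above.

```r
# The question's data, rebuilt by hand
dat <- data.frame(
  Created_Date = as.POSIXct(rep(c("2014-01-04", "2014-01-03"), each = 3), tz = "UTC"),
  Tweet = c("the iphone is magnificent",
            "the iphone's screen is poor",
            "I will always use Apple products",
            "iphone is overpriced, but I love it",
            "Siri is very sluggish",
            "iphone's maps app is poor compared to Android"),
  stringsAsFactors = FALSE
)

# Hypothetical mini-lexicon, for illustration only (NOT the AFINN dictionary)
lexicon <- c(magnificent = "Positive", love = "Positive",
             poor = "Negative", sluggish = "Negative", overpriced = "Negative")

# Lower-case each tweet and split on anything that is not a letter or apostrophe
tokens <- strsplit(tolower(dat$Tweet), "[^a-z']+")

# One row per token, tagged with the date of the tweet it came from
long <- data.frame(
  date = rep(as.Date(dat$Created_Date, tz = "UTC"), lengths(tokens)),
  term = unlist(tokens),
  stringsAsFactors = FALSE
)
long <- long[nzchar(long$term), ]

# Look each token up by name; tokens missing from the lexicon become "Neutral"
category <- unname(lexicon[long$term])
category[is.na(category)] <- "Neutral"
long$category <- category

# Cross-tabulate dates against categories
counts <- table(long$date, long$category)
counts
```

For real data, the corpus functions shown above are the better tool, since they handle tokenization and a full sentiment dictionary for you.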
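Please provide an example of what you are trying to do, so that we can better understand your problem and give you more help.
Hi Patrick, thank you very much for taking the time to help with my question. This is exactly what I want to do. I have pulled "a lot" of posts from multiple newsgroups in one country, and I am trying to group them by newsgroup first and then aggregate them. The reason is that I want to examine the sentiment around an event (an election). I would like to use syuzhet to capture the nuances of the emotions.
Patrick, I just read another reply of yours to someone else, showing how you created a loop to handle "stopwords" instance by instance. That would be a huge help, since I am working through millions of posts. Thank you.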
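This can also be done in one call with aggregate from base R (ta here is the data frame of tweets, presumably the same as dat in the question):
aggregate(ta$Tweet, by = list(ta$Created_Date), FUN = function(X) paste(X, collapse = ","))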
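For the follow-up about grouping by newsgroup first and then by date, the aggregate call extends to more than one key. The sketch below uses aggregate's formula interface on made-up data; the posts data frame and its newsgroup, date, and text columns are hypothetical names, not something taken from the question.

```r
# Hypothetical posts; all column names and values are made up for illustration
posts <- data.frame(
  newsgroup = c("news.a", "news.a", "news.b", "news.a", "news.b"),
  date = as.Date(c("2014-01-03", "2014-01-03", "2014-01-03",
                   "2014-01-04", "2014-01-04")),
  text = c("first post", "second post", "third post",
           "fourth post", "fifth post"),
  stringsAsFactors = FALSE
)

# Group `text` by every combination of newsgroup and date that occurs;
# collapse = " " is forwarded to paste() and joins the posts with a space
merged <- aggregate(text ~ newsgroup + date, data = posts,
                    FUN = paste, collapse = " ")
merged
```

The merged text column could then be handed to whatever sentiment function you settle on, one string per newsgroup and date.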
With the stem_sent stemmer from the corpus answer, term_counts returns the counts by date in long form:
term_counts(dat$Tweet, group = dat$Created_Date, stemmer = stem_sent)
##        group     term count
## 1 2014-01-03 Negative     2
## 2 2014-01-04 Negative     1
## 3 2014-01-03  Neutral    17
## 4 2014-01-04  Neutral    14
## 5 2014-01-03 Positive     1