R: Applying sentimentr to a data frame with multiple sentences in one string per row

Tags: r, dplyr, sentimentr

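I have a dataset and I am trying to get the sentiment by article. I have about 1,000 articles. Each article is a single string, and that string contains multiple sentences. Ideally, I would like to add another column that summarizes the sentiment for each article. Is there an efficient way to do this using dplyr?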

Below is a sample dataset with just 2 articles.

date<- as.Date(c('2020-06-24', '2020-06-24'))
text <- c('3 more cops recover as PNP COVID-19 infections soar to 519', 'QC suspends processing of PWD IDs after reports of abuse in issuance of cards')
link<- c('https://newsinfo.inquirer.net/1296981/3-more-cops-recover-as-pnps-covid-19-infections-soar-to-519,3,10,4,11,9,8', 'https://newsinfo.inquirer.net/1296974/qc-suspends-processing-of-pwd-ids-after-reports-of-abuse-in-issuance-of-cards')
V4 <-c('MANILA, Philippines — Three more police officers have recovered from the new coronavirus disease, increasing the total number of recoveries in the Philippine National Police to (PNP) 316., This developed as the total number of COVID-19 cases in the PNP rose to 519 with one new infection and nine deaths recorded., In a Facebook post on Wednesday, the PNP also recorded 676 probable and 876 suspects for the disease., PNP chief Gen. Archie Gamboa previously said the force would will intensify its health protocols among its personnel after recording a recent increase in deaths., The latest fatality of the ailment is a police officer in Cebu City, which is under enhanced community quarantine as COVID-19 cases continued to surge there., ATM, \r\n\r\nFor more news about the novel coronavirus click here.\r\nWhat you need to know about Coronavirus.\r\n\r\n\r\n\r\nFor more information on COVID-19, call the DOH Hotline: (02) 86517800 local 1149/1150.\r\n\r\n \r\n \r\n \r\n\r\n  \r\n , The Inquirer Foundation supports our healthcare frontliners and is still accepting cash donations to be deposited at Banco de Oro (BDO) current account #007960018860 or donate through PayMaya using this  link  .',
   'MANILA, Philippines — Quezon City will halt the processing of identification cards to persons with disability for two days starting Thursday, June 25, so it could tweak its guidelines after reports that unqualified persons had issued with the said IDs., In a statement on Wednesday, Quezon City Mayor Joy Belmonte said the suspension would the individual who issued PWD ID cards to six members of a family who were not qualified but who paid P2,000 each to get the IDs., Belmonte said the suspect, who is a local government employee, was already issued with a show-cause order to respond to the allegation., According to city government lawyer Nino Casimir, the suspect could face a grave misconduct case that could result in dismissal., The IDs are issued to only to persons qualified under the Act Expanding the Benefits and Privileges of Persons with Disability (Republic Act No. 10754)., The IDs entitle PWDs to a 20 percent discount and VAT exemption on goods and services., /atm')

df<-data.frame(date, text, link, V4)

head(df)
My desired output is simply to add an extra column to my `df` data frame containing the overall (summed) sentiment for each article.

Additional information, based on the answer below:

df %>%
  group_by(V4) %>% # group by not really needed
  mutate(V4 = gsub("[.],", ".", V4), 
         sentiment_score = sentiment_by(V4)) 

# A tibble: 2 x 5
# Groups:   V4 [2]
  date       text                      link                                V4                                                  sentiment_score$e~ $word_count   $sd $ave_sentiment
  <date>     <chr>                     <chr>                               <chr>                                                            <int>       <int> <dbl>          <dbl>
1 2020-06-24 3 more cops recover as P~ https://newsinfo.inquirer.net/1296~ "MANILA, Philippines — Three more police officers ~                  1         172 0.204       -0.00849
2 2020-06-24 QC suspends processing o~ https://newsinfo.inquirer.net/1296~ "MANILA, Philippines — Quezon City will halt the p~                  1         161 0.329       -0.174  
Warning message:
Can't combine <sentiment_by> and <sentiment_by>; falling back to <data.frame>.
x Some attributes are incompatible.
i The author of the class should implement vctrs methods.
i See <https://vctrs.r-lib.org/reference/faq-error-incompatible-attributes.html>. 

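If you need the sentiment over the whole text, there is no need to split the text into sentences first; the sentiment functions take care of this. I replaced the `.,` in your text with periods, because that is what the sentiment functions need. The sentiment functions recognize that "mr." is not the end of a sentence. If you use `get_sentences()` first, you get the sentiment per sentence rather than over the whole text.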
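To make the per-sentence versus per-text distinction concrete, here is a minimal sketch on two made-up strings (the `reviews` vector is hypothetical, not taken from the question). It assumes sentimentr is installed. In this sketch the per-sentence scores come from `sentiment()` and the per-text scores from `sentiment_by()`, both applied to the same `get_sentences()` result.

```r
library(sentimentr)

# Hypothetical toy input: two "articles", the first with two sentences
reviews <- c("I love this. It is great.", "This is terrible.")

# Split once, then reuse the result for both calls
sentences <- get_sentences(reviews)

# One row per sentence (element_id, sentence_id, word_count, sentiment)
per_sentence <- sentiment(sentences)

# One row per original string (element_id, word_count, sd, ave_sentiment)
per_text <- sentiment_by(sentences)

print(per_sentence)
print(per_text)
```

For a one-score-per-article column, the `per_text` shape is the one that lines up with the rows of the original data frame.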

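The function `sentiment_by` handles the sentiment over the whole text and averages it nicely. If you need to change this, check the help for the `averaging.function` option. The `by` part of the function can deal with any grouping you want to apply.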
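As a rough illustration of the `averaging.function` option, the sketch below scores the same hypothetical text twice: once with the package default and once with `sentimentr::average_mean`, which takes a plain mean of the sentence scores. The `articles` input is invented for the example, and the two results may or may not differ much on real data.

```r
library(sentimentr)

# Hypothetical toy input, split into sentences up front
articles <- get_sentences(c(
  "The launch went well. Sadly, the support was awful.",
  "Nothing special happened today."
))

# Default averaging (sentimentr::average_downweighted_zero)
default_scores <- sentiment_by(articles)

# Plain arithmetic mean of the sentence scores instead
mean_scores <- sentiment_by(articles,
                            averaging.function = sentimentr::average_mean)

print(default_scores$ave_sentiment)
print(mean_scores$ave_sentiment)
```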

library(dplyr)
library(sentimentr)

df %>%
  group_by(V4) %>% # grouping is not really needed
  mutate(V4 = gsub("[.],", ".", V4), # turn the "., " separators back into periods
         sentiment_score = sentiment_by(V4)) 

# A tibble: 2 x 5
# Groups:   V4 [2]
  date       text               link                      V4                            sentiment_score$~ $word_count   $sd $ave_sentiment
  <date>     <chr>              <chr>                     <chr>                                     <int>       <int> <dbl>          <dbl>
1 2020-06-24 3 more cops recov~ https://newsinfo.inquire~ "MANILA, Philippines — Three~                 1         172 0.204       -0.00849
2 2020-06-24 QC suspends proce~ https://newsinfo.inquire~ "MANILA, Philippines — Quezo~                 1         161 0.329       -0.174  
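
The output above stores the whole `sentiment_by` result as a nested column. If a flat numeric column is preferred, one possible variation is to score the text outside `mutate()` and copy over only the columns of interest. This is a sketch under assumptions: `articles_df` is a hypothetical stand-in for the question's `df`, and it has not been checked whether this also avoids the vctrs warning quoted in the question.

```r
library(dplyr)
library(sentimentr)

# Hypothetical stand-in for the question's data frame
articles_df <- data.frame(
  id = 1:2,
  V4 = c("Officers recovered., The news was good.",
         "The scheme was abused., Officials face dismissal."),
  stringsAsFactors = FALSE
)

# Restore ordinary sentence endings, then score each article once
cleaned <- gsub("[.],", ".", articles_df$V4)
scores <- sentiment_by(get_sentences(cleaned))

# scores has one row per article, in input order, so plain vectors line up
articles_df <- articles_df %>%
  mutate(V4 = cleaned,
         word_count = scores$word_count,
         ave_sentiment = scores$ave_sentiment)

print(articles_df)
```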

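Why does the dataset have `.,` as punctuation rather than `.`? Can I replace them with a single `.`?

You can if you want. But if you have words like "mr.", it might parse strangely. Just use `[.],` to make sure everything goes right.

Awesome, thanks. Should I be worried about the warning message? Warning message: Can't combine <sentiment_by> and <sentiment_by>; falling back to <data.frame>. Some attributes are incompatible. The author of the class should implement vctrs methods.

I see. Interesting warning. I don't get that. Can you reproduce the issue with one or a few lines of example text?

Added it to the question.

Should be fine; I just checked the additional information out of curiosity. I don't get your warning, but I'm on R 4, purrr_0.3.4, tidyr_1.1.0, sentimentr_2.7.1 and dplyr_1.0.0. That might explain the difference.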
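
The effect of the `[.],` pattern discussed in these comments can be checked with base R alone. The sample string below is made up; the point is that only a period immediately followed by a comma is rewritten, so ordinary commas and abbreviations such as "Mr." are left untouched.

```r
# Hypothetical sample in the same "., " style as the question's V4 column
raw <- "Cases rose to 519., Mr. Smith, the chief, agreed., Done."

# "[.]," matches a literal period followed by a comma
cleaned <- gsub("[.],", ".", raw)

# Equivalent literal match without regex syntax
cleaned_fixed <- gsub(".,", ".", raw, fixed = TRUE)

print(cleaned)
# "Cases rose to 519. Mr. Smith, the chief, agreed. Done."
```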