
Using data.table() and dplyr() in R


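Problem

I have a data frame called "FID" (see below) that contains the monthly frequency of FIDs over three years.

I would like to use the data.table and dplyr packages to collapse my data frame to one row per month, holding the total frequency of FIDs over the 3 years, and then summarise my data so that it contains the following columns:

Summary data frame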

Month   Total_FID    Mean_FID   SD_FID   S.E_FID  Ci_Lower  Ci_Upper

   ##First attempt: fails with the error message below
   library(data.table)
   ##Reformat into a data.table object
   FID_Table<-data.table(FID)

   ##Summary statistics
   FID.Summarised=FID_Table[, sum(FID), 
                              Month=.N,
                              Mean_FID=mean(FID),
                              SD_FID=sd(FID),
                              S.E = std.error(FID),
                              by=Month]
##Error message
Error in `[.data.table`(FID_Table, , sum(FID), Month = .N, Mean_FID = mean(FID),  : 
  unused arguments (Month = .N, Mean_FID = mean(FID), SD_FID = sd(FID), S.E = std.error(FID))
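
##Second attempt: runs and gives the results below
##Reformat into a data.table object
FID_Table<-data.table(FID)

##Summarise Data: all per-month statistics go inside a single .() list
Summarised.FID<-FID_Table[, .(FID.Freq=sum(FID),
                              mean=mean(FID),
                              sd=sd(FID),
                              median=median(FID)),
                          by = .(Month)]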

##data.table results

        Month FID.Freq      mean        sd median
 1:   January      165 55.000000 10.535654     56
 2:  February      182 60.666667 29.737743     65
 3:     March      179 59.666667 33.291641     43
 4:     April      104 34.666667 16.862186     27
 5:       May      124 41.333333 49.571497     20
 6:      June       10  3.333333  5.773503      0
 7:      July       15  5.000000  4.358899      7
 8:    August      133 44.333333 21.007935     45
 9: September       97 32.333333 21.548395     34
10:   October       82 27.333333 13.051181     26
11:  November       75 25.000000 19.000000     25
12:  December      102 34.000000  4.582576     33
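The "unused arguments" error comes from how data.table's [ method reads its arguments: the second argument is j, and everything you want computed per group has to be bundled into one list in j, written .() or list(). In the first attempt, Month = .N, Mean_FID = mean(FID) and the rest were passed as separate named arguments to [ itself, which it does not recognise. A minimal sketch of a corrected call that also adds the standard error and a 95% interval follows. Assumptions: FID is rebuilt compactly from the dput() data in the question; the std.error() used in the attempt is presumed to come from the plotrix package, so it is replaced here by sd(FID)/sqrt(.N) to avoid that dependency; the interval uses the normal approximation (1.96), as the dplyr answer does.

```r
library(data.table)

# Assumption: FID rebuilt from the question's dput() output
# (3 years x 12 months; Month levels put in calendar order here).
FID <- data.frame(
  Year  = rep(2015:2017, each = 12),
  Month = factor(rep(month.name, times = 3), levels = month.name),
  FID   = c(65, 88, 43, 54, 98,  0,  0, 23, 10, 15,  6, 33,
            56, 29, 98, 23,  6, 10,  7, 65, 53, 41, 25, 30,
            44, 65, 38, 27, 20,  0,  8, 45, 34, 26, 44, 39)
)

FID_Table <- as.data.table(FID)

# Every per-group expression goes inside ONE .() list in j.
# .N is the number of rows in the current group (3 years here).
FID.Summarised <- FID_Table[, .(Total_FID = sum(FID),
                                Mean_FID  = mean(FID),
                                SD_FID    = sd(FID),
                                S.E_FID   = sd(FID) / sqrt(.N)),
                            by = Month]

# Normal-approximation 95% limits, added by reference with :=
FID.Summarised[, `:=`(Ci_Lower = Mean_FID - 1.96 * S.E_FID,
                      Ci_Upper = Mean_FID + 1.96 * S.E_FID)]
print(FID.Summarised)
```

Columns created in the same .() cannot refer to each other (unlike in dplyr::summarise()), which is why the limits are added in a second step with :=.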
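  • Total frequency of FIDs for each month over the 3 years
  • Mean frequency of FIDs for each month over the 3 years
  • Standard deviation of FIDs for each month over the 3 years
  • Standard error of FIDs for each month over the 3 years
  • Lower and upper confidence limits for each month over the 3 years
  • I don't know how to collapse the data frame by adding up the total FID frequency for each month over the 3 years. For example, in the sample below, the total frequency for January over the three years is 86 + 66 + 56 = 208, and I would like to do this for every month.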

     ###What a section of my data frame looks like      ##Desired outcome
     Year     Month       FID                             Month       FID  
     2018    January       86                             January     208
     2019    January       66                             February    176
     2020    January       56
     2018    February      76
     2019    February      55
     2020    February      45
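The per-month total described above is a grouped sum. As a package-free cross-check of the arithmetic, a minimal sketch with base R's aggregate() on only the six example rows (so the numbers should match the 208 and 176 in the desired outcome):

```r
# The six example rows from "What a section of my data frame looks like"
example <- data.frame(
  Year  = rep(2018:2020, times = 2),
  Month = rep(c("January", "February"), each = 3),
  FID   = c(86, 66, 56, 76, 55, 45)
)

# Sum FID within each Month: years are collapsed, one row per month
# (aggregate() sorts the groups alphabetically)
totals <- aggregate(FID ~ Month, data = example, FUN = sum)
print(totals)
```

In data.table the same grouping is the by = Month argument; in dplyr it is group_by(Month) followed by summarise().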
    
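    I am not an advanced R user. I have read many Stack Overflow pages and online tutorials, but I cannot work out the correct procedure (see my R code) to produce the desired summary data frame. I also cannot find a way to produce the lower and upper confidence limits with the data.table package. Knowing how to do this with both data.table and dplyr would be very useful, as I use both packages often.

    If anyone can help, I would be very grateful.

    Many thanks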
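The confidence limits asked about here come from a short formula, so they can be computed inside any data.table or dplyr summary once the mean, standard deviation and n are available. Worked for January (the 2015, 2016 and 2017 values 65, 56 and 44 from the FID data), using the normal approximation with 1.96 that the dplyr answer uses:

```latex
\bar{x} = \frac{65 + 56 + 44}{3} = 55, \qquad
s = \sqrt{\frac{(65-55)^2 + (56-55)^2 + (44-55)^2}{3 - 1}} = \sqrt{111} \approx 10.5357

\mathrm{SE} = \frac{s}{\sqrt{n}} = \frac{10.5357}{\sqrt{3}} \approx 6.0828

\mathrm{CI}_{95\%} = \bar{x} \pm 1.96\,\mathrm{SE} = 55 \pm 11.922 \approx (43.08,\ 66.92)
```

With only n = 3 values per month, a t-based interval (t_{0.975, 2} ≈ 4.303 in place of 1.96) is a common, more conservative alternative; for January it gives roughly (28.83, 81.17).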

    Data frame: "FID"

         structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
    2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 
    2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
    2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
    2017L, 2017L, 2017L), Month = structure(c(5L, 4L, 8L, 1L, 9L, 
    7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 
    12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 
    10L, 3L), .Label = c("April", "August", "December", "February", 
    "January", "July", "June", "March", "May", "November", "October", 
    "September"), class = "factor"), FID = c(65L, 88L, 43L, 54L, 
    98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L, 56L, 29L, 98L, 23L, 6L, 
    10L, 7L, 65L, 53L, 41L, 25L, 30L, 44L, 65L, 38L, 27L, 20L, 0L, 
    8L, 45L, 34L, 26L, 44L, 39L)), class = "data.frame", row.names = c(NA, 
    -36L))
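Because each month appears once per year, every monthly group has n = 3 rows, which is the n behind the standard error. A quick base-R sketch to confirm this on the data above, assuming the dput() output is rebuilt compactly with rep() (Month is kept as character here, so its level order differs from the dput(), which does not affect grouping):

```r
# Assumption: compact rebuild of the dput() data above
FID <- data.frame(
  Year  = rep(2015:2017, each = 12),
  Month = rep(month.name, times = 3),
  FID   = c(65, 88, 43, 54, 98,  0,  0, 23, 10, 15,  6, 33,
            56, 29, 98, 23,  6, 10,  7, 65, 53, 41, 25, 30,
            44, 65, 38, 27, 20,  0,  8, 45, 34, 26, 44, 39)
)

# Rows per month: one per year, so 3 for every month
n_per_month <- table(FID$Month)
print(n_per_month)

# Total FID per month over the 3 years, in calendar order
totals <- tapply(FID$FID, factor(FID$Month, levels = month.name), sum)
print(totals)
```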
    

    Answer: data.table

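    ##Reformat into a data.table object
    library(data.table)
    library(dplyr)
    FID_Table<-data.table(FID)

    ##Summarise with dplyr: one row per month
    FID_Table %>%
      dplyr::group_by(Month) %>%
      dplyr::summarise(Freq=sum(FID),
                       mean.month=mean(FID),
                       sd.month=sd(FID, na.rm=TRUE),
                       n_FID=n(),
                       sem=sd(FID)/sqrt(n()),
                       ci_low=mean.month-1.96*sem,
                       ci_hi=mean.month+1.96*sem) %>%
      dplyr::ungroup()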
    ##dplyr results
           Month Freq mean.month  sd.month n_FID       sem      ci_low     ci_hi
     1   January  165  55.000000 10.535654     3  6.082763  43.0777854 66.922215
     2  February  182  60.666667 29.737743     3 17.169094  27.0152431 94.318090
     3     March  179  59.666667 33.291641     3 19.220938  21.9936289 97.339704
     4     April  104  34.666667 16.862186     3  9.735388  15.5853064 53.748027
     5       May  124  41.333333 49.571497     3 28.620117 -14.7620965 97.428763
     6      June   10   3.333333  5.773503     3  3.333333  -3.2000000  9.866667
     7      July   15   5.000000  4.358899     3  2.516611   0.0674415  9.932558
     8    August  133  44.333333 21.007935     3 12.128937  20.5606169 68.106050
     9 September   97  32.333333 21.548395     3 12.440972   7.9490287 56.717638
    10   October   82  27.333333 13.051181     3  7.535103  12.5645314 42.102135
    11  November   75  25.000000 19.000000     3 10.969655   3.4994760 46.500524
    12  December  102  34.000000  4.582576     3  2.645751  28.8143274 39.185673
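A variant of the dplyr answer, offered as a sketch under two stated assumptions: months are put in calendar order with factor(..., levels = month.name) so the output runs January to December, and the interval swaps in the t quantile qt(0.975, n - 1) instead of 1.96, a common choice for groups as small as n = 3. Because it uses a different interval formula, its limits are wider than those in the table above. FID is again rebuilt compactly from the dput() data.

```r
library(dplyr)

# Assumption: compact rebuild of the question's dput() data
FID <- data.frame(
  Year  = rep(2015:2017, each = 12),
  Month = rep(month.name, times = 3),
  FID   = c(65, 88, 43, 54, 98,  0,  0, 23, 10, 15,  6, 33,
            56, 29, 98, 23,  6, 10,  7, 65, 53, 41, 25, 30,
            44, 65, 38, 27, 20,  0,  8, 45, 34, 26, 44, 39)
)

Summarised.FID <- FID %>%
  mutate(Month = factor(Month, levels = month.name)) %>%  # calendar order
  group_by(Month) %>%
  summarise(Total_FID = sum(FID),
            Mean_FID  = mean(FID),
            SD_FID    = sd(FID),
            n_FID     = n(),
            S.E_FID   = SD_FID / sqrt(n_FID),
            # t-based 95% limits; later columns may use earlier ones here
            Ci_Lower  = Mean_FID - qt(0.975, df = n_FID - 1) * S.E_FID,
            Ci_Upper  = Mean_FID + qt(0.975, df = n_FID - 1) * S.E_FID) %>%
  ungroup()

print(Summarised.FID)
```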
    
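    dplyr::summarise(mean.month=mean(n), sd.month=sd(n, na.rm=TRUE)): what is n? Should it be FID?

    Hi Ronak, thank you for the suggestion. I changed the code as you suggested, and now there is a new error message. Could you advise, if you can?

    FID() is not a function; FID is a column name. So you would actually need sem=sd(FID)/sqrt(FID), but sqrt(FID) does not return a single value. Are you sure the formula you are using is correct?

    Hi Ronak. I have got close to the desired result. However, the dplyr code is not collapsing the data frame by month over the 3 years. I have re-edited the page so you can see what I mean. The Month column should contain one January, one February, one March, and so on, each with the FID frequency count over the 3 years (which I achieved with data.table, see above), plus the associated summary statistics, which were created successfully above thanks to your suggestions. Do you know how to fix this? Thanks if you can help :)

    Yes, as I mentioned earlier, this line is the problem: sem=sd(FID)/sqrt(FID). Note that mean, sd and n() in the previous lines each return only one value per group, but sem=sd(FID)/sqrt(FID) does not: it returns x values for the x rows in the group, so the data frame does not collapse into one row per month. That is why I asked whether you were sure this line was correct.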
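Ronak's point can be seen without dplyr at all: summarise() yields one row per month only when every expression returns a single value per group. A minimal base-R sketch using the three January values from the FID data, with length() standing in for dplyr's n():

```r
# The three January values from the FID data (2015, 2016, 2017)
x <- c(65, 56, 44)

# sd() and length() each return ONE number for the group,
# so this standard error is a single value
se_ok <- sd(x) / sqrt(length(x))

# sqrt(x) is vectorised and returns one value per row,
# so sd(x) / sqrt(x) has as many elements as x
se_bad <- sd(x) / sqrt(x)

print(se_ok)
print(se_bad)
```

This is why sem=sd(FID)/sqrt(n()) in the answer collapses to one row per month, while sem=sd(FID)/sqrt(FID) does not.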