Using data.table() and dplyr() in R
Question: I have a data frame called "FID" (see below) that contains the monthly frequency of FID over three years. Using the data.table and dplyr packages, I want to subset my data frame by calculating the total frequency of FID for each month over the 3 years, and then summarise my data so that the summary data frame contains the following columns:
Month Total_FID Mean_FID SD_FID S.E_FID Ci_Lower Ci_Upper
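For reference, here is how the standard error and a normal-approximation 95% confidence interval are usually defined for the n = 3 monthly values (one per year). The 1.96 multiplier matches the dplyr attempt further down. The January numbers come from the FID data in the question:

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\mathrm{CI}_{95\%} \approx \bar{x} \pm 1.96\,\mathrm{SE}

% Worked example, January (2015, 2016, 2017): x = (65, 56, 44), n = 3
\bar{x} = \frac{65 + 56 + 44}{3} = 55, \qquad
s = \sqrt{\frac{10^2 + 1^2 + (-11)^2}{3 - 1}} = \sqrt{111} \approx 10.536
\mathrm{SE} = \frac{\sqrt{111}}{\sqrt{3}} = \sqrt{37} \approx 6.083, \qquad
\mathrm{CI}_{95\%} \approx 55 \pm 1.96 \times 6.083 = [43.08,\ 66.92]
```

With only three values per month, a t multiplier ($t_{0.975,\,2} \approx 4.303$) gives a noticeably wider interval than 1.96.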
What a section of my data frame looks like:

Year Month    FID
2018 January   86
2019 January   66
2020 January   56
2018 February  76
2019 February  55
2020 February  45

Desired outcome:

Month    FID
January  208
February 176
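I am not an advanced R user. I have read many Stack Overflow pages and online tutorials, but I have not been able to find the right procedure (see my R code below) to produce the desired summary data frame. I also could not find a way to generate the lower and upper confidence limits with the data.table package. Knowing how to do this with both data.table and dplyr would be very handy, because I use both packages often.
If anyone can help, I would be very grateful.
Many thanks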
R code
Using data.table
##Reformat into a data.table object
library(data.table)
FID_Table <- data.table(FID)
##Summary statistics
FID.Summarised=FID_Table[, sum(FID),
Month=.N,
Mean_FID=mean(FID),
SD_FID=sd(FID),
S.E = std.error(FID),
by=Month]
##Error message
Error in `[.data.table`(FID_Table, , sum(FID), Month = .N, Mean_FID = mean(FID), :
unused arguments (Month = .N, Mean_FID = mean(FID), SD_FID = sd(FID), S.E = std.error(FID))
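The `unused arguments` error comes from how `[.data.table` reads its arguments. After the empty `i`, `sum(FID)` is taken as `j`. The remaining named expressions (`Month = .N`, `Mean_FID = mean(FID)`, ...) are treated as extra arguments that `[.data.table` does not have.

Below is a minimal sketch of the same call with every summary inside one `.()` list, which is also what the second attempt below does. It rests on three assumptions that are not from the question:

- `std.error()` is replaced by `sd(FID) / sqrt(.N)`. `std.error()` is not part of base R or data.table; it comes from an add-on package such as plotrix.
- The count is named `N` so it does not clash with the grouping column `Month`.
- The FID data are rebuilt inline from the question's `dput()` values, with the months in calendar order.

```r
library(data.table)

# FID data rebuilt from the question's dput() values; Month levels are put in
# calendar order here (an assumption for readability, not the original levels)
FID <- data.frame(
  Year  = rep(2015:2017, each = 12),
  Month = factor(rep(month.name, times = 3), levels = month.name),
  FID   = c(65L, 88L, 43L, 54L, 98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L,
            56L, 29L, 98L, 23L, 6L, 10L, 7L, 65L, 53L, 41L, 25L, 30L,
            44L, 65L, 38L, 27L, 20L, 0L, 8L, 45L, 34L, 26L, 44L, 39L)
)
FID_Table <- data.table(FID)

# All summaries go inside ONE .() list, so j receives a single argument
FID.Summarised <- FID_Table[, .(Total_FID = sum(FID),
                                N         = .N,                 # rows (years) per month
                                Mean_FID  = mean(FID),
                                SD_FID    = sd(FID),
                                S.E       = sd(FID) / sqrt(.N)), # standard error
                            by = Month]
print(FID.Summarised)
```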
##Reformat into a data.table object
FID_Table<-data.table(FID)
##Summarise Data
Summarised.FID<-FID_Table[, .(FID.Freq=sum(FID),
mean = mean(FID),
sd=sd(FID),
median=median(FID)),
by = .(Month)]
##data.table results
Month FID.Freq mean sd median
1: January 165 55.000000 10.535654 56
2: February 182 60.666667 29.737743 65
3: March 179 59.666667 33.291641 43
4: April 104 34.666667 16.862186 27
5: May 124 41.333333 49.571497 20
6: June 10 3.333333 5.773503 0
7: July 15 5.000000 4.358899 7
8: August 133 44.333333 21.007935 45
9: September 97 32.333333 21.548395 34
10: October 82 27.333333 13.051181 26
11: November 75 25.000000 19.000000 25
12: December 102 34.000000 4.582576 33
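The second attempt already produces one row per month. The sketch below shows one way to reach the full set of requested columns with data.table alone. It computes the per-month summaries in `j`, then adds the standard error and the confidence limits to that summary table by reference with `:=`.

The 1.96 normal-approximation multiplier mirrors the dplyr attempt below. The FID data are rebuilt inline from the question's `dput()` values, with months in calendar order as an assumption. The helper column `n_FID` is a hypothetical name used only for the calculation.

```r
library(data.table)

# FID data rebuilt from the question's dput() values (Month levels in calendar order)
FID <- data.frame(
  Year  = rep(2015:2017, each = 12),
  Month = factor(rep(month.name, times = 3), levels = month.name),
  FID   = c(65L, 88L, 43L, 54L, 98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L,
            56L, 29L, 98L, 23L, 6L, 10L, 7L, 65L, 53L, 41L, 25L, 30L,
            44L, 65L, 38L, 27L, 20L, 0L, 8L, 45L, 34L, 26L, 44L, 39L)
)
FID_Table <- data.table(FID)

# Step 1: one row per month with the basic summaries
FID.Summary <- FID_Table[, .(Total_FID = sum(FID),
                             n_FID     = .N,
                             Mean_FID  = mean(FID),
                             SD_FID    = sd(FID)),
                         by = Month]

# Step 2: add the standard error and 95% limits by reference with :=
FID.Summary[, S.E_FID := SD_FID / sqrt(n_FID)]
FID.Summary[, `:=`(Ci_Lower = Mean_FID - 1.96 * S.E_FID,   # normal approximation
                   Ci_Upper = Mean_FID + 1.96 * S.E_FID)]

# Step 3: drop the helper count column
FID.Summary[, n_FID := NULL]
print(FID.Summary)
```

With only three values per month, a t-based multiplier such as `qt(0.975, df = 2)` (about 4.30) would give noticeably wider limits than 1.96. That is a statistical choice rather than a data.table one.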
Data frame: "FID"
structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L), Month = structure(c(5L, 4L, 8L, 1L, 9L,
7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L,
12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L,
10L, 3L), .Label = c("April", "August", "December", "February",
"January", "July", "June", "March", "May", "November", "October",
"September"), class = "factor"), FID = c(65L, 88L, 43L, 54L,
98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L, 56L, 29L, 98L, 23L, 6L,
10L, 7L, 65L, 53L, 41L, 25L, 30L, 44L, 65L, 38L, 27L, 20L, 0L,
8L, 45L, 34L, 26L, 44L, 39L)), class = "data.frame", row.names = c(NA,
-36L))
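In this `dput()` output, `Month` is a factor whose levels are in alphabetical order (April, August, December, ...). data.table's `by =` keeps groups in order of first appearance, which is why the data.table results above run from January to December. dplyr's `group_by()`, by contrast, orders groups by factor level.

The sketch below assumes calendar order is wanted. It re-levels the factor with base R's built-in `month.name` before any summarising; the labels and values themselves are unchanged.

```r
# Data frame exactly as given by dput() in the question
FID <- structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L), Month = structure(c(5L, 4L, 8L, 1L, 9L,
7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L,
12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L,
10L, 3L), .Label = c("April", "August", "December", "February",
"January", "July", "June", "March", "May", "November", "October",
"September"), class = "factor"), FID = c(65L, 88L, 43L, 54L,
98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L, 56L, 29L, 98L, 23L, 6L,
10L, 7L, 65L, 53L, 41L, 25L, 30L, 44L, 65L, 38L, 27L, 20L, 0L,
8L, 45L, 34L, 26L, 44L, 39L)), class = "data.frame", row.names = c(NA,
-36L))

# Levels as stored: alphabetical
print(levels(FID$Month))

# Re-level into calendar order; factor() matches existing labels by name
FID$Month <- factor(FID$Month, levels = month.name)
print(levels(FID$Month))
```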
Answer: data.table
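##Reformat into a data.table object
library(dplyr)
FID_Table %>%
  dplyr::group_by(Month) %>%                # one summary row per month
  dplyr::summarise(Frequency = sum(FID),
                   Mean.Month = mean(FID),
                   sd.month = sd(FID, na.rm = TRUE),
                   n_FID = n(),
                   sem = sd(FID) / sqrt(n()),
                   ci_low = Mean.Month - 1.96 * sem,
                   ci_hi = Mean.Month + 1.96 * sem) %>%
  dplyr::ungroup()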
##dplyr results
   Month     Frequency Mean.Month  sd.month n_FID       sem      ci_low     ci_hi
 1 January         165  55.000000 10.535654     3  6.082763  43.0777854 66.922215
 2 February        182  60.666667 29.737743     3 17.169094  27.0152431 94.318090
 3 March           179  59.666667 33.291641     3 19.220938  21.9936289 97.339704
 4 April           104  34.666667 16.862186     3  9.735388  15.5853064 53.748027
 5 May             124  41.333333 49.571497     3 28.620117 -14.7620965 97.428763
 6 June             10   3.333333  5.773503     3  3.333333  -3.2000000  9.866667
 7 July             15   5.000000  4.358899     3  2.516611   0.0674415  9.932558
 8 August          133  44.333333 21.007935     3 12.128937  20.5606169 68.106050
 9 September        97  32.333333 21.548395     3 12.440972   7.9490287 56.717638
10 October          82  27.333333 13.051181     3  7.535103  12.5645314 42.102135
11 November         75  25.000000 19.000000     3 10.969655   3.4994760 46.500524
12 December        102  34.000000  4.582576     3  2.645751  28.8143274 39.185673
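The following minimal dplyr sketch produces the column names requested in the question (`Total_FID`, `Mean_FID`, `SD_FID`, `S.E_FID`, `Ci_Lower`, `Ci_Upper`). It assumes dplyr is installed and rebuilds the FID data inline from the question's `dput()` values, with months in calendar order.

`summarise()` can refer to summaries created earlier in the same call, so `S.E_FID` and the limits reuse `Mean_FID` and `SD_FID`. Each of those is a single value per group. The 1.96 multiplier follows the attempt above, and `n_FID` is a hypothetical helper column that is dropped at the end.

```r
library(dplyr)

# FID data rebuilt from the question's dput() values (Month levels in calendar order)
FID <- data.frame(
  Year  = rep(2015:2017, each = 12),
  Month = factor(rep(month.name, times = 3), levels = month.name),
  FID   = c(65L, 88L, 43L, 54L, 98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L,
            56L, 29L, 98L, 23L, 6L, 10L, 7L, 65L, 53L, 41L, 25L, 30L,
            44L, 65L, 38L, 27L, 20L, 0L, 8L, 45L, 34L, 26L, 44L, 39L)
)

FID_Summary <- FID %>%
  group_by(Month) %>%                           # one group per calendar month
  summarise(Total_FID = sum(FID),
            n_FID     = n(),                    # number of years in the group
            Mean_FID  = mean(FID),
            SD_FID    = sd(FID),
            S.E_FID   = SD_FID / sqrt(n_FID),   # single value per group
            Ci_Lower  = Mean_FID - 1.96 * S.E_FID,
            Ci_Upper  = Mean_FID + 1.96 * S.E_FID,
            .groups   = "drop") %>%             # return an ungrouped result
  select(-n_FID)

print(FID_Summary)
```

The `.groups` argument needs dplyr 1.0.0 or later. With older versions, a trailing `ungroup()` does the same job.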
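In dplyr::summarise(Mean.Month = mean(n), sd.Month = sd(n, na.rm = TRUE)), what is n? Should it be FID?

Hi Ronak, thanks for your suggestion. I changed the code as you suggested, and now there is a new error message. Could you advise, if you can?

FID() is not a function but a column name, so what you actually need is sem = sd(FID)/sqrt(FID). But sqrt(FID) does not give a single value. Are you sure the formula you are using is correct?

Hi Ronak. I have got close to the desired result. However, the dplyr code does not subset the data frame month by month over the 3 years. I have re-edited the page so you can see what I mean. In the Month column there should be one January, one February, one March, and so on, each with the frequency count of FID over the 3 years (which I achieved with data.table; see above). Each row should also carry its associated summary statistics, which were created successfully above thanks to your suggestions. Do you know how to fix this? Thanks if you can help :)

Yes, as I mentioned earlier, the problem is this line: sem = sd(FID)/sqrt(FID). Note that mean, sd and n() in the preceding lines each return only one value per group, but sem = sd(FID)/sqrt(FID) does not. It returns x values for a group with x rows, so the data frame does not collapse into one row per month. That is why I asked whether you were sure this line is correct.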
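The point about `sd(FID)/sqrt(FID)` can be checked with plain vectors. `sd()` returns a single number, but `sqrt()` returns one element per input value, so their ratio is as long as the group. The minimal base-R sketch below uses the three January values from the question's data (65, 56 and 44):

```r
# January FID values for 2015, 2016 and 2017, taken from the question's data
x <- c(65, 56, 44)

wrong <- sd(x) / sqrt(x)          # one value per row: length 3, not a per-group summary
right <- sd(x) / sqrt(length(x))  # standard error: a single value, as sqrt(n()) gives per group

print(length(wrong))
print(right)
```

Inside `summarise()`, `n()` plays the role of `length(x)` for the current group. In dplyr 1.1.0 and later, a `summarise()` expression that returns more than one row per group triggers a deprecation warning that points to `reframe()`. That warning is a useful hint that such an expression is not a per-group summary.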