R: How to aggregate data and run a custom function to calculate confidence intervals


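I have a data frame with three variables: Year, Location, and Concentration. I want to aggregate the data by Year and Location and calculate a confidence interval for Concentration.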

Year <- rep(c(2010, 2011, 2012, 2013), each=15)
Location <- rep(c("Texas", "Colorado", "Washington"), times = 4, each = 5)
Concentration <- runif(60, 0, 100)

conc_data <- cbind.data.frame(Year, Location, Concentration)
head(conc_data)

  Year Location Concentration
1 2010    Texas      22.54480
2 2010    Texas      70.38605
3 2010    Texas      79.53292
4 2010    Texas      95.62562
5 2010    Texas      38.81795
6 2010 Colorado      68.69821

If we pass an anonymous function (function(x)) to aggregate, x receives the Concentration values of each Location/Year group:

aggregate(cbind(lwr = Concentration) ~ Location + Year, data = conc_data, 
      function(x) confidence_interval_lwr(x, 0.95))
#  Location Year        lwr
#1    Colorado 2010 13.1289089
#2       Texas 2010 14.3379460
#3  Washington 2010 30.4922382
#4    Colorado 2011 18.9369171
#5       Texas 2011  0.6261571
#6  Washington 2011 12.2817138
#7    Colorado 2012  3.7365737
#8       Texas 2012 11.1165898
#9  Washington 2012 32.9729329
#10   Colorado 2013 23.9445299
#11      Texas 2013  3.0298597
#12 Washington 2013  9.0199863
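
To see what x contains, it may help to swap in a trivial function. The following is a minimal sketch, not part of the original answer: it rebuilds the example data (the seed value is an arbitrary assumption) and checks that each group handed to the function holds the 5 Concentration values of one Location/Year combination.

```r
# Rebuild the example data; the seed value 1 is an arbitrary assumption
set.seed(1)
conc_data <- data.frame(
  Year = rep(c(2010, 2011, 2012, 2013), each = 15),
  Location = rep(c("Texas", "Colorado", "Washington"), times = 4, each = 5),
  Concentration = runif(60, 0, 100)
)

# x is the numeric vector of Concentration values for one group,
# so length(x) is the number of observations in that group
sizes <- aggregate(Concentration ~ Location + Year, data = conc_data,
                   FUN = function(x) length(x))
sizes$Concentration

# The same grouping spelled out with split(): one vector per group
groups <- split(conc_data$Concentration,
                list(conc_data$Location, conc_data$Year))
lengths(groups)
```

Both calls report 5 observations for each of the 12 groups, which is the sample size n that the confidence interval function works with.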
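Note: because no seed was set before generating the Concentration column with runif, the lwr values shown above will differ from run to run.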
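
If reproducible numbers are wanted, one option is to call set.seed before runif. This is a small sketch under that assumption; the seed value 42 is arbitrary and does not come from the original post, so it will not reproduce the values printed above.

```r
# Fix the random number generator state (42 is an arbitrary choice)
set.seed(42)
Year <- rep(c(2010, 2011, 2012, 2013), each = 15)
Location <- rep(c("Texas", "Colorado", "Washington"), times = 4, each = 5)
Concentration <- runif(60, 0, 100)
conc_data <- data.frame(Year, Location, Concentration)

# Re-seeding with the same value reproduces exactly the same draws
set.seed(42)
identical(Concentration, runif(60, 0, 100))
```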

The custom function that computes the lower bound of the confidence interval:

confidence_interval_lwr <- function(vector, interval) {
  # Standard deviation of sample
  vec_sd <- sd(vector)
  # Sample size
  n <- length(vector)
  # Mean of sample
  vec_mean <- mean(vector)
  # Error according to t distribution
  error <- qt((interval + 1)/2, df = n - 1) * vec_sd / sqrt(n)
  # Lower bound of the confidence interval as a named vector
  lwr <- c("lower" = vec_mean - error)
  return(lwr)
}
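
The function above returns only the lower bound. If both bounds are wanted in one pass, a possible variant is sketched below. The name confidence_interval, the default level, and the seed are assumptions made for illustration and do not appear in the original post. When the function returns two values, aggregate stores them in a single matrix column, so the sketch flattens the result with do.call(data.frame, ...).

```r
# Hypothetical variant returning both bounds of the t-based interval
confidence_interval <- function(x, level = 0.95) {
  n <- length(x)
  error <- qt((level + 1) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lwr = mean(x) - error, upr = mean(x) + error)
}

# Example data as in the question; the seed value is an arbitrary assumption
set.seed(1)
conc_data <- data.frame(
  Year = rep(c(2010, 2011, 2012, 2013), each = 15),
  Location = rep(c("Texas", "Colorado", "Washington"), times = 4, each = 5),
  Concentration = runif(60, 0, 100)
)

res <- aggregate(Concentration ~ Location + Year, data = conc_data,
                 FUN = confidence_interval)

# Flatten the matrix column into Concentration.lwr and Concentration.upr
res <- do.call(data.frame, res)
names(res)
```

As a sanity check, the bounds for any single group should match t.test(x)$conf.int on that group's values, since both use the same t-based formula.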

The desired output has the following form:

  Year   Location  lwr
1 2010      Texas  8.2
2 2010   Colorado  5.9
3 2010 Washington 15.0
4 2011      Texas 10.0
5 2011   Colorado  2.0
6 2011 Washington 18.0