R: Computing statistics on subsets of data
Here is a small reproducible example of my data:
> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")
> mydata
subject time measure
1 0 10
1 1 12
1 2 8
2 0 7
2 1 0
2 2 0
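Is there a simple way to compute a per-subject statistic such as the mean of measure, other than looping over all the records programmatically or reshaping to wide format first?

You can use ddply from the plyr package: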
library(plyr)
res = ddply(mydata, .(subject), mutate, mn_measure = mean(measure))
res
subject time measure mn_measure
1 1 0 10 10.000000
2 1 1 12 10.000000
3 1 2 8 10.000000
4 2 0 7 2.333333
5 2 1 0 2.333333
6 2 2 0 2.333333
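
The choice of mutate in the call above is what keeps all six rows. As a rough sketch of the difference (assuming plyr is installed; the names per_row and per_subject are made up for illustration), swapping in summarise collapses each subject to a single row instead:

```r
library(plyr)

# Rebuild the example data from the question
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# mutate: every original row is kept and the group mean is appended
per_row <- ddply(mydata, .(subject), mutate, mn_measure = mean(measure))

# summarise: each subject is reduced to one row holding only the statistic
per_subject <- ddply(mydata, .(subject), summarise, mn_measure = mean(measure))

nrow(per_row)      # one row per observation
nrow(per_subject)  # one row per subject
```

Use the mutate form when the statistic has to sit next to each observation, and the summarise form when only a per-subject table is needed.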
Use the base R function ave(), which, despite its confusing name, can compute a variety of statistics, including the mean:
within(mydata, mean<-ave(measure, subject, FUN=mean))
subject time measure mean
1 1 0 10 10.000000
2 1 1 12 10.000000
3 1 2 8 10.000000
4 2 0 7 2.333333
5 2 1 0 2.333333
6 2 2 0 2.333333
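
Because ave() accepts any function through FUN, the same pattern should cover other per-subject statistics. A minimal sketch on the question's data; the column names med_measure and max_measure are invented here for illustration:

```r
# Rebuild the example data from the question
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# ave() returns a vector as long as its input, so each row
# receives the statistic of its own subject group.
# FUN has to be passed by name because it comes after `...`;
# an unnamed function would be treated as a grouping variable.
mydata$med_measure <- ave(mydata$measure, mydata$subject, FUN = median)
mydata$max_measure <- ave(mydata$measure, mydata$subject, FUN = max)

mydata
```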
Alternatively, use the data.table package:
require(data.table)
dt <- data.table(mydata, key = "subject")
dt[, mn_measure := mean(measure), by = subject]
# subject time measure mn_measure
# 1: 1 0 10 10.000000
# 2: 1 1 12 10.000000
# 3: 1 2 8 10.000000
# 4: 2 0 7 2.333333
# 5: 2 1 0 2.333333
# 6: 2 2 0 2.333333
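
If more than one statistic is wanted, the functional form of := can add several columns in one grouped pass. This is a sketch assuming data.table is installed; sd_measure is an illustrative column name that does not appear in the answer above:

```r
library(data.table)

# Rebuild the example data from the question as a data.table
dt <- data.table(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# `:=` modifies dt by reference, so no reassignment is needed
dt[, `:=`(mn_measure = mean(measure),
          sd_measure = sd(measure)),
   by = subject]

# `:=` returns its result invisibly; a trailing [] prints the table
dt[]
```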
In the ave() call, FUN=mean is unnecessary, right? It is the default FUN.
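
On whether FUN=mean is needed in ave(): the documented signature is ave(x, ..., FUN = mean), so leaving the argument out should give the same grouped means. A small check on the question's data (the variable names with_fun and without_fun are illustrative):

```r
# Rebuild the example data from the question
mydata <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  time    = c(0, 1, 2, 0, 1, 2),
  measure = c(10, 12, 8, 7, 0, 0)
)

# Explicit FUN versus relying on the default
with_fun    <- ave(mydata$measure, mydata$subject, FUN = mean)
without_fun <- ave(mydata$measure, mydata$subject)

# Compare the two results element by element
identical(with_fun, without_fun)
```

Spelling out FUN = mean is still harmless and arguably makes the intent clearer to readers who do not know the default.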