R: add a column of group means to the original data
I want to add a column to an R data.frame that holds the mean of a variable within each level of a factor column, like this:
df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)
df2 <- aggregate(data = df1, Y ~ X, FUN = mean)
df3 <- merge(x = df1, y = df2, by = "X", suffixes = c(".Old",".New"))
df3
# X Y.Old Y.New
# 1 A 1 2
# 2 A 2 2
# 3 A 3 2
# 4 B 4 5
# 5 B 5 5
# 6 B 6 5
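By default, merge() sorts its result by the key column (sort = TRUE), so df3's rows line up with df1's only because df1 is already ordered by X. The following is a minimal base R sketch that keeps the original row order by looking up the aggregated means with match(). The unsorted input df1_shuffled is made up for illustration.

```r
# Hypothetical input: same values as df1, but rows are not ordered by X
df1_shuffled <- data.frame(X = c("B", "A", "B", "A", "A", "B"),
                           Y = c(4, 1, 5, 2, 3, 6))

# One row per group: X and the mean of Y within that group
group_means <- aggregate(data = df1_shuffled, Y ~ X, FUN = mean)

# Look up each row's group mean; df1_shuffled keeps its original row order
df1_shuffled$Y.New <- group_means$Y[match(df1_shuffled$X, group_means$X)]
df1_shuffled
```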
ddply and transform to the rescue (although I'm sure you'll get at least 4 different ways to do this):
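The code for this answer did not survive. Here is a minimal sketch of the ddply + transform idiom it names, assuming the plyr package is installed. Y.New is simply the column name used in the question.

```r
library(plyr)

df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)

# Split df1 by X, call transform() on each piece to add Y.New = mean(Y)
# for that piece, then bind the pieces back together (ordered by X)
res <- ddply(df1, .(X), transform, Y.New = mean(Y))
res
```

Because ddply reassembles the pieces in group order, the result is sorted by X, just like the merge() version.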
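Joran answered this beautifully. This isn't an answer to your question but an extension of the discussion: if you're looking for a table of the means of a dependent variable across two categorical variables, here's a Hadley function for that: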
cast(CO2, Type ~ Treatment, value="uptake", fun.aggregate=mean, margins=TRUE)
Here's a look at the CO2 data, followed by the table of means:
> head(CO2)
Plant Type Treatment conc uptake
1 Qn1 Quebec nonchilled 95 16.0
2 Qn1 Quebec nonchilled 175 30.4
3 Qn1 Quebec nonchilled 250 34.8
4 Qn1 Quebec nonchilled 350 37.2
5 Qn1 Quebec nonchilled 500 35.3
6 Qn1 Quebec nonchilled 675 39.2
> library(reshape)
> cast(CO2, Type ~ Treatment, mean, margins=TRUE)
Type nonchilled chilled (all)
1 Quebec 35.33333 31.75238 33.54286
2 Mississippi 25.95238 15.81429 20.88333
3 (all) 30.64286 23.78333 27.21310
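If you don't have reshape, base R can build the same two-way table of means with tapply(), passing a list of the two factors. This version does not produce the (all) margins. In CO2, every Type/Treatment cell has the same number of observations, so rowMeans() and colMeans() of the table happen to reproduce those margins here. That shortcut would not hold for unbalanced data.

```r
# Mean uptake for every combination of Type (rows) and Treatment (columns)
tab <- with(CO2, tapply(uptake, list(Type, Treatment), mean))
tab

# Margins: these equal the overall group means only because cells are equal-sized
rowMeans(tab)
colMeans(tab)
```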
This is exactly what the ave function is for:
df1$Y.New <- ave(df1$Y, df1$X)
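ave() applies FUN to each group and writes the result back over that group's rows, so the output has the same length and order as the input. FUN defaults to mean. The small sketch below also passes other functions. The extra column names (Y.Max, Y.Idx) are made up for illustration.

```r
df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)

# Default FUN = mean: each row gets the mean of Y within its level of X
df1$Y.New <- ave(df1$Y, df1$X)

# Any summary function works; a single value is recycled over the group's rows
df1$Y.Max <- ave(df1$Y, df1$X, FUN = max)

# FUN may also return one value per row, e.g. a within-group row counter
df1$Y.Idx <- ave(df1$Y, df1$X, FUN = seq_along)
df1
```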
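Two alternative approaches:

1. With the dplyr package:

library(dplyr)
# Group by X, then add the mean of Y within each group to every row
df1 <- df1 %>%
  group_by(X) %>%
  mutate(Y.new = mean(Y))

2. With the data.table package:

library(data.table)
# Convert df1 to a data.table by reference, then add Y.new in place, grouped by X
setDT(df1)[, Y.new := mean(Y), by = X]

Both give the following result (printed here as a data.table):

> df1
   X Y Y.new
1: A 1     2
2: A 2     2
3: A 3     2
4: B 4     5
5: B 5     5
6: B 6     5

Awesome answer. It wasn't obvious that the dplyr answer needs group_by followed by mutate, so this taught me that.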
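With the group_by() then mutate() pattern, the result stays grouped, and that grouping carries over to any later verbs, so it is common to finish with ungroup(). In dplyr 1.1.0 and later, mutate() also accepts a per-call .by argument. It groups only for that call and returns ungrouped data. The following sketch assumes dplyr >= 1.1.0 is installed:

```r
library(dplyr)

df1 <- data.frame(X = rep(x = LETTERS[1:2], each = 3), Y = 1:6)

# Classic pattern: group, add the group mean, then drop the grouping
out1 <- df1 %>%
  group_by(X) %>%
  mutate(Y.new = mean(Y)) %>%
  ungroup()

# dplyr >= 1.1.0: the grouping applies to this mutate() call only
out2 <- df1 %>%
  mutate(Y.new = mean(Y), .by = X)
```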