Creating a new variable in R that takes the mean over identical values of an existing variable
I want to create a new column, means, that holds, for each row, the mean of Customer Accepted over all rows that share that row's Price. My data looks like this:
Product | Price | Customer Accepted
A       | 17.2  | 1
A       | 16.8  | 0
A       | 17.2  | 1
B       | 21    | 1
B       | 16.8  | 0
A       | 21    | 0
C       | 17.2  | 0
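For example, 17.2 occurs 3 times, and the mean of its Customer Accepted values is (1+1+0)/3 = 0.67. Likewise, for 16.8 it is (0+0)/2 = 0, and for 21 it is (1+0)/2 = 0.50. The new column, means, should hold these values on every row where that price appears.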
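My expected output is the same table with the means column added.

There are about 950 distinct price values, and the number of times each one repeats is not uniform. Can anyone help? Thanks a lot.

Most packages for large-scale data analysis offer something called grouping, for example the data.table package, and it is certainly worth looking into. Here is one way to do it in base R, though. It is written for readability, not speed: the mean is recomputed for every row even though it could be cached per price.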
a <- data.frame(
  product  = c("A", "A", "A", "B", "B", "A", "C"),
  price    = c(17.2, 16.8, 17.2, 21, 16.8, 21, 17.2),
  accepted = c(1, 0, 1, 1, 0, 0, 0)  # seven values, one per row
)
# For each row, average accepted over all rows that share that row's price.
a$mean <- vapply(seq_len(nrow(a)), function(i) {
  mean(a$accepted[a$price == a$price[i]])
}, numeric(1))
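This is essentially what you were trying to do: go over every row and assign to the data.frame a new value, namely the mean of all accepted values whose price equals the price in that row.

I hope I have understood you correctly. Here is code that does the same thing: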
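df <- data.frame(
  Product  = c("A", "A", "A", "B", "B", "A", "C"),
  Price    = c(17.2, 16.8, 17.2, 21, 16.8, 21, 17.2),
  Accpeted = c(1, 0, 1, 1, 0, 0, 0)
)
# ave() returns a vector as long as its input, with each element replaced by its group's mean.
df$mean <- ave(df$Accpeted, df$Price, FUN = mean)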
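The row-by-row loop earlier in the thread recomputes a group mean for every row. As a rough sketch of the cached variant that answer alludes to, the per-price means can be computed once with tapply() and then looked up for each row. The names price_means and mean_cached are illustrative and do not come from the original answers; the data is the same example data as above.

```r
# Same example data as in the ave() answer (column name kept as in that answer).
df <- data.frame(
  Product  = c("A", "A", "A", "B", "B", "A", "C"),
  Price    = c(17.2, 16.8, 17.2, 21, 16.8, 21, 17.2),
  Accpeted = c(1, 0, 1, 1, 0, 0, 0)
)

# One mean per distinct price, computed once.
# The result is named by the prices converted to character strings.
price_means <- tapply(df$Accpeted, df$Price, mean)

# Look up each row's price in the cached table (hypothetical column name).
df$mean_cached <- as.numeric(price_means[as.character(df$Price)])
```

This should give the same column as the ave() call, which is usually the shorter way to write it.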
The dplyr approach looks like this:
library(dplyr)
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "A", "C"),
  Price = c(17.2, 16.8, 17.2, 21, 16.8, 21, 17.2),
  CustomerAccepted = c(1, 0, 1, 1, 0, 0, 0)
)
# Group by Price and compute one mean per distinct price.
df.summ <-
  df %>%
  group_by(Price) %>%
  summarise(Mean = mean(CustomerAccepted))
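Note that summarise() returns one row per distinct Price, not a column attached to the original rows. If a per-row column is wanted, as the question asks, one possible sketch swaps summarise() for mutate(). The name df.means is illustrative and not from the original answer.

```r
library(dplyr)

df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "A", "C"),
  Price = c(17.2, 16.8, 17.2, 21, 16.8, 21, 17.2),
  CustomerAccepted = c(1, 0, 1, 1, 0, 0, 0)
)

# mutate() keeps all seven rows and repeats each group's mean
# on every row belonging to that group.
df.means <- df %>%
  group_by(Price) %>%
  mutate(Mean = mean(CustomerAccepted)) %>%
  ungroup()
```

The trailing ungroup() is there so that later operations on df.means are not silently performed per price.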
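We can also use data.table, grouping by Price and assigning the group mean by reference with :=.

Thanks a lot, this was really helpful.

@David If this helped you, don't forget to accept the answer. Thanks.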
Result of the ave() approach:
  Product Price Accpeted      mean
1       A  17.2        1 0.6666667
2       A  16.8        0 0.0000000
3       A  17.2        1 0.6666667
4       B  21.0        1 0.5000000
5       B  16.8        0 0.0000000
6       A  21.0        0 0.5000000
7       C  17.2        0 0.6666667
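# data.table approach; assumes df from the ave() example (column Accpeted).
library(data.table)
# setDT() converts df to a data.table in place; := adds Mean by reference, grouped by Price.
setDT(df)[, Mean := mean(Accpeted), by = Price]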