Averaging column values based on multiple conditions in other columns in R
Tags: r, tidyverse

my.df1 is a data.frame with many unique observations that share similar characteristics (in this example Colour, Type and Size). For each combination of characteristics in my.df2, I want to compute the mean and SD of all observations in my.df1 that match the criteria. For example, for the first row of my.df2 I want the mean and SD of all observations in my.df1 with Colour Blue, Type 1 and Size S. Note: for a row where Type and Size are both '-', I want the mean and SD of all observations in my.df1 with that Colour (for example, all Blue observations), regardless of Type and Size. My real dataset has more observations, more criteria variables and more price columns, so a scalable solution is highly appreciated.
my.df1 <- data.frame(Colour = c('Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red'),
Type = c(1,1,2,2,1,2,1,1,2,2,1,2,1,1,2,2,1,2,1,1,2,2,1,2),
Size = c('S','S','S','S','S','S','M','M','M','M','M','M','S','S','S','S','S','S','M','M','M','M','M','M'),
PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11))
head(my.df1, 5)
  Colour Type Size PriceOne PriceTwo
1   Blue    1    S       10       10
2   Blue    1    S       15       15
3   Blue    2    S       20       10
4   Blue    2    S       18       18
5   Blue    1    S       19       20
my.df2 <- data.frame(Colour = c('Blue','Blue','Blue','Blue','Blue','Blue','Red','Red','Red','Red','Red','Red'),
Type = c(1,1,2,2,2,'-',1,1,2,2,2,'-'),
Size = c('S','M','S','M','-','-','S','M','S','M','-','-'),
PriceOneMean = NA,
PriceOneStDev = NA,
PriceTwoMean = NA,
PriceTwoStDev = NA)
my.df2
   Colour Type Size PriceOneMean PriceOneStDev PriceTwoMean PriceTwoStDev
1    Blue    1    S           NA            NA           NA            NA
2    Blue    1    M           NA            NA           NA            NA
3    Blue    2    S           NA            NA           NA            NA
4    Blue    2    M           NA            NA           NA            NA
5    Blue    2    -           NA            NA           NA            NA
6    Blue    -    -           NA            NA           NA            NA
7     Red    1    S           NA            NA           NA            NA
8     Red    1    M           NA            NA           NA            NA
9     Red    2    S           NA            NA           NA            NA
10    Red    2    M           NA            NA           NA            NA
11    Red    2    -           NA            NA           NA            NA
12    Red    -    -           NA            NA           NA            NA
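The dplyr library lets you group, summarise and bind. Edited to add the extra grouping. For brevity I prefer @Jimbou's answer; his/her answer could probably be a single line.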
my.df1 <- data.frame(Colour = c('Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Blue','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red','Red'),
Type = c(1,1,2,2,1,2,1,1,2,2,1,2,1,1,2,2,1,2,1,1,2,2,1,2),
Size = c('S','S','S','S','S','S','M','M','M','M','M','M','S','S','S','S','S','S','M','M','M','M','M','M'),
PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11))
library(dplyr)

# make detailed summaries
my.df1.ColourTypeSize = my.df1 %>%
  group_by(Colour, Type, Size) %>%
  summarise(
    PriceOneMean = mean(PriceOne),
    PriceOneStDev = sd(PriceOne),
    PriceTwoMean = mean(PriceTwo),
    PriceTwoStDev = sd(PriceTwo))

# Make summary for colour and type and add NA for Size
my.df1.ColourType = my.df1 %>%
  group_by(Colour, Type) %>%
  summarise(
    PriceOneMean = mean(PriceOne),
    PriceOneStDev = sd(PriceOne),
    PriceTwoMean = mean(PriceTwo),
    PriceTwoStDev = sd(PriceTwo)) %>%
  mutate(Size = NA)

# Make summary for colour alone and add NA for Size and Type
my.df1.Colour = my.df1 %>%
  group_by(Colour) %>%
  summarise(
    PriceOneMean = mean(PriceOne),
    PriceOneStDev = sd(PriceOne),
    PriceTwoMean = mean(PriceTwo),
    PriceTwoStDev = sd(PriceTwo)) %>%
  mutate(Type = NA, Size = NA)

# Bind the summaries together and sort and arrange to make it look nice
my.df2 =
  my.df1.Colour %>%
  bind_rows(my.df1.ColourTypeSize) %>%
  bind_rows(my.df1.ColourType) %>%
  arrange(Colour, Type, Size) %>%
  select(Colour, Type, Size, everything())
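Since the question asks for something that scales to more criteria and more price columns, here is one possible way to avoid repeating the summarise() call for every grouping. This is only a sketch: it assumes dplyr 1.0 or later (for across()), and the names summarise_by, grouping_sets and result are made up for illustration. It also assumes every column to summarise starts with "Price".

```r
library(dplyr)

my.df1 <- data.frame(
  Colour   = rep(c('Blue', 'Red'), each = 12),
  Type     = rep(c(1,1,2,2,1,2,1,1,2,2,1,2), times = 2),
  Size     = rep(rep(c('S', 'M'), each = 6), times = 2),
  PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
  PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11))

# Hypothetical helper: mean and SD of every Price* column for one set of grouping columns
summarise_by <- function(data, group_cols) {
  data %>%
    group_by(across(all_of(group_cols))) %>%
    summarise(
      across(starts_with("Price"),
             list(Mean = mean, StDev = sd),
             .names = "{.col}{.fn}"),   # gives PriceOneMean, PriceOneStDev, ...
      .groups = "drop")
}

# One entry per level of detail wanted in the result
grouping_sets <- list(
  c("Colour", "Type", "Size"),
  c("Colour", "Type"),
  "Colour")

# Columns missing from a coarser summary are filled with NA by bind_rows()
result <- bind_rows(lapply(grouping_sets, function(g) summarise_by(my.df1, g))) %>%
  arrange(Colour, Type, Size) %>%
  select(Colour, Type, Size, everything())
```

Adding another criterion column should then only mean extending grouping_sets, and extra price columns are picked up by starts_with("Price") as long as they follow that naming pattern.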
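You can try

library(tidyverse)
# as_tibble() replaces as.tbl(), which is deprecated in current dplyr
as_tibble(my.df1) %>%
  mutate(Type=NA, Size=NA) %>%   # a copy of the data with Type and Size blanked out
  bind_rows(my.df1) %>%          # stacked on top of the original rows
  group_by(Colour, Type, Size) %>%
  summarise_all(c("mean", "sd"))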
# A tibble: 10 x 7
# Groups:   Colour, Type [?]
   Colour  Type   Size PriceOne_mean PriceTwo_mean PriceOne_sd PriceTwo_sd
   <fctr> <dbl> <fctr>         <dbl>         <dbl>       <dbl>       <dbl>
 1   Blue     1      M      12.66667      15.33333    3.055050    5.507571
 2   Blue     1      S      14.66667      15.00000    4.509250    5.000000
 3   Blue     2      M      17.33333      19.33333    5.507571    8.504901
 4   Blue     2      S      16.33333      14.00000    4.725816    4.000000
 5   Blue    NA   <NA>      15.25000      15.91667    4.287932    5.534328
 6    Red     1      M      15.33333      12.66667    5.507571    3.055050
 7    Red     1      S      15.00000      14.66667    5.000000    4.509250
 8    Red     2      M      19.33333      17.33333    8.504901    5.507571
 9    Red     2      S      14.00000      16.33333    4.000000    4.725816
10    Red    NA   <NA>      15.91667      15.25000    5.534328    4.287932
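If rows that fix Colour and Type but ignore Size are also needed, the same stacking idea can probably be extended by adding one more copy of the data in which only Size is blanked out. The following is a sketch under that assumption, not a tested drop-in: it uses across(), so it needs dplyr 1.0 or later, and the name stacked is just for illustration.

```r
library(dplyr)

my.df1 <- data.frame(
  Colour   = rep(c('Blue', 'Red'), each = 12),
  Type     = rep(c(1,1,2,2,1,2,1,1,2,2,1,2), times = 2),
  Size     = rep(rep(c('S', 'M'), each = 6), times = 2),
  PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
  PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11))

stacked <- bind_rows(
  my.df1,                                                  # Colour + Type + Size
  mutate(my.df1, Size = NA_character_),                    # Colour + Type, any Size
  mutate(my.df1, Type = NA_real_, Size = NA_character_)    # Colour only
) %>%
  group_by(Colour, Type, Size) %>%
  # every non-grouping column gets a _mean and a _sd column
  summarise(across(everything(), list(mean = mean, sd = sd)), .groups = "drop")
```

Each observation is counted once per level of detail, so the NA rows act as the "regardless of" summaries. The cost is that the stacked data grows with the number of levels, which may matter for large inputs.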
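Create all the available combinations of characteristics, to be called in the subset function:

call_combo <- function(frame) {
  combo_list <- list()
  for (i in seq_len(nrow(frame))) {
    # the first three columns hold the criteria (Colour, Type, Size)
    combo <- frame[i, c(1, 2, 3)]
    # drop the '-' placeholders
    combo_left <- combo[combo != '-']
    # assumes '-' only ever appears in the trailing columns
    combo_left_cols <- names(combo[1:length(combo_left)])
    # builds e.g. "Colour == Blue & Type == 1 & Size == S &" (values are pasted unquoted)
    call_string <- paste(combo_left_cols, '==', combo_left, '&', sep = ' ', collapse = ' ')
    # cut the string just before the trailing '&'
    ind <- unlist(gregexpr('&', call_string))
    res <- substring(call_string, 1, ind[length(ind)] - 1)
    combo_list[i] <- list(res)
  }
  return(combo_list)
}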
my.df2
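As an alternative to building condition strings, the matching could also be done directly, treating '-' as "matches anything". This is a base R sketch only, not the code from the answer above: matches_row, key_cols and price_cols are hypothetical names, and it assumes the output columns in my.df2 are named after the price column plus Mean or StDev, as in the question.

```r
my.df1 <- data.frame(
  Colour   = rep(c('Blue', 'Red'), each = 12),
  Type     = rep(c(1,1,2,2,1,2,1,1,2,2,1,2), times = 2),
  Size     = rep(rep(c('S', 'M'), each = 6), times = 2),
  PriceOne = c(10,15,20,18,19,11,12,16,20,21,10,11,10,15,10,18,20,14,21,15,28,19,10,11),
  PriceTwo = c(10,15,10,18,20,14,21,15,28,19,10,11,10,15,20,18,19,11,12,16,20,21,10,11))

my.df2 <- data.frame(
  Colour = rep(c('Blue', 'Red'), each = 6),
  Type   = rep(c(1, 1, 2, 2, 2, '-'), times = 2),
  Size   = rep(c('S', 'M', 'S', 'M', '-', '-'), times = 2),
  PriceOneMean = NA, PriceOneStDev = NA,
  PriceTwoMean = NA, PriceTwoStDev = NA)

key_cols   <- c("Colour", "Type", "Size")   # criteria columns
price_cols <- c("PriceOne", "PriceTwo")     # columns to summarise

# TRUE for each row of my.df1 that satisfies one row of criteria;
# a '-' in the criteria places no restriction on that column
matches_row <- function(criteria) {
  keep <- rep(TRUE, nrow(my.df1))
  for (col in key_cols) {
    if (criteria[[col]] != "-") {
      keep <- keep & as.character(my.df1[[col]]) == as.character(criteria[[col]])
    }
  }
  keep
}

for (i in seq_len(nrow(my.df2))) {
  keep <- matches_row(my.df2[i, ])
  for (p in price_cols) {
    my.df2[i, paste0(p, "Mean")]  <- mean(my.df1[[p]][keep])
    my.df2[i, paste0(p, "StDev")] <- sd(my.df1[[p]][keep])
  }
}
```

Because each criteria column is checked independently, a '-' does not have to be in a trailing position, and rows such as Blue / 2 / - are handled the same way as Blue / - / -. The explicit loop is easy to follow but may be slow for a very large my.df2.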
Thanks, this works great! I've edited my MRE; please see my last line, which starts with "EDIT". How can I make your code work for rows 5 and 11?