How to normalize subgroups from a grouped data frame in R
I have a data frame with two numeric variables, fatcontent and saltcontent, and two factor variables, cond and spice, which describe different treatments. Each numeric variable was measured twice for every treatment combination:
a <- data.frame(cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"),
each = 2, times = 3),
spice = rep(c("none", "chilli", "basil"), each = 10),
fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)
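After normalization (each value divided by the mean of the uncooked measurements for the same spice), the first ten rows look like this: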
cond spice fatcontent saltcontent
1 uncooked none 0.8888889 0.5714286
2 uncooked none 1.1111111 1.4285714
3 fried none 1517.3333333 1352.2857143
4 fried none 1673.3333333 1571.4285714
5 steamed none 1535.5555556 1635.4285714
6 steamed none 1584.8888889 4538.5714286
7 baked none 1307.7777778 4252.8571429
8 baked none 136.2222222 62.0000000
9 grilled none 632.2222222 55.1428571
10 grilled none 637.1111111 42.2857143
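Here is a small worked check of what "normalized" means in the table. It assumes, as the expected output implies, that each value is divided by the mean of the two uncooked measurements for the same spice. The variable names are illustrative only.

```r
# Uncooked fatcontent measurements for spice == "none" (rows 1-2 of `a`)
uncooked_fat <- c(4, 5)
ref_mean <- mean(uncooked_fat)  # 4.5

# Normalize the first four fatcontent values of the "none" group
# (uncooked, uncooked, fried, fried) by that reference mean
normalized <- c(4, 5, 6828, 7530) / ref_mean
print(normalized)
```

The printed values (about 0.889, 1.111, 1517.333 and 1673.333) match rows 1 to 4 of the fatcontent column in the table above.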
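My question is: how can I do this for all groups and variables in the data frame? I think I could use the dplyr package, but I'm not sure what the best approach is. Thanks for your help!

All you need to do is group by cond and spice, like this: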
library(dplyr)
a %>% group_by(spice, cond) %>%
mutate(fat.norm = fatcontent / mean(fatcontent),
salt.norm = saltcontent / mean(saltcontent))
# Source: local data frame [30 x 6]
# Groups: spice, cond
#
# cond spice fatcontent saltcontent fat.norm salt.norm
# 1 uncooked none 4 2 0.8888889 0.57142857
# 2 uncooked none 5 5 1.1111111 1.42857143
# 3 fried none 6828 4733 0.9511074 0.92504642
# 4 fried none 7530 5500 1.0488926 1.07495358
# 5 steamed none 6910 5724 0.9841903 0.52977926
# 6 steamed none 7132 15885 1.0158097 1.47022074
# 7 baked none 5885 14885 1.8113266 1.97126208
# 8 baked none 613 217 0.1886734 0.02873792
# 9 grilled none 2845 193 0.9961485 1.13196481
# 10 grilled none 2867 148 1.0038515 0.86803519
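For comparison, here is a base R sketch of the same per-group normalization using `ave()`. By default, `ave()` computes `mean` within each combination of the grouping vectors and returns a vector as long as its input. Like the dplyr code, this divides by the mean of each spice × cond cell, not by the uncooked mean that the question asks for. The data frame `a` is rebuilt from the question so the block runs on its own. The `fat.norm`/`salt.norm` names mirror the answer.

```r
# Example data from the question
a <- data.frame(
  cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"), each = 2, times = 3),
  spice = rep(c("none", "chilli", "basil"), each = 10),
  fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
                 25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
                 5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
  saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
                  6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
                  1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)

# ave() returns each row's spice x cond group mean, so dividing by it
# normalizes every value by its own cell mean
a$fat.norm  <- a$fatcontent  / ave(a$fatcontent,  a$spice, a$cond)
a$salt.norm <- a$saltcontent / ave(a$saltcontent, a$spice, a$cond)

head(a, 10)
```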
Alternatively, if you don't want to list every column, you can use mutate_each or summarise_each:
group.norm <- function(x) {
x / mean(x)
}
a %>% group_by(spice, cond) %>%
mutate_each(funs(group.norm))
You can exclude columns in mutate_each() or specify only particular ones, e.g. mutate_each(funs(group.norm), -notthisone) or mutate_each(funs(group.norm), onlythisone).
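`mutate_each()` and `funs()` have since been deprecated in dplyr. Here is a sketch of the equivalent call with `across()`, assuming dplyr 1.0.0 or later. The column selection inside `across()` takes the place of the `-notthisone` / `onlythisone` arguments. The data frame `a` is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Example data from the question
a <- data.frame(
  cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"), each = 2, times = 3),
  spice = rep(c("none", "chilli", "basil"), each = 10),
  fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
                 25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
                 5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
  saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
                  6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
                  1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)

group.norm <- function(x) {
  x / mean(x)
}

# across() applies group.norm to each selected column within each spice x cond group
res <- a %>%
  group_by(spice, cond) %>%
  mutate(across(c(fatcontent, saltcontent), group.norm)) %>%
  ungroup()

print(res, n = 10)
```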
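I think this is what you want. You want the mean for each spice condition computed from the uncooked data points, which is what I did in the first step. Then I added fatmean and saltmean from ana to your data frame a. If your data is very large, this may not be a memory-efficient approach. I used left_join to merge ana and a, then used mutate to do the division for each spice condition. Finally, I removed the two helper columns with select to tidy up the result.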
### Find mean for each spice condition using uncooked data points
ana <- group_by(filter(a, cond == "uncooked"), spice) %>%
summarise(fatmean = mean(fatcontent), saltmean = mean(saltcontent))
# spice fatmean saltmean
#1 basil 5.0 1.0
#2 chilli 21.5 5.0
#3 none 4.5 3.5
left_join(a, ana, by = "spice") %>%
group_by(spice) %>%
mutate(fatcontent = fatcontent / fatmean,
saltcontent = saltcontent / saltmean) %>%
select(-c(fatmean, saltmean))
# A part of the results
# cond spice fatcontent saltcontent
#1 uncooked none 0.8888889 0.5714286
#2 uncooked none 1.1111111 1.4285714
#3 fried none 1517.3333333 1352.2857143
#4 fried none 1673.3333333 1571.4285714
#5 steamed none 1535.5555556 1635.4285714
#6 steamed none 1584.8888889 4538.5714286
#7 baked none 1307.7777778 4252.8571429
#8 baked none 136.2222222 62.0000000
#9 grilled none 632.2222222 55.1428571
#10 grilled none 637.1111111 42.2857143
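The same filter, summarise, join and divide sequence can be sketched in base R with `tapply()` and `merge()` if dplyr is not available. `merge()` does not keep the original row order, so this sketch stores a helper index column and sorts by it afterwards. The names `ref`, `row` and `b` are illustrative. The data frame `a` is rebuilt from the question so the block runs on its own.

```r
# Example data from the question
a <- data.frame(
  cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"), each = 2, times = 3),
  spice = rep(c("none", "chilli", "basil"), each = 10),
  fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
                 25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
                 5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
  saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
                  6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
                  1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)

# Step 1: per-spice means of the uncooked rows (one row per spice)
ref <- a[a$cond == "uncooked", ]
fatmean  <- tapply(ref$fatcontent,  ref$spice, mean)
saltmean <- tapply(ref$saltcontent, ref$spice, mean)
ana <- data.frame(spice    = names(fatmean),
                  fatmean  = as.vector(fatmean),
                  saltmean = as.vector(saltmean))

# Step 2: join the means onto every row, restoring the original row order
a$row <- seq_len(nrow(a))
b <- merge(a, ana, by = "spice")
b <- b[order(b$row), ]

# Step 3: divide, then keep only the original columns
b$fatcontent  <- b$fatcontent  / b$fatmean
b$saltcontent <- b$saltcontent / b$saltmean
b <- b[, c("cond", "spice", "fatcontent", "saltcontent")]
rownames(b) <- NULL

head(b, 10)
```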
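A concise way to normalize the data is to put the "uncooked" condition inside the mean calculation itself (mean(.[cond == "uncooked"])), so you don't need to filter, summarise, join and then recompute. Doing this with mutate_each means you only have to write the expression once: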
group_by(a, spice) %>%
mutate_each(funs(./mean(.[cond == "uncooked"])), -cond)
#Source: local data frame [30 x 4]
#Groups: spice
#
# cond spice fatcontent saltcontent
#1 uncooked none 0.8888889 5.714286e-01
#2 uncooked none 1.1111111 1.428571e+00
#3 fried none 1517.3333333 1.352286e+03
#4 fried none 1673.3333333 1.571429e+03
#5 steamed none 1535.5555556 1.635429e+03
#6 steamed none 1584.8888889 4.538571e+03
#7 baked none 1307.7777778 4.252857e+03
#8 baked none 136.2222222 6.200000e+01
#9 grilled none 632.2222222 5.514286e+01
#10 grilled none 637.1111111 4.228571e+01
# ... etc
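Since `mutate_each()` and `funs()` are deprecated, here is a sketch of the same one-step normalization with `across()` and a purrr-style lambda, assuming dplyr 1.0.0 or later. Inside the lambda, `cond` refers to the current spice group's `cond` column. The data frame `a` is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Example data from the question
a <- data.frame(
  cond = rep(c("uncooked", "fried", "steamed", "baked", "grilled"), each = 2, times = 3),
  spice = rep(c("none", "chilli", "basil"), each = 10),
  fatcontent = c(4, 5, 6828, 7530, 6910, 7132, 5885, 613, 2845, 2867,
                 25, 18, 2385, 33227, 4233, 4023, 953, 1025, 4465, 5016,
                 5, 5, 10235, 12545, 5511, 5111, 596, 585, 4012, 3633),
  saltcontent = c(2, 5, 4733, 5500, 5724, 15885, 14885, 217, 193, 148,
                  6, 4, 26738, 24738, 22738, 23738, 267, 256, 1121, 1558,
                  1, 1, 21738, 20738, 26738, 27738, 195, 202, 129, 131)
)

res <- a %>%
  group_by(spice) %>%
  # Divide each selected column by the mean of its uncooked rows in the same spice group
  mutate(across(c(fatcontent, saltcontent), ~ .x / mean(.x[cond == "uncooked"]))) %>%
  ungroup()

print(res, n = 10)
```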
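Thanks jazurro, this is exactly what I wanted to do. Cheers, Alex
@karnowski You're welcome. :) For a first question, this post is very well written!
Ack, I misread the OP. This only normalizes by the group mean, not by the uncooked mean. @jazurro's answer is correct.
Super concise! This is great. +1.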
The join approach can also be written as a single pipeline:
group_by(filter(a, cond == "uncooked"), spice) %>%
summarise(fatmean = mean(fatcontent), saltmean = mean(saltcontent)) %>%
left_join(a, ., by = "spice") %>% #right_join is possible with the dev dplyr
group_by(spice) %>%
mutate(fatcontent = fatcontent / fatmean,
saltcontent = saltcontent / saltmean) %>%
select(-c(fatmean, saltmean))