dplyr: summarise for each outcome (tags: r, dplyr, tidyr)


I have a data frame like this:

    metric1    metric2    metric3 field1 field2
1   1.07809668  4.2569882  7.1710095      L     S1
2   0.56174763  1.2660273 -0.3751915      L     S2
3   1.17447327  5.5186679 11.6868322      L     S2
4   0.32830724 -0.8374830  1.8973718      S     S2
5  -0.51213503 -0.3076640 10.0730274      S     S1
6   0.24133119  2.7984703 15.9622215      S     S1
7   1.96664414  0.1818531  2.7416768      S     S3
8   0.06669409  3.8652075 10.5066330      S     S3
9   1.14660437  8.5703119  3.4294062      L     S4
10 -0.72785683  9.3320762  1.3827989      L     S4

I have shown two fields, but there are several more. I need to sum the metrics grouped by each field, for example by field1:

DF %>% group_by(field1) %>% summarise_each(funs(sum),metric1,metric2,metric3)

I can do this for each field in turn, getting the columns sum(metric1), sum(metric2) and sum(metric3), but the table output I need looks like this:

L(field1) S(field1) S1(field2)  S2(field2) S3(field2)  S4(field2)
sum(metric1)

sum(metric2)

sum(metric3)

I'm sure there must be a way to do this with tidyr and dplyr, but I can't work it out.

Try recast from the reshape2 package:


library(reshape2)
recast(DF, variable ~ field1 + field2, sum)
#   variable     L_S1      L_S2       L_S4       S_S1       S_S2      S_S3
# 1  metric1 1.078097  1.736221  0.4187475 -0.2708038  0.3283072  2.033338
# 2  metric2 4.256988  6.784695 17.9023881  2.4908063 -0.8374830  4.047061
# 3  metric3 7.171010 11.311641  4.8122051 26.0352489  1.8973718 13.248310

This is the same as:
dcast(melt(DF, c("field1", "field2")), variable ~ field1 + field2, sum)

If you like, you can also combine it with tidyr::gather, but not with tidyr::spread, because spread has no fun.aggregate argument:
DF %>%
  gather(variable, value, -(field1:field2)) %>%
  dcast(variable ~ field1 + field2, sum)
#   variable     L_S1      L_S2       L_S4       S_S1       S_S2      S_S3
# 1  metric1 1.078097  1.736221  0.4187475 -0.2708038  0.3283072  2.033338
# 2  metric2 4.256988  6.784695 17.9023881  2.4908063 -0.8374830  4.047061
# 3  metric3 7.171010 11.311641  4.8122051 26.0352489  1.8973718 13.248310
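In more recent tidyr releases the aggregation can be done during the reshape itself, because pivot_wider() accepts a values_fn argument. The following is only a sketch of that alternative, not part of the original answer: it assumes tidyr 1.1.0 or later, rebuilds DF from the rows printed in the question, and uses the illustrative name wide for the result.

```r
library(dplyr)
library(tidyr)

# The example data from the question, rebuilt by hand
DF <- data.frame(
  metric1 = c(1.07809668, 0.56174763, 1.17447327, 0.32830724, -0.51213503,
              0.24133119, 1.96664414, 0.06669409, 1.14660437, -0.72785683),
  metric2 = c(4.2569882, 1.2660273, 5.5186679, -0.8374830, -0.3076640,
              2.7984703, 0.1818531, 3.8652075, 8.5703119, 9.3320762),
  metric3 = c(7.1710095, -0.3751915, 11.6868322, 1.8973718, 10.0730274,
              15.9622215, 2.7416768, 10.5066330, 3.4294062, 1.3827989),
  field1 = c("L", "L", "L", "S", "S", "S", "S", "S", "L", "L"),
  field2 = c("S1", "S2", "S2", "S2", "S1", "S1", "S3", "S3", "S4", "S4"),
  stringsAsFactors = FALSE
)

wide <- DF %>%
  # Stack the metric columns into (variable, value) pairs
  pivot_longer(starts_with("metric"),
               names_to = "variable", values_to = "value") %>%
  # One column per field1/field2 combination; duplicate cells are summed
  pivot_wider(names_from = c(field1, field2),
              values_from = value,
              values_fn = sum)
```

The result has one row per metric and one column per observed field1/field2 combination (named like L_S1), holding the same sums as the dcast output above; the column order may differ.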

For an all-dplyr and tidyr solution, you could do the following:
library(dplyr)
library(tidyr)

df %>% 
  unite(variable, field1, field2) %>% 
  group_by(variable) %>% 
  summarise_each(funs(sum)) %>% 
  gather(metrics, value, -variable) %>%
  spread(variable, value)

Which gives:
#Source: local data frame [3 x 7]
#
#  metrics     L_S1      L_S2       L_S4       S_S1       S_S2      S_S3
#1 metric1 1.078097  1.736221  0.4187475 -0.2708038  0.3283072  2.033338
#2 metric2 4.256988  6.784695 17.9023881  2.4908063 -0.8374830  4.047061
#3 metric3 7.171010 11.311641  4.8122051 26.0352489  1.8973718 13.248310
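Note that summarise_each() and funs() have since been deprecated in dplyr. As a hedged sketch only, the same pipeline could be written with across(), pivot_longer() and pivot_wider(), assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later. The data frame is rebuilt from the question, and by_combo is an illustrative name.

```r
library(dplyr)
library(tidyr)

# The example data from the question, rebuilt by hand
DF <- data.frame(
  metric1 = c(1.07809668, 0.56174763, 1.17447327, 0.32830724, -0.51213503,
              0.24133119, 1.96664414, 0.06669409, 1.14660437, -0.72785683),
  metric2 = c(4.2569882, 1.2660273, 5.5186679, -0.8374830, -0.3076640,
              2.7984703, 0.1818531, 3.8652075, 8.5703119, 9.3320762),
  metric3 = c(7.1710095, -0.3751915, 11.6868322, 1.8973718, 10.0730274,
              15.9622215, 2.7416768, 10.5066330, 3.4294062, 1.3827989),
  field1 = c("L", "L", "L", "S", "S", "S", "S", "S", "L", "L"),
  field2 = c("S1", "S2", "S2", "S2", "S1", "S1", "S3", "S3", "S4", "S4"),
  stringsAsFactors = FALSE
)

by_combo <- DF %>%
  # Paste field1 and field2 into one key such as "L_S1"
  unite(variable, field1, field2) %>%
  group_by(variable) %>%
  # Sum every metric column within each key
  summarise(across(starts_with("metric"), sum)) %>%
  # Transpose: metrics become rows, keys become columns
  pivot_longer(-variable, names_to = "metrics", values_to = "value") %>%
  pivot_wider(names_from = variable, values_from = value)
```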

Edit

After reading your comment on David's answer, I think this is closer to your expected result:
field1 <- group_by(df, field = field1) %>% summarise_each(funs(sum), -(field1:field2)) 
field2 <- group_by(df, field = field2) %>% summarise_each(funs(sum), -(field1:field2)) 

bind_rows(field1, field2) %>%
  gather(metrics, value, -field) %>%
  spread(field, value)

Which gives:
#Source: local data frame [3 x 7]
#
#  metrics         L         S         S1        S2        S3         S4
#1 metric1  3.233065  2.090842  0.8072928  2.064528  2.033338  0.4187475
#2 metric2 28.944071  5.700384  6.7477945  5.947212  4.047061 17.9023881
#3 metric3 23.294855 41.180931 33.2062584 13.209013 13.248310  4.8122051
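Since the question mentions several more field columns, the per-field summaries in the edit can be produced in one pass instead of one summary per field. This is only a sketch under stated assumptions: every grouping column starts with "field", every numeric column starts with "metric", and dplyr 1.0.0 or later plus tidyr 1.0.0 or later are available. The name by_field is illustrative.

```r
library(dplyr)
library(tidyr)

# The example data from the question, rebuilt by hand
DF <- data.frame(
  metric1 = c(1.07809668, 0.56174763, 1.17447327, 0.32830724, -0.51213503,
              0.24133119, 1.96664414, 0.06669409, 1.14660437, -0.72785683),
  metric2 = c(4.2569882, 1.2660273, 5.5186679, -0.8374830, -0.3076640,
              2.7984703, 0.1818531, 3.8652075, 8.5703119, 9.3320762),
  metric3 = c(7.1710095, -0.3751915, 11.6868322, 1.8973718, 10.0730274,
              15.9622215, 2.7416768, 10.5066330, 3.4294062, 1.3827989),
  field1 = c("L", "L", "L", "S", "S", "S", "S", "S", "L", "L"),
  field2 = c("S1", "S2", "S2", "S2", "S1", "S1", "S3", "S3", "S4", "S4"),
  stringsAsFactors = FALSE
)

by_field <- DF %>%
  # Stack all field columns: each row now appears once per field
  pivot_longer(starts_with("field"),
               names_to = "field", values_to = "level") %>%
  group_by(field, level) %>%
  # Sum every metric within each (field, level) pair
  summarise(across(starts_with("metric"), sum), .groups = "drop") %>%
  # Transpose: metrics become rows, field/level pairs become columns
  pivot_longer(starts_with("metric"),
               names_to = "metrics", values_to = "value") %>%
  pivot_wider(names_from = c(field, level), values_from = value)
```

The columns come out named field1_L, field1_S, field2_S1 and so on, so a level that occurs in two different fields would not be merged into a single column.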

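Thanks for your answer, but this doesn't give what I want. Those columns are the intersection of field1 and field2. I want a table that has the elements of field1 as columns first, then the elements of field2 as columns, and so on.

Thank you very much, Steven! I need more practice with tidyr.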