Using group_by and summarise from dplyr for all rows that do not contain the variable being grouped by


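I have a data frame such as

df1 <- data.frame(id = c("A", "A", "B", "B", "B"), 
                  cost = c(100, 10, 120, 102, 102))

df1

Are you looking for something like this? It first computes the total cost and the total number of rows, then subtracts each group's own total cost and row count from those totals, and divides to get the mean cost of the remaining rows: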
sumCost = sum(df1$cost)
totRows = nrow(df1)

df1 %>% 
        group_by(id) %>% 
        summarise(no.c = totRows - n(), 
                  m.costs = (sumCost - sum(cost))/no.c)

# A tibble: 2 x 3
#      id  no.c m.costs
#  <fctr> <int>   <dbl>
#1      A     3     108
#2      B     2      55
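
As a sanity check on the numbers above, the same "mean of everything outside the group" can be computed directly in base R by filtering out each group in turn. This is only a minimal sketch for verification; the names groups and m_costs_check are made up for illustration and do not appear in the original answer.

```r
df1 <- data.frame(id = c("A", "A", "B", "B", "B"),
                  cost = c(100, 10, 120, 102, 102))

# Hypothetical helper names, used only for this cross-check.
groups <- unique(as.character(df1$id))

# For each group g, average the cost over the rows whose id is NOT g.
m_costs_check <- sapply(groups, function(g) mean(df1$cost[df1$id != g]))
m_costs_check
```

This filters the data once per group, so it is slower than the subtraction trick on large data, but it makes the intended definition explicit.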

You can use the dot (.) to refer to the whole data.frame, which lets you compute the difference between each group and the whole:
df1 %>% group_by(id) %>% 
    summarise(n = n(), 
              n_other = nrow(.) - n, 
              mean_cost = mean(cost), 
              mean_other = (sum(.$cost) - sum(cost)) / n_other)

## # A tibble: 2 × 5
##       id     n n_other mean_cost mean_other
##   <fctr> <int>   <int>     <dbl>      <dbl>
## 1      A     2       3        55        108
## 2      B     3       2       108         55


As the results show, with only two groups you could simply use rev, but this approach extends easily to more groups or to other calculations.
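
To illustrate the point about more groups, here is a small sketch with three groups, where simply reversing the per-group means would no longer give the right answer. The data frame df3 and its third group "C" are invented for this example and are not part of the original question; the code assumes dplyr is installed and that the magrittr pipe (%>%) is used, since the dot refers to the pipe's left-hand side.

```r
library(dplyr)

# Hypothetical three-group data, made up for illustration.
df3 <- data.frame(id = c("A", "A", "B", "B", "B", "C"),
                  cost = c(100, 10, 120, 102, 102, 50))

res <- df3 %>%
    group_by(id) %>%
    summarise(n = n(),
              n_other = nrow(.) - n,                           # rows outside the group
              mean_other = (sum(.$cost) - sum(cost)) / n_other) # mean cost outside the group
res
```

Here nrow(.) and sum(.$cost) are evaluated on the full piped-in data, while n and sum(cost) are evaluated per group, which is what makes the subtraction work for any number of groups.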
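Thanks a lot. It works, but when I try something like

    df1.a %>% summarise(nrow_other = nrow(.[!.$id == id, ]))

it runs, but it returns the warning: In ==.default(.$id, c(2L, 2L, 2L)) : longer object length is not a multiple of shorter object length. Could you outline what my mistake is? Your row-count solution works, but I would like to subset, inside the summarise call, all rows that do not contain the variable being grouped by. Thanks a lot!

id and .$id are neither the same length nor is either of them length 1, so when you compare them, == uses vector recycling to stretch id to the right length. Because id holds a single repeated value the result is still fine, but wrap it in unique to make the warning go away, or use the approach above.

OK, great, thanks. Could you outline your hint about unique in more detail? With an example, if possible.

    summarise(nrow_other = nrow(.[.$id != unique(id), ]))

Another option is

    summarise(nrow_other = sum(.$id != unique(id)))

though I would still stick with

    summarise(nrow_other = nrow(.) - n())

so long as the data is grouped by id.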
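
The recycling issue discussed in these comments can be reproduced outside of dplyr. The sketch below is an illustration in base R, not the commenters' code: all_ids stands in for .$id (the full column) and grp_ids stands in for id as seen inside group "B"; both names, and the warned flag, are assumptions introduced here.

```r
df1 <- data.frame(id = c("A", "A", "B", "B", "B"),
                  cost = c(100, 10, 120, 102, 102))

all_ids <- as.character(df1$id)      # stands in for .$id (length 5)
grp_ids <- all_ids[all_ids == "B"]   # stands in for id inside group "B" (length 3)

# Length 5 against length 3: R recycles the shorter vector and warns,
# because 5 is not a multiple of 3.
warned <- tryCatch({ all_ids != grp_ids; FALSE },
                   warning = function(w) TRUE)

# unique() collapses the group's ids to a single value, so the comparison
# is length 5 against length 1 and recycles cleanly.
n_other <- sum(all_ids != unique(grp_ids))
n_other
```

The count is the same either way in this case, since every element of grp_ids is identical; unique only removes the length mismatch that triggers the warning.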