Conditional aggregation in R
Consider the following data frame:
d <- data.frame(c("a","a","a","a","b","b","b","b"),c("a1","a1","a2","a2","a1","a1","a2","a2"),"c","d",c(1:8))
I want to aggregate the values in column 5 so that I get the following data.frame:
d1 <- data.frame(c("a","a","b","b"),c("a1","a2","a1","a2"),"c","d",c(3,7,11,15))
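That is, I want to aggregate the values in column 5 according to the names in column 2, while keeping the names in columns 1, 3 and 4. In this example the names in columns 3 and 4 are identical across all rows, but in general they may differ.

How can I do this in R?

Using the tidyverse, you can do this by grouping the data by the id variables and then summarizing within those groups: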
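library(tidyverse)

# the summary below assumes the columns of d are named v1 to v5
names(d) <- paste0("v", 1:5)

d %>%
  group_by(v1, v2) %>%
  summarize(v3 = first(v3),   # keep one value per group
            v4 = first(v4),
            v5 = sum(v5))     # aggregate column 5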
Result:

# A tibble: 4 x 5
# Groups:   v1 [2]
  v1    v2    v3    v4       v5
  <fct> <fct> <fct> <fct> <int>
1 a     a1    c     d         3
2 a     a2    c     d         7
3 b     a1    c     d        11
4 b     a2    c     d        15
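The calls to first are just one way of arbitrarily picking a single value for the columns whose values repeat within each group.

Using data.table: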
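Code

require(data.table)
d[, .(unique(V3), unique(V4), sum(V5)), .(V1, V2)]

Specifically, the syntax follows dt[i, j, by]: i selects a subset of the rows of the data.table object, j is the set of operations to perform on that subset (.() is shorthand for list()), and by names the grouping variables. In your case, you want to sum V5 for each V1-V2 pair. In addition, we apply unique to V3 and V4 to prevent duplicate rows.

Result

   V1 V2 V1 V2 V3
1:  a a1  c  d  3
2:  a a2  c  d  7
3:  b a1  c  d 11
4:  b a2  c  d 15

Data

d = data.table(V1 = c("a","a","a","a","b","b","b","b"),
               V2 = c("a1","a1","a2","a2","a1","a1","a2","a2"),
               V3 = "c",
               V4 = "d",
               V5 = c(1:8))

Show what you have tried so far.
Using d, I tried tapply(d[,1], d[,2], d[,3], d[,4], d[,5], sum), but that does not work.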
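
One detail of the data.table call d[, .(unique(V3), unique(V4), sum(V5)), .(V1, V2)] is that the expressions in j are unnamed, so the result columns receive default names that collide with the grouping columns (the header reads V1 V2 V1 V2 V3). A minimal sketch of the same dt[i, j, by] pattern with explicit names in j, plus an illustrative use of i; the names res and res_a are hypothetical and not part of the original answer:

```r
library(data.table)

# same example data as in the data.table answer
d <- data.table(V1 = c("a","a","a","a","b","b","b","b"),
                V2 = c("a1","a1","a2","a2","a1","a1","a2","a2"),
                V3 = "c",
                V4 = "d",
                V5 = 1:8)

# j: name every output column; by: group on the V1-V2 pairs
res <- d[, .(V3 = unique(V3), V4 = unique(V4), V5 = sum(V5)), by = .(V1, V2)]

# i: keep only the rows where V1 is "a", then sum V5 within each V2
res_a <- d[V1 == "a", .(V5 = sum(V5)), by = V2]

print(res)
print(res_a)
```

Note that unique returns one row per distinct value, so if V3 or V4 took several values within a V1-V2 group, that group would produce several rows with the sum repeated.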