
Conditional aggregation in R


Consider the following data.frame:

d <- data.frame(c("a","a","a","a","b","b","b","b"),c("a1","a1","a2","a2","a1","a1","a2","a2"),"c","d",c(1:8))
I would like to aggregate the values in column 5 so that I obtain the following data.frame:

d1 <- data.frame(c("a","a","b","b"),c("a1","a2","a1","a2"),"c","d",c(3,7,11,15))
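That is, I want to aggregate the values in column 5 according to the names in column 2, while retaining the names in columns 1, 3 and 4. In this example the names in columns 3 and 4 are all identical, but in my actual data they differ.

How can I do this in R?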

With the tidyverse, you can do this by grouping the data by the id variables and then summarizing within those groups:

library(tidyverse)

# The question's d has no column names; v1..v5 are assumed here
names(d) <- paste0("v", 1:5)

d %>%
    group_by(v1, v2) %>%
    summarize(v3 = first(v3),
              v4 = first(v4),
              v5 = sum(v5))
Result:

# A tibble: 4 x 5
# Groups:   v1 [2]
  v1    v2    v3    v4       v5
  <fct> <fct> <fct> <fct> <int>
1 a     a1    c     d         3
2 a     a2    c     d         7
3 b     a1    c     d        11
4 b     a2    c     d        15
The calls to first are simply a way of arbitrarily picking a single value for the columns that contain repeated values.
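For comparison, here is a minimal base R sketch of the same grouping idea using aggregate(), which is not part of the answer above. The column names v1 to v5 are assumptions, chosen to match the names used in the tidyverse code.

```r
# Rebuild the question's data with assumed column names v1..v5
d <- data.frame(
  v1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  v2 = c("a1", "a1", "a2", "a2", "a1", "a1", "a2", "a2"),
  v3 = "c",
  v4 = "d",
  v5 = 1:8
)

# Sum v5 within every combination of the columns on the right-hand side
res <- aggregate(v5 ~ v1 + v2 + v3 + v4, data = d, FUN = sum)

# Sort by v1, then v2, to match the layout asked for in the question
res <- res[order(res$v1, res$v2), ]
print(res)
```

Note that v3 and v4 act as grouping variables here, so rows whose v3 or v4 values differ within a v1-v2 pair would end up in separate groups, whereas first() keeps only one value per pair.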

Using data.table:

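Code

require(data.table)
d[, .(unique(V3), unique(V4), sum(V5)), .(V1, V2)]

Specifically, the syntax follows dt[i, j, by]. Here i specifies a row subset of the data.table object, j specifies a list (.() is shorthand for list()) of operations to perform on that subset, and by specifies the grouping variables. In your case, you want to sum V5 for each V1-V2 pair. In addition, we apply unique to V3 and V4 to prevent duplicate rows.

Result

   V1 V2 V1 V2 V3
1:  a a1  c  d  3
2:  a a2  c  d  7
3:  b a1  c  d 11
4:  b a2  c  d 15

Data

d = data.table(V1 = c("a","a","a","a","b","b","b","b"), 
                V2 = c("a1","a1","a2","a2","a1","a1","a2","a2"), 
                V3 = "c", 
                V4 = "d", 
                V5 = c(1:8))

Comment on the question: show what you have tried so far. Reply: with d, tapply(d[,1], d[,2], d[,3], d[,4], d[,5], sum) - but that does not work.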
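As a follow-up to the data.table answer: the unnamed expressions in j are auto-named V1, V2 and V3, which duplicates the names of the grouping columns in the printed result. A small variation, sketched here as a suggestion rather than as part of the original answer, names each expression in j so that the output keeps the original column names.

```r
library(data.table)

# Same data as in the data.table answer
d <- data.table(V1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
                V2 = c("a1", "a1", "a2", "a2", "a1", "a1", "a2", "a2"),
                V3 = "c",
                V4 = "d",
                V5 = 1:8)

# Naming each element of j avoids the auto-generated V1, V2, V3 names
res <- d[, .(V3 = unique(V3), V4 = unique(V4), V5 = sum(V5)), by = .(V1, V2)]
print(res)
```

If V3 or V4 held more than one distinct value within a V1-V2 pair, unique() would return several values and that group would produce several rows.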