Aggregate rows in a data.frame by assigned columns


I have an example data.frame:

set.seed(1)
df <- data.frame(id = letters[1:10], a = sample(100,10), b = sample(100,10),
                 aggregate_with = c(rep(NA,6),"y","b","b","e"), aggregate_order = c(rep(NA,6),"a,b","a,b","b,a","a,b"))

> df
   id  a  b aggregate_with aggregate_order
1   a 27 21           <NA>            <NA>
2   b 37 18           <NA>            <NA>
3   c 57 68           <NA>            <NA>
4   d 89 38           <NA>            <NA>
5   e 20 74           <NA>            <NA>
6   f 86 48           <NA>            <NA>
7   g 97 98              y             a,b
8   h 62 93              b             a,b
9   i 58 35              b             b,a
10  j  6 71              e             a,b

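As you can see, row 2 of df (id b) is to be aggregated with rows 8 and 9 of df, which name it in aggregate_with. For row 8, columns a and b are added to columns a and b of row 2, respectively; for row 9 it is the other way around (its b is added to a, and its a to b), as given by aggregate_order. Likewise, row 5 of df (id e) is aggregated with row 10 by summing their a and b columns. Although row 7 of df has an aggregate_with value (y), that id does not exist in df, so row 7 is not aggregated.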
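To make the intended result concrete, here is a minimal base R sketch of the aggregation described above, written as a plain loop. It is only an illustration and not necessarily the code the question refers to. It assumes that "aggregate" means summing, and that an aggregate_order of "b,a" means the source row's b is added to the target's a and vice versa. The values are copied from the printed df rather than regenerated with set.seed(1), since sample() can return different numbers across R versions. The names src, target, swapped, add_a, add_b and out are introduced here for illustration.

```r
# Values copied from the printed df above (not regenerated via set.seed).
df <- data.frame(
  id = letters[1:10],
  a  = c(27, 37, 57, 89, 20, 86, 97, 62, 58, 6),
  b  = c(21, 18, 68, 38, 74, 48, 98, 93, 35, 71),
  aggregate_with  = c(rep(NA, 6), "y", "b", "b", "e"),
  aggregate_order = c(rep(NA, 6), "a,b", "a,b", "b,a", "a,b"),
  stringsAsFactors = FALSE
)

# Source rows: those whose aggregate_with names an id that exists in df.
# Row 7 ("y") and the NA rows are excluded here.
src <- which(df$aggregate_with %in% df$id)

out <- df[, c("id", "a", "b")]
for (s in src) {
  target  <- match(df$aggregate_with[s], df$id)  # row to add into
  swapped <- df$aggregate_order[s] == "b,a"      # assumption: "b,a" swaps columns
  add_a <- if (swapped) df$b[s] else df$a[s]
  add_b <- if (swapped) df$a[s] else df$b[s]
  out$a[target] <- out$a[target] + add_a
  out$b[target] <- out$b[target] + add_b
}

# Drop the rows that were folded into their targets.
out <- out[-src, ]
print(out)
```

Under these assumptions, row b ends up with a = 37 + 62 + 35 = 134 and b = 18 + 93 + 58 = 169, row e with a = 20 + 6 = 26 and b = 74 + 71 = 145, and row g is left unchanged.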

I am using data.table:

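library(data.table)
dt <- as.data.table(df)

# Lookup table: each row is relabelled with the id it aggregates into
dt2 <- dt[, list(id = aggregate_with, a, b, aggregate_order)]
# Put the values in the right order: swap a and b unless aggregate_order is 'a,b'
dt2[, c('a', 'b') := list(ifelse(aggregate_order == 'a,b', a, b), ifelse(aggregate_order == 'a,b', b, a))]
setkey(dt2, id)

# Join: for every row of dt, pull in the dt2 rows that aggregate into it
res <- dt2[dt]

# Replace NAs (rows with nothing to aggregate) with 0 before summing
for (j in c('a', 'b')) set(res, which(is.na(res[[j]])), j, 0)
# Drop rows that were aggregated into another row, then sum per id;
# i.a and i.b are the row's own values from dt
res[!aggregate_with %in% id, list(a = sum(a) + i.a[1], b = sum(b) + i.b[1]), by = id]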

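I am using a loop for this, but I think there is a more elegant solution.

You should edit your question to include what you have, so that people don't spend effort getting to the point you have already reached.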