Aggregating rows in a data.frame by an assignment column


I have an example data.frame:

set.seed(1)
df <- data.frame(id = letters[1:10], a = sample(100,10), b = sample(100,10),
                 aggregate_with = c(rep(NA,6),"y","b","b","e"), aggregate_order = c(rep(NA,6),"a,b","a,b","b,a","a,b"))

> df
   id  a  b aggregate_with aggregate_order
1   a 27 21           <NA>            <NA>
2   b 37 18           <NA>            <NA>
3   c 57 68           <NA>            <NA>
4   d 89 38           <NA>            <NA>
5   e 20 74           <NA>            <NA>
6   f 86 48           <NA>            <NA>
7   g 97 98              y             a,b
8   h 62 93              b             a,b
9   i 58 35              b             b,a
10  j  6 71              e             a,b

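In the result I want, column a of row 2 is the sum of columns a, a, and b of rows 2, 8, and 9 in df, respectively, and vice versa for column b. That is, rows 8 and 9 are aggregated into row 2 because their aggregate_with is b, and aggregate_order says which of their columns goes where ("b,a" means the two are swapped). Likewise, columns a and b of row 5 in the result are the sums of columns a and b over rows 5 and 10 in df. Row 7 in df also has an aggregate_with value, but that value (y) does not exist as an id in df, so the row is not aggregated.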
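To make the rule concrete, here is a minimal base R sketch of the loop-based approach. It is only an illustration: the data frame is rebuilt from the values printed above (the numbers `set.seed(1)` produces can differ between R versions), the names `merged`, `result`, `target`, and `swap` are made up for this example, and it assumes a row's target is never itself merged into another row.

```r
# Values copied from the printed df above
df <- data.frame(
  id = letters[1:10],
  a = c(27, 37, 57, 89, 20, 86, 97, 62, 58, 6),
  b = c(21, 18, 68, 38, 74, 48, 98, 93, 35, 71),
  aggregate_with = c(rep(NA, 6), "y", "b", "b", "e"),
  aggregate_order = c(rep(NA, 6), "a,b", "a,b", "b,a", "a,b"),
  stringsAsFactors = FALSE
)

# A row is merged only if its aggregate_with names an existing id
merged <- !is.na(df$aggregate_with) & df$aggregate_with %in% df$id

# Start from the rows that are kept, then add each merged row to its target
result <- df[!merged, c("id", "a", "b")]
for (k in which(merged)) {
  target <- match(df$aggregate_with[k], result$id)
  swap <- df$aggregate_order[k] == "b,a"  # "b,a" means a and b trade places
  result$a[target] <- result$a[target] + if (swap) df$b[k] else df$a[k]
  result$b[target] <- result$b[target] + if (swap) df$a[k] else df$b[k]
}
rownames(result) <- NULL
result
```

With these values, row b ends up with a = 37 + 62 + 35 = 134 and b = 18 + 93 + 58 = 169, row e with a = 26 and b = 145, and row g stays as it is because y is not an id.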
I'm using the data.table library:
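library(data.table)
dt <- as.data.table(df)

# A table to join with: each contributing row is relabelled with its target id
dt2 <- dt[, list(id = aggregate_with, a, b, aggregate_order)]
# Set the right order: swap a and b when aggregate_order is "b,a"
dt2[, c('a', 'b') := list(ifelse(aggregate_order == 'a,b', a, b), ifelse(aggregate_order == 'a,b', b, a))]
setkey(dt2, id)

# Join the tables: for every id in dt, pull in the rows that aggregate into it
res <- dt2[dt]

# Replace NAs (ids that nothing aggregates into) with 0, then sum
for (j in c('a', 'b')) set(res, which(is.na(res[[j]])), j, 0)
# Drop rows that were merged into another id; i.a and i.b are each id's own values
res[!aggregate_with %in% id, list(a = sum(a) + i.a[1], b = sum(b) + i.b[1]), by = id]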
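For comparison, the same aggregation can probably be written without a join, by relabelling each row with the id it contributes to and then summing per id. This is a base R sketch using `aggregate`, not the approach from the answer above; the helper names `valid`, `swap`, `stacked`, and `agg` are made up for this example, and df is rebuilt from the printed values so the block runs on its own.

```r
# Values copied from the printed df above
df <- data.frame(
  id = letters[1:10],
  a = c(27, 37, 57, 89, 20, 86, 97, 62, 58, 6),
  b = c(21, 18, 68, 38, 74, 48, 98, 93, 35, 71),
  aggregate_with = c(rep(NA, 6), "y", "b", "b", "e"),
  aggregate_order = c(rep(NA, 6), "a,b", "a,b", "b,a", "a,b"),
  stringsAsFactors = FALSE
)

# TRUE where aggregate_with points at an existing id (NA and "y" give FALSE)
valid <- df$aggregate_with %in% df$id
# TRUE where a valid row contributes its columns in swapped order
swap <- valid & df$aggregate_order == "b,a"

# Each row contributes to its target id if valid, otherwise to its own id
stacked <- data.frame(
  id = ifelse(valid, df$aggregate_with, df$id),
  a = ifelse(swap, df$b, df$a),
  b = ifelse(swap, df$a, df$b),
  stringsAsFactors = FALSE
)

# Sum a and b within each id
agg <- aggregate(cbind(a, b) ~ id, data = stacked, FUN = sum)
agg
```

Rows whose aggregate_with does not match any id (such as row g) simply keep their own id, so they pass through unchanged.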

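A for loop, but I think there is a more elegant solution.
You should edit the question to include what you have, so that people don't spend effort getting to where you already are.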