R: Fast aggregation of a matrix by column and row groups


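I have a large matrix mat with row names group_label_x and column names group_label_y. I want to aggregate mat by group_label_x and group_label_y into ave_mat, where ave_mat[i, j] is the mean of all entries of mat whose row label is the i-th row group and whose column label is the j-th column group. This can be done with a double for loop, or by applying the aggregate function twice (e.g. aggregate(mat, by = list(group_label_x), FUN = 'mean')). Is there a faster way to do it? (I have many matrices to aggregate.)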
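For reference, the two-pass aggregate approach mentioned above could look roughly like the sketch below. The asker's own code is not preserved in this thread, so this is only an illustration on a toy matrix; the names mat_small, gx, gy and mean_by_row_group are hypothetical.

```r
# Toy data: 4 x 6 matrix with 2 row groups and 2 column groups (hypothetical labels)
mat_small <- matrix(as.numeric(1:24), nrow = 4)
gx <- c("a", "a", "b", "b")
gy <- c("u", "u", "u", "v", "v", "v")

# Average the rows of m within each group in g.
# Returns a matrix with one row per group and ncol(m) columns.
mean_by_row_group <- function(m, g) {
  a <- aggregate(m, by = list(g), FUN = mean)
  out <- as.matrix(a[, -1, drop = FALSE])   # drop the grouping column
  rownames(out) <- as.character(a[[1]])
  out
}

# Pass 1 collapses the rows; pass 2 collapses the columns via the transpose.
row_means <- mean_by_row_group(mat_small, gx)
ave_small <- t(mean_by_row_group(t(row_means), gy))
ave_small
```

Because every block is rectangular, the mean of the per-column group means equals the mean over the whole block, so two passes give the same result as a double loop over blocks.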

The code below generates a random demo matrix with about 1E4 rows and 2E4 columns, which I want to aggregate into a ~1E3 x 1E3 matrix:

set.seed(1)

dim_x_raw = 1E4
dim_y_raw = 2E4

n_groups_x = 1E3
n_groups_y = 1E3

group_len_x = diff(sort(sample( 1:dim_x_raw, n_groups_x )))
group_label_x = rep( paste0('group_', 1:length(group_len_x)), group_len_x )

group_len_y = diff(sort(sample( 1:dim_y_raw, n_groups_y )))
group_label_y = rep( paste0('group_', 1:length(group_len_y)), group_len_y )

mat = matrix( runif( length(group_label_x)*length(group_label_y) ), length(group_label_x) )

######################################
My aggregation code (slow):

You can try this:

library(data.table)
# add row and colnames
mat = matrix(runif( length(group_label_x)*length(group_label_y)), length(group_label_x), 
              dimnames = list(group_label_x, group_label_y))
# transform to data.table
mat_dt <- data.table(mat, keep.rownames = TRUE, stringsAsFactors = FALSE)
rm(mat) # remove the old matrix
# melt, summarise per group and calculate mean
mat_dt <- melt(mat_dt, id.vars = "rn")
head(mat_dt)
        rn variable     value
1: group_1  group_1 0.8718050
2: group_1  group_1 0.9671970
3: group_1  group_1 0.8669163
4: group_1  group_1 0.4377153
5: group_1  group_1 0.1919378
6: group_1  group_1 0.0822944
res <- mat_dt[,.(Mean=mean(value)),.(rn, variable)]
head(res)
        rn variable      Mean
1: group_1  group_1 0.4888935
2: group_2  group_1 0.3903115
3: group_3  group_1 0.4601481
4: group_4  group_1 0.5023852
5: group_5  group_1 0.5067483
6: group_6  group_1 0.4851856
dim(res)
[1] 998001      3

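Could you try the dplyr package? In general, data.table and the tidyverse are very fast in R.

Is there a function that converts res into the matrix ave_mat, where ave_mat[i, j] is the mean of mat[group_label_x[i], group_label_y[j]]? In fact, I cannot even retrieve res[[2]]; it gives me an error: Error in as.character.factor(x) : malformed factor. Using dcast(res, rn ~ variable, value.var = "Mean") gives the same error.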
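One way to turn a long-format result back into a matrix, sketched here in base R on a small hypothetical table (res_small is made up for illustration), is to fill an empty matrix by (row name, column name) pairs. This assumes the group columns are plain character vectors. In data.table, melt() has a variable.factor argument, and setting it to FALSE keeps the variable column as character; whether that avoids the "malformed factor" error above has not been verified here.

```r
# Hypothetical long-format result with character (not factor) group columns
res_small <- data.frame(
  rn       = c("a", "b", "a", "b"),
  variable = c("u", "u", "v", "v"),
  Mean     = c(5.5, 7.5, 17.5, 19.5),
  stringsAsFactors = FALSE
)

row_groups <- unique(res_small$rn)
col_groups <- unique(res_small$variable)

# Empty matrix with one row per row group and one column per column group
ave_small <- matrix(NA_real_,
                    nrow = length(row_groups), ncol = length(col_groups),
                    dimnames = list(row_groups, col_groups))

# A two-column character matrix indexes cells by (row name, column name)
ave_small[cbind(res_small$rn, res_small$variable)] <- res_small$Mean
ave_small
```

Cells with no matching (rn, variable) pair stay NA, which makes missing group combinations easy to spot.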
Timing of the whole pipeline on the demo matrix (run while mat, with its dimnames, still exists):

system.time(
 res <- melt(data.table(mat, keep.rownames = TRUE, stringsAsFactors = FALSE), id.vars = "rn")[,.(Mean=mean(value)),.(rn, variable)]
)
   user  system elapsed 
   8.15    0.01    8.19
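As an alternative that is not part of the answer above, a base R sketch using rowsum() may also be worth trying: it sums within row groups, then within column groups, and divides by the block sizes. rowsum() is a compiled base function, so it tends to be quick on large matrices, but it has not been benchmarked against the data.table approach here. The function name group_mean_rowsum and the toy inputs are hypothetical.

```r
# Block means via two grouped sums divided by the number of cells per block
group_mean_rowsum <- function(m, gx, gy) {
  s <- rowsum(m, gx)          # sum rows within each row group
  s <- t(rowsum(t(s), gy))    # sum columns within each column group
  nx <- table(gx)[rownames(s)]  # rows per row group, aligned with s
  ny <- table(gy)[colnames(s)]  # columns per column group, aligned with s
  s / outer(as.numeric(nx), as.numeric(ny))
}

# Toy check on a 4 x 6 matrix with 2 row groups and 2 column groups
set.seed(1)
m_small <- matrix(runif(24), nrow = 4)
gx <- c("a", "a", "b", "b")
gy <- c("u", "u", "u", "v", "v", "v")
ave_small <- group_mean_rowsum(m_small, gx, gy)
ave_small
```

The result has the row groups as row names and the column groups as column names, both in sorted order, so it can be indexed directly as ave_small["a", "u"].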