Mean of the sub-matrices within a matrix in base R or dplyr

Consider the following matrix:

  tt <-  structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 223.26217771938, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 233.317380407033, 228.230147000785, 
NA, NA, NA, NA, NA, NA, NA, NA, 213.976634238414, 202.420354707722, 
235.306183514161, NA, NA, NA, NA, NA, NA, NA, 234.959570990415, 
209.098063118719, 218.561204242656, 222.512920973143, NA, NA, 
NA, NA, NA, NA, 208.300264042079, 215.937490955137, 237.957979483774, 
192.688868386319, 235.076583265965, NA, NA, NA, NA, NA, 206.523606398881, 
223.937491278258, 223.926327170344, 214.32218737219, 226.512692801088, 
201.218786399282, NA, NA, NA, NA, 224.281073655358, 213.943917885038, 
238.593797069413, 203.435493461687, 229.752040252094, 219.155196151038, 
218.091723822799, NA, NA, NA, 220.671701855947, 201.380237362061, 
232.187424293393, 191.10206696946, 234.448288541418, 178.759615126012, 
214.037379912949, 204.514058196497, NA, NA, 232.924880594581, 
229.573517636508, 197.886331008486, 231.900840878165, 221.634834807167, 
227.927620090238, 232.886238322491, 239.428486191598, 231.987068605127, 
NA), .Dim = c(10L, 10L), .Dimnames = list(c("SA1", "SA1", "SA1", 
"SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2"), c("SA1", "SA1", 
"SA1", "SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2")))

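I want to compute the mean of the SA1 and SA2 sub-matrices. By sub-matrix I mean only the cells whose row name and column name are both SA1, and likewise only the cells whose row name and column name are both SA2. For SA1 this would be something like mean(tt[1:4, 1:4], na.rm = T), but my real matrix is much larger than this example, so basic subsetting by position is not a solution. I need some kind of grouping by the distinct rownames and colnames. It would be great if someone could show me a solution using both base R and dplyr.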

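We can use sapply to loop over all the unique column names of the matrix, subset the rows and columns that match each name, and take the mean of each sub-matrix: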

sapply(unique(colnames(tt)), function(x) 
     mean(tt[rownames(tt) == x, colnames(tt) == x], na.rm = TRUE))

#  SA1   SA2 
#222.8 221.0 
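
If the same calculation is needed for more than one matrix, the sapply call above could be wrapped in a small helper. The following is only a sketch under stated assumptions: block_means and the toy matrix m are hypothetical names that do not appear in the original answer, and vapply is swapped in for sapply so that the result is always a numeric vector.

```r
# Hypothetical helper (not from the original answer): mean of every
# sub-matrix whose row names and column names are equal.
block_means <- function(m) {
  vapply(
    unique(colnames(m)),
    function(x) mean(m[rownames(m) == x, colnames(m) == x], na.rm = TRUE),
    numeric(1)  # each group must yield exactly one number
  )
}

# Toy 4 x 4 matrix with two groups, "A" and "B" (illustrative data only)
m <- matrix(
  c(1, 2, 10, 20,
    3, 4, 30, 40,
    5, 6,  7, NA,
    8, 9, 11, 12),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "A", "B", "B"), c("A", "A", "B", "B"))
)

res <- block_means(m)
res
# A = 2.5 (mean of 1, 2, 3, 4); B = 10 (mean of 7, 11, 12 with the NA dropped)
```

The off-diagonal cells (10, 20, 30, 40 and 5, 6, 8, 9) are ignored because their row and column names differ.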

A for loop is another option:

sub_list <- unique(colnames(tt))

for (j in seq_along(sub_list)) {
  # keep only the rows and columns whose name matches the current group
  sub_list[j] <- mean(tt[rownames(tt) == sub_list[j], colnames(tt) == sub_list[j]], na.rm = TRUE)
}

This produces a vector named sub_list that starts out as the vector of unique column names; the loop then iterates over the subsets and replaces each name with the corresponding mean (you could write the means to a second vector, but why create two when one is enough?). Because sub_list starts as a character vector, the means are stored as strings; wrap the result in as.numeric() if numbers are needed.

An option with the tidyverse. We can melt 'tt' to 'long' format, filter the rows where the row name and the column name are the same, then group by 'Var1' and get the mean of the 'value' column:

library(dplyr)
library(reshape2)
melt(tt) %>% 
   filter(Var1 == Var2) %>%
   group_by(Var1) %>%
   summarise(value = mean(value, na.rm = TRUE))
# A tibble: 2 x 2
#  Var1  value
#  <fct> <dbl>
#1 SA1    223.
#2 SA2    221.
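
The same filter-then-group idea as the melt() pipeline can also be sketched in base R without reshaping to a long data frame. This is an illustration rather than part of any original answer: the toy matrix m and the names same, grp and res are assumptions made for the example. A logical mask marks the cells whose row and column names agree, and tapply averages those cells per name.

```r
# Toy 4 x 4 matrix with two groups, "A" and "B" (illustrative data only)
m <- matrix(
  c(1, 2, 10, 20,
    3, 4, 30, 40,
    5, 6,  7, NA,
    8, 9, 11, 12),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "A", "B", "B"), c("A", "A", "B", "B"))
)

# TRUE where the row name equals the column name (the "filter" step)
same <- outer(rownames(m), colnames(m), "==")

# Row name of every cell, in the same column-major order as the matrix
grp <- rownames(m)[as.vector(row(m))]

# Mean of the matching cells per name (the "group by + summarise" step)
res <- tapply(m[same], grp[same], mean, na.rm = TRUE)
res
# A = 2.5 (mean of 1, 2, 3, 4); B = 10 (mean of 7, 11, 12 with the NA dropped)
```

For the matrix in the question, replacing m with tt should give one mean per distinct name, in the same spirit as the sapply and dplyr results.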