R: running ave column by column

r dataframe


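I want to use the `ave` function on many columns (ten) of a data frame:

ave(df[,the_cols], df[,c('site', 'month')], FUN = mean)

The problem is that `ave` runs the `mean` function on all the columns at once. Is there a way to run it for each column separately?

I looked at other functions: `tapply` and `aggregate` are different, because they return only one row per group. I need the `ave` behaviour, i.e. a result with the same number of rows as the original `df`. There is also the `by` function, but using it would be very clumsy, because it returns a complicated list structure that has to be converted somehow.

Of course there are many clumsy and ugly solutions (by & do.call, multiple *apply calls, etc.), but is there a really simple and elegant one?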
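To make the difference concrete, here is a minimal sketch on made-up data. The data frame and its column names (`A`, `B`, `site`, `month`) are assumptions for illustration, not the asker's real data. It contrasts a summarising call, which yields one row per group, with `ave` on a single column, which keeps one value per original row.

```r
# Hypothetical toy data: 2 sites, 4 months, 2 rows per site/month group.
set.seed(1)
df <- data.frame(A = rnorm(8), B = rnorm(8),
                 site = gl(2, 4), month = gl(4, 2))

# aggregate() summarises: one row per non-empty site/month group.
per_group <- aggregate(cbind(A, B) ~ site + month, data = df, FUN = mean)
nrow(per_group)

# ave() on a single column keeps the original length and repeats
# the group mean on every row of the group.
a_means <- ave(df$A, df$site, df$month, FUN = mean)
length(a_means) == nrow(df)
```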
You can use `by` with `colMeans`:

by(df[,the_cols], df[,c('site', 'month')], FUN = colMeans)
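As a rough illustration of what this call returns, the sketch below runs it on made-up data (the data frame and `the_cols` are assumptions). The result is an object of class "by", laid out as one cell per site/month combination, rather than a data frame with one row per original record.

```r
# Hypothetical toy data: 2 sites, 4 months, 2 rows per site/month group.
set.seed(1)
df <- data.frame(A = rnorm(8), B = rnorm(8),
                 site = gl(2, 4), month = gl(4, 2))
the_cols <- c("A", "B")

res <- by(df[, the_cols], df[, c("site", "month")], FUN = colMeans)

class(res)             # the result is a "by" object
dim(res)               # one cell per site x month combination
unclass(res)[[1, 1]]   # column means for site 1, month 1
```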

You can also use `ave` inside `lapply`:
res <- lapply(df[,the_cols], function(x) 
                               ave(x, df[,c('site', 'month')], FUN = mean))

data.frame(res) # create data frame
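The same pattern works with functions other than `mean`. The sketch below, on made-up data (the data frame and `the_cols` are assumptions), centres each column within its site/month group. Note that `FUN` has to be passed by name, because `ave` collects its grouping variables through `...`.

```r
# Hypothetical toy data: 3 sites, 6 months, 2 rows per site/month group.
set.seed(42)
df <- data.frame(A = rnorm(12), B = rnorm(12),
                 site = gl(3, 4), month = gl(6, 2))
the_cols <- c("A", "B")  # assumed to be a character vector of column names

# Centre every column within its site/month group. FUN must be named,
# otherwise ave() would treat the function as one more grouping variable.
centred <- data.frame(lapply(df[, the_cols], function(x)
  ave(x, df$site, df$month, FUN = function(v) v - mean(v))))

nrow(centred) == nrow(df)  # one output row per input row
```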

If you want a data.frame back, with plyr:
library(plyr)
## assuming that the_cols are string
## if col index just add the index of site and month
the_cols <- c("site", "month", the_cols)
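## ddply's function argument is named .fun; numcolwise(mean) summarises,
## so this yields one row per site/month group
ddply(df, c('site', 'month'), .fun = numcolwise(mean))[,the_cols]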

Maybe I'm missing something, but an apply() approach works very well here, without being ugly or needing any nasty hacks. Some dummy data:
df <- data.frame(A = rnorm(20), B = rnorm(20), site = gl(5,4), month = gl(10, 2))

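Then run `ave` over each column with `sapply`. The result is a matrix; if you really need to, you can coerce it to a data frame with `data.frame()`: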
R> sapply(df[, c("A","B")], ave, df$site, df$month)
            A        B
 [1,]  0.0775  0.04845
 [2,]  0.0775  0.04845
 [3,] -1.5563  0.43443
 [4,] -1.5563  0.43443
 [5,]  0.7193  0.01151
 [6,]  0.7193  0.01151
 [7,] -0.9243 -0.28483
 [8,] -0.9243 -0.28483
 [9,]  0.3316  0.14473
[10,]  0.3316  0.14473
[11,] -0.2539  0.20384
[12,] -0.2539  0.20384
[13,]  0.5558 -0.37239
[14,]  0.5558 -0.37239
[15,]  0.1976 -0.22693
[16,]  0.1976 -0.22693
[17,]  0.2031  1.11041
[18,]  0.2031  1.11041
[19,]  0.3229 -0.53818
[20,]  0.3229 -0.53818
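The one-liner relies on `ave`'s default `FUN = mean`. As a sanity check, the sketch below rebuilds the same dummy data with a fixed seed (the seed is an assumption, so the numbers will differ from the output above) and compares one column of the result against group means computed by hand with `tapply`.

```r
set.seed(123)  # assumed seed, only to make the sketch reproducible
df <- data.frame(A = rnorm(20), B = rnorm(20),
                 site = gl(5, 4), month = gl(10, 2))

m <- sapply(df[, c("A", "B")], ave, df$site, df$month)
is.matrix(m)   # sapply() simplifies the per-column vectors to a matrix
dim(m)         # one row per row of df, one column per input column

# Cross-check column A: compute one mean per group, then expand the
# means back to one value per row.
g <- interaction(df$site, df$month, drop = TRUE)
manual <- tapply(df$A, g, mean)
expanded <- as.vector(manual[as.character(g)])
all.equal(unname(m[, "A"]), expanded)

# Coerce to a data frame if a matrix is not convenient.
m_df <- data.frame(m)
```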

Going a bit further, how about:
AVE <- function(df, cols, ...) {
  dots <- list(...)
  out <- sapply(df[, cols], ave, ...)
  out <- data.frame(as.data.frame(dots), out)
  names(out) <- c(paste0("Fac", seq_along(dots)), cols)
  out
}

R> AVE(df, c("A","B"), df$site, df$month)
   Fac1 Fac2       A        B
1     1    1  0.0775  0.04845
2     1    1  0.0775  0.04845
3     1    2 -1.5563  0.43443
4     1    2 -1.5563  0.43443
5     2    3  0.7193  0.01151
6     2    3  0.7193  0.01151
7     2    4 -0.9243 -0.28483
8     2    4 -0.9243 -0.28483
9     3    5  0.3316  0.14473
10    3    5  0.3316  0.14473
11    3    6 -0.2539  0.20384
12    3    6 -0.2539  0.20384
13    4    7  0.5558 -0.37239
14    4    7  0.5558 -0.37239
15    4    8  0.1976 -0.22693
16    4    8  0.1976 -0.22693
17    5    9  0.2031  1.11041
18    5    9  0.2031  1.11041
19    5   10  0.3229 -0.53818
20    5   10  0.3229 -0.53818

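Note the form of the output; it can easily be reshaped if needed.

As I stated in the question, `by` does not return a data frame! It takes a lot of ugly code to get it back into the original structure! See my question.

@Tomas the function `ave` doesn't return a data frame either.

Wow, the `sapply` one-liner is so simple and so effective, thanks!! It didn't occur to me to run `ave` inside `sapply`, amazing code golf :-) But I dislike your `AVE` function.

If you want the `site` and `month` information and have to do this often, I would quickly write a wrapper like this to do it for me. If you don't need the factors attached, just use the one-liner.

Aha, now I get it: it just bundles site and month together with the result... By the way, wouldn't it be better to use `cbind` in the `AVE` function? I guess names() …

Thanks for providing another solution with `aggregate`. Does it create a column for every record in the group? Looks bad :) What if two groups don't have the same number of records?

Yes, presumably it would pop in an `NA`. I haven't checked. Try it and report back...?

I always prefer to use only base libraries, but thanks for the plyr solution!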
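The `cbind` idea raised in the comments could look roughly like the sketch below. `AVE2` and its `by` argument are hypothetical names introduced here, not part of the original answer. Because the grouping columns are taken from the data frame by name, they keep their own names in the result instead of being relabelled Fac1, Fac2.

```r
# Hypothetical variant of the AVE wrapper: `cols` and `by` are character
# vectors of column names in `df`.
AVE2 <- function(df, cols, by) {
  groups <- df[by]                        # grouping columns, names preserved
  g <- interaction(groups, drop = TRUE)   # one combined grouping factor
  means <- as.data.frame(lapply(df[cols], function(x) ave(x, g)))
  cbind(groups, means)
}

set.seed(1)
df <- data.frame(A = rnorm(20), B = rnorm(20),
                 site = gl(5, 4), month = gl(10, 2))
out <- AVE2(df, c("A", "B"), c("site", "month"))
head(out)
```

Using `lapply` plus `as.data.frame` rather than `sapply` keeps the result a data frame even when `cols` names a single column.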
The `aggregate` solution mentioned in the comments:

R> aggregate(cbind(A, B) ~ site + month, data = df, ave)
   site month     A.1     A.2      B.1      B.2
1     1     1  0.0775  0.0775  0.04845  0.04845
2     1     2 -1.5563 -1.5563  0.43443  0.43443
3     2     3  0.7193  0.7193  0.01151  0.01151
4     2     4 -0.9243 -0.9243 -0.28483 -0.28483
5     3     5  0.3316  0.3316  0.14473  0.14473
6     3     6 -0.2539 -0.2539  0.20384  0.20384
7     4     7  0.5558  0.5558 -0.37239 -0.37239
8     4     8  0.1976  0.1976 -0.22693 -0.22693
9     5     9  0.2031  0.2031  1.11041  1.11041
10    5    10  0.3229  0.3229 -0.53818 -0.53818