Mean of all vector values with a unique name in R


I have a large vector of values whose names are not unique, i.e.:

tscores
        11461         11461         11461         11461         14433
-1.966196e+01  7.808853e-01  2.065178e+01  5.630565e+00 -7.295436e+00
        14433         14433         14433         14433         14433
 2.036339e+00 -6.704906e+00  1.603803e+00 -1.118324e+01  1.450554e+00
        14102         16153         16189         18563         18563
-1.137429e+01  7.053336e-02  1.011208e+00 -7.811194e+00 -6.749376e-01
        18563         18563         22042         22042         22042
 7.480217e-01 -9.909211e-01 -9.577424e-01 -7.887699e-02 -4.867706e-01

I would like a more efficient way to extract the sub-vector of all values that correspond to a given name. Currently I am using:

u_tscores <- sapply(unique(names(tscores)), function(name, scores) {mean(scores[names(scores)==name])}, scores=tscores)

Try this:
tapply(tscores, names(tscores), mean)
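
As a quick illustration of what that call returns, here is a minimal sketch on a small made-up vector (the values below are hypothetical, not the asker's data). tapply groups the values by name and applies mean to each group; the groups come back ordered by sorted name rather than by first appearance.

```r
# Hypothetical toy data: a named numeric vector with repeated names
tscores <- c(1, 3, 10, 20, 30, 5)
names(tscores) <- c("11461", "11461", "14433", "14433", "14433", "14102")

# One mean per distinct name; the result is a named one-dimensional array
u_tscores <- tapply(tscores, names(tscores), mean)
print(u_tscores)
```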

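I'm not sure whether this is more efficient, but it is probably not less efficient.
It looks like you will be subsetting this vector many times (that is, you won't just pick out the elements for one name once). Your data format doesn't seem well suited to that, so build a list of the values keyed by name: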
tvalues <- sapply(unique(names(tscores)), function(x, tscores) as.numeric(tscores[names(tscores) == x]), tscores=tscores)
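
A possible variant of the same idea, sketched on made-up data: split always returns a list, whereas sapply may simplify its result to a matrix when every group happens to have the same length. The toy vector and the use of unname here are assumptions for illustration, not part of the original answer.

```r
# Hypothetical toy data with repeated names
tscores <- c(a = 1, a = 3, b = 10, b = 20, b = 30, c = 5)

# Build the by-name list once; unname() drops the repeated element names
tvalues <- lapply(split(tscores, names(tscores)), unname)

# Later lookups are plain list indexing, with no rescan of tscores
b_values <- tvalues[["b"]]
b_mean <- mean(b_values)
```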

Your best option is to use lapply on the list obtained from split(tscores, names(tscores)). That gains you roughly a five-fold speedup:
n <- 1000000
tscores <- runif(n)
names(tscores) <- sample(letters,n,replace=T)

system.time(
   X <- tapply(tscores, names(tscores), mean)
)
   user  system elapsed 
   0.89    0.00    0.89 

 system.time(
   X2 <- sapply(unique(names(tscores)), function(name, scores){   
            mean(scores[names(scores)==name])}, scores=tscores)
)
   user  system elapsed 
   0.73    0.05    0.78 

system.time(
  X3 <- unlist(lapply(split(tscores,names(tscores)),mean))
)
   user  system elapsed 
   0.11    0.02    0.13 
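
Before comparing timings it is worth confirming that the three approaches agree. The sketch below uses a smaller n and an arbitrary seed (both assumptions, chosen only to keep the check fast and reproducible). The one difference to watch for is ordering: tapply and split order the groups by sorted name, while the sapply version follows the order in which names first appear.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 1000
tscores <- runif(n)
names(tscores) <- sample(letters, n, replace = TRUE)

X  <- tapply(tscores, names(tscores), mean)
X2 <- sapply(unique(names(tscores)), function(name, scores) {
  mean(scores[names(scores) == name])
}, scores = tscores)
X3 <- unlist(lapply(split(tscores, names(tscores)), mean))

# Same group order for tapply and split, so compare positionally
same_tapply <- isTRUE(all.equal(as.numeric(X), as.numeric(X3)))
# Reorder the sapply result by name before comparing
same_sapply <- isTRUE(all.equal(unname(X2[names(X3)]), unname(X3)))
```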

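Thanks! So what does split actually do? This is useful.

@Vincent: you are not looping over the names, but over a list that has already been split by name. That takes less computation, because you don't have to scan the full vector tscores on every iteration. In general, tapply and apply rely internally on lapply/sapply, but have more overhead in preparing the data. As I showed in the edit, sapply is actually a wrapper around lapply: sapply with USE.NAMES=FALSE and simplify=FALSE is exactly the same as lapply. In general, if you can work with lists, you will be faster than with other solutions.

@Edd: split returns a list of vectors, split according to the second argument. Once you have a list, you can use the full power of the apply family. See also ?split.

Thanks Joris for the explanation! It makes sense now.
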
system.time(X3 <- sapply(split(tscores,names(tscores)),mean))
   user  system elapsed 
   0.14    0.00    0.14
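
The two points made in the comments can be checked on a small example. This is a minimal sketch with made-up data: it shows that split returns a list of sub-vectors keyed by name, and that sapply with simplify = FALSE and USE.NAMES = FALSE gives the same object as lapply.

```r
# Hypothetical toy data with repeated names
tscores <- c(a = 1, a = 3, b = 10, b = 20, b = 30, c = 5)

# split() returns a list with one sub-vector per distinct name
groups <- split(tscores, names(tscores))
str(groups)

# With simplification and name handling switched off, sapply() hands back
# the lapply() result unchanged
via_lapply <- lapply(groups, mean)
via_sapply <- sapply(groups, mean, simplify = FALSE, USE.NAMES = FALSE)
same <- identical(via_lapply, via_sapply)
```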