Repeated subsetting of the same matrix with apply in R



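Motivation: I am currently trying to rethink my coding style, e.g. to avoid for loops wherever possible. The problem below can easily be solved with a regular for loop, but I wonder whether R offers a way to make it easier using the apply family.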

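Problem: I have a matrix, say X (n x k), and two matrices of start and stop indices called index.starts and index.stops, respectively. Both are of size n x B, and index.stops = index.starts + m holds for some integer m. Each pair index.starts[i, j] and index.stops[i, j] is needed to subset X as X[index.starts[i, j]:index.stops[i, j], ], i.e. the pair should select all rows of X within that index range. Can I solve this with one of the apply functions?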
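A minimal sketch of the problem as stated, assuming small made-up sizes (n, k, B, m and the random X are hypothetical). Map(":", ...) walks both index matrices element by element in column-major order and builds one row range per (i, j) pair; lapply then subsets X once per range. The explicit for loop is included only to show that both give the same list.

```r
# Hypothetical toy sizes (assumptions, not from the question)
set.seed(1)
n <- 20; k <- 3; B <- 4; m <- 2
X <- matrix(rnorm(n * k), nrow = n, ncol = k)

# n x B matrix of start indices; stops are starts + m, as in the question.
# Starts are drawn from 1:(n - m) so every stop stays <= n.
index.starts <- matrix(sample.int(n - m, n * B, replace = TRUE), nrow = n, ncol = B)
index.stops  <- index.starts + m

# One index range per (i, j) pair, in column-major order
ranges <- Map(":", index.starts, index.stops)

# One subset of X per pair: X[index.starts[i, j]:index.stops[i, j], ]
subsets <- lapply(ranges, function(rows) X[rows, , drop = FALSE])

# Equivalent explicit loop, for comparison only
subsets.loop <- vector("list", n * B)
for (j in seq_len(B)) {
  for (i in seq_len(n)) {
    subsets.loop[[(j - 1) * n + i]] <-
      X[index.starts[i, j]:index.stops[i, j], , drop = FALSE]
  }
}
```

Element (j - 1) * n + i of the result corresponds to the pair (i, j); drop = FALSE keeps single-row ranges as matrices.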

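Application: this is not necessarily important for understanding my problem. If you are interested, it is needed for a block bootstrap in a time series application. X represents the original sample. index.starts is sampled as replicate(repNumber, sample.int(n - r, ceiling(n/r), replace = TRUE)) and index.stops as index.stops = index.starts + m. What I want in the end is a collection of rows of X. In particular, I want to resample repNumber times m blocks of length r from X.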
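A sketch of the block bootstrap described above, assuming hypothetical values for n, k, r and repNumber. Two assumptions differ from the question's formulas: stops are set to starts + r - 1 so each block has exactly r rows, and starts are drawn from 1:(n - r + 1) so every valid block can occur. Each column of index.starts yields one resample, built by stacking its blocks with do.call(rbind, ...) and trimming to n rows (a design choice, not from the question).

```r
# Hypothetical sizes (assumptions)
set.seed(42)
n <- 50; k <- 2; r <- 5; repNumber <- 3
X <- matrix(rnorm(n * k), nrow = n, ncol = k)

n.blocks <- ceiling(n / r)
# ceiling(n/r) x repNumber matrix of block starts
index.starts <- replicate(repNumber, sample.int(n - r + 1, n.blocks, replace = TRUE))
index.stops  <- index.starts + r - 1   # blocks of exactly r rows

# One bootstrap resample per column of index.starts
boot.samples <- lapply(seq_len(repNumber), function(b) {
  blocks  <- Map(":", index.starts[, b], index.stops[, b])
  stacked <- do.call(rbind, lapply(blocks, function(rows) X[rows, , drop = FALSE]))
  stacked[seq_len(n), , drop = FALSE]  # trim to the original length n
})
```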


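This might be a lot more complicated than you want/need, but here is a first approach. If it helps you at all, please leave a comment and I'm happy to help.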

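My approach uses several *apply functions. The first lapply loops over the cases 1:B; in each case it computes the start and end points and combines them with Map into the list of row indices take.rows. Next, the original matrix is subset by each element of take.rows and the subsets are returned as a list. As a final step, the standard deviation of each column of each subset matrix is computed as a dummy function.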

The heavily commented code looks as follows:

# assumes X (an n x k matrix), n, r and B are already defined
# you can use lapply in parallel mode if you want to speed up code...
lapply(1:B, function(i){
  starts <- sample.int((n-r), ceiling(n/r), replace=TRUE)
  # [1] 64 22 84 26 40  7 66 12 25 15
  ends <- starts + r
  # note: starts:ends spans r + 1 rows; use starts + r - 1 for blocks of exactly r rows

  take.rows <- Map(":", starts, ends)
#   [[1]]
#   [1] 72 73 74 75 76 77 78 79 80 81 82
#   ...

  res <- lapply(take.rows, function(subs) X[subs, ])
#   res is now a list of 10 with the ten subsets
#   [[1]]
#   [,1]        [,2]
#   [1,]  0.2658915 -0.18265235
#   [2,]  1.7397478  0.66315385
#  ...

  # say you want to compute something (sd in this case) you can do the following
  # but better you do the computing directly in the former "lapply(take.rows...)"
  res2 <- t(sapply(res, function(tmp){
    apply(tmp, 2, sd)
  })) # sapply simplifies to a matrix (one column per block); t() gives one row per block
#   [,1]      [,2]
#   [1,] 1.2345833 1.0927203
#   [2,] 1.1838110 1.0767433
#   [3,] 0.9808146 1.0522117
#   ...
  return(res2)
})
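A self-contained variant of the code above that runs on fake data, assuming hypothetical values n = 100, k = 2, r = 10 and B = 5. Following the suggestion in the comments to compute directly inside the subsetting step, the column standard deviations are calculated while subsetting, and vapply with numeric(k) is used instead of sapply so the result shape is guaranteed. Blocks here have exactly r rows (an assumption; the original uses starts + r).

```r
# Hypothetical fake data (assumptions)
set.seed(123)
n <- 100; k <- 2; r <- 10; B <- 5
X <- matrix(rnorm(n * k), nrow = n, ncol = k)

boot.sd <- lapply(seq_len(B), function(i) {
  starts <- sample.int(n - r + 1, ceiling(n / r), replace = TRUE)
  ends   <- starts + r - 1              # exactly r rows per block
  take.rows <- Map(":", starts, ends)
  # column sds computed directly while subsetting; vapply returns k x blocks,
  # t() turns it into one row per block
  t(vapply(take.rows,
           function(rows) apply(X[rows, , drop = FALSE], 2, sd),
           numeric(k)))
})
```

Each element of boot.sd is a ceiling(n/r) x k matrix, matching the shape of res2 in the answer.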

Does this point you in the right direction or answer your question?

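Could you give a minimal working example with some fake data, e.g. n = 3 and k = 4 or similar? That would make it easier to understand and solve…

Hey David, I hope this helps you!

Maybe you need to Map ":" over index.starts and index.stops and subset X accordingly?

Could you include the for loop you are currently using? I'm still trying to figure out what you are doing!

David, this is awesome! I don't even think it's complicated, and now that I see it, the Map command makes a lot of sense. I really like your solution for getting res/res2! Also thanks for the extensive comments, they really helped me! :)

Always a pleasure. One last thing: if you want to speed up the code, you can run it in parallel with snowfall and sfClusterApplyLB. See a blog post I wrote earlier: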
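The comment above suggests snowfall and sfClusterApplyLB for parallelism. As a swapped-in alternative (not the commenter's method), here is a sketch using mclapply from base R's parallel package, assuming the same hypothetical fake data as before. The L'Ecuyer-CMRG generator gives each forked worker its own random stream; since forking is unavailable on Windows, the sketch falls back to a single core there.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # parallel-safe RNG streams for forked workers
set.seed(2024)

# Hypothetical fake data (assumptions)
n <- 100; k <- 2; r <- 10; B <- 8
X <- matrix(rnorm(n * k), nrow = n, ncol = k)

# One bootstrap replicate: column sds of ceiling(n/r) blocks of r rows
one.replicate <- function(i) {
  starts <- sample.int(n - r + 1, ceiling(n / r), replace = TRUE)
  take.rows <- Map(":", starts, starts + r - 1)
  t(vapply(take.rows,
           function(rows) apply(X[rows, , drop = FALSE], 2, sd),
           numeric(k)))
}

# Forking is not available on Windows, so use one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res.par <- mclapply(seq_len(B), one.replicate, mc.cores = cores)
```

On Windows, parLapply with a cluster from makeCluster is the usual route to real parallelism, at the cost of exporting X, n, r and k to the workers.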
