Why does "vectorizing" this simple R loop give a different result?

Perhaps a very dumb question.

I am trying to "vectorize" the following loop:

set.seed(0)
x <- round(runif(10), 2)
# [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
sig <- sample.int(10)
# [1]  1  2  9  5  3  4  8  6  7 10
for (i in seq_along(sig)) x[i] <- x[sig[i]]
x
# [1] 0.90 0.27 0.66 0.91 0.66 0.91 0.94 0.91 0.94 0.63

I expected this to be simply x[sig], but the results do not match. What is wrong?

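Remark

Obviously, from the output we can see that the for loop and x[sig] are different. The meaning of the latter is clear: it is a permutation, so many people tend to think the loop is simply doing something wrong. But never be so sure; it could be some well-defined dynamic process. The purpose of this Q&A is not to judge which one is correct, but to explain why they are not equivalent. Hopefully it provides a solid case study for understanding "vectorization".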
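To make the mismatch concrete, here is a minimal sketch that hard-codes the x and sig values printed in the question, so it does not depend on the random number generator (since R 3.6.0 the default sampling algorithm changed, so set.seed(0) may no longer reproduce this sig). The names x0, x_loop and x_vec are illustrative only.

```r
# Values copied from the question's printed output.
x0  <- c(0.90, 0.27, 0.37, 0.57, 0.91, 0.20, 0.90, 0.94, 0.66, 0.63)
sig <- c(1L, 2L, 9L, 5L, 3L, 4L, 8L, 6L, 7L, 10L)

# Sequential loop: a step may read an element that an earlier step overwrote.
x_loop <- x0
for (i in seq_along(sig)) x_loop[i] <- x_loop[sig[i]]

# "Vectorized" form: the right-hand side is evaluated in full before assignment.
x_vec <- x0
x_vec[] <- x_vec[sig]

print(x_loop)
print(x_vec)
print(which(x_loop != x_vec))  # positions where the two results disagree
```

With these inputs the two results agree everywhere except positions 5, 6, 8 and 9.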

Warm-up

As a warm-up, consider two simpler examples.

## example 1
x <- 1:11
for (i in 1:10) x[i] <- x[i + 1]
x
# [1]  2  3  4  5  6  7  8  9 10 11 11

x <- 1:11
x[1:10] <- x[2:11]
x
# [1]  2  3  4  5  6  7  8  9 10 11 11

## example 2
x <- 1:11
for (i in 1:10) x[i + 1] <- x[i]
x
# [1] 1 1 1 1 1 1 1 1 1 1 1

x <- 1:11
x[2:11] <- x[1:10]
x
# [1]  1  1  2  3  4  5  6  7  8  9 10
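In the two examples above, the loop and the vectorized form agree in example 1 but not in example 2. One way to see why, offered as a sketch rather than as the original author's analysis, is to check whether any step of the loop reads an index that an earlier step has already written. The helper reads_overwritten() is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: TRUE if some step reads an index that an earlier
# step of the same loop has already written.
reads_overwritten <- function(write_idx, read_idx) {
  stopifnot(length(write_idx) == length(read_idx))
  any(vapply(seq_along(read_idx), function(k) {
    # indices written by steps 1 .. k-1 (empty for k = 1)
    read_idx[k] %in% write_idx[seq_len(k - 1)]
  }, logical(1)))
}

# Example 1: x[i] <- x[i + 1] for i = 1..10
ex1_dep <- reads_overwritten(write_idx = 1:10, read_idx = 2:11)
# Example 2: x[i + 1] <- x[i] for i = 1..10
ex2_dep <- reads_overwritten(write_idx = 2:11, read_idx = 1:10)

print(c(example1 = ex1_dep, example2 = ex2_dep))
```

Example 1 never reads a value it has already overwritten, so running the steps one by one or all at once gives the same answer. Example 2 reads the value written by the previous step, which is why the loop propagates the first element everywhere.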

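Nowadays there are many programming languages, such as R, in which "vectorization" no longer explicitly refers to "SIMD". R is not a language in which we can program CPU registers. "Vectorization" in R is just an analogy to "SIMD". I tried to explain this in a previous Q&A. The following map illustrates how the analogy is made: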

single (assembly) instruction    -> single R instruction
CPU vector registers             -> temporary vectors
parallel processing in registers -> C/C++/FORTRAN loops with temporary vectors

So the R "vectorization" of the loop in example 1 is something like

## the C-level loop is implemented by function "["
tmp <- x[2:11]  ## load data into a temporary vector
x[1:10] <- tmp  ## fill temporary vector into x
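The same temporary-vector reading can be applied to example 2. This is a small sketch, with x_tmp and x_loop as illustrative names, showing that the explicit snapshot reproduces the vectorized result and not the loop result.

```r
# Vectorized semantics, spelled out with an explicit temporary vector.
x <- 1:11
tmp <- x[1:10]   # snapshot of the original values
x[2:11] <- tmp   # fill from the snapshot
x_tmp <- x

# Sequential loop: each step reads the value written by the previous step.
x <- 1:11
for (i in 1:10) x[i + 1] <- x[i]
x_loop <- x

print(x_tmp)
print(x_loop)
```

The snapshot is taken before any element is modified, so the shift by one position is preserved; the loop has no such snapshot.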

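For comparison, vectorization in the literal "SIMD" sense takes independent instructions such as the following and executes them in parallel:

a[1] <- b[1] + c[1]
a[2] <- b[2] + c[2]
a[3] <- b[3] + c[3]
a[4] <- b[4] + c[4]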

Back to the loop in the question. Unrolling it gives

x[1]  <- x[1]
x[2]  <- x[2]
x[3]  <- x[9]   ## 3rd instruction
x[4]  <- x[5]
x[5]  <- x[3]   ## 5th instruction
x[6]  <- x[4]
x[7]  <- x[8]
x[8]  <- x[6]
x[9]  <- x[7]
x[10] <- x[10]
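The unrolled instructions can also be traced step by step. The sketch below, which again hard-codes the question's printed x and sig, records what each step actually reads and flags the steps whose source position was overwritten earlier. Because the loop writes positions 1, 2, ... in order, a read from position sig[i] is stale exactly when sig[i] < i. The names stale and trace are illustrative.

```r
x0  <- c(0.90, 0.27, 0.37, 0.57, 0.91, 0.20, 0.90, 0.94, 0.66, 0.63)
sig <- c(1L, 2L, 9L, 5L, 3L, 4L, 8L, 6L, 7L, 10L)
steps <- seq_along(sig)

# Steps that read a position already written by an earlier step.
stale <- which(sig < steps)

# Run the loop while recording the value each step actually reads.
x <- x0
read_value <- numeric(length(sig))
for (i in steps) {
  read_value[i] <- x[sig[i]]
  x[i] <- read_value[i]
}

trace <- data.frame(step = steps, reads = sig,
                    original = x0[sig], loop_read = read_value)
print(trace[stale, ])
```

For this sig the stale steps are 5, 6, 8 and 9; at every other step the loop reads the same value a true permutation would.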

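So there is a simpler explanation. In your loop, every step overwrites one element of x, replacing its previous value with one of the other elements of x. So you get what you asked for. Essentially, it is a complicated form of sampling with replacement (sample(x, replace = TRUE)); whether you need such a complication depends on what you want to achieve.

With the vectorized code, you are just asking for a certain permutation of x (without replacement), and that is what you get. The vectorized code does not do the same thing as the loop. If you want to get the same result with a loop, you first need to make a copy of x: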

set.seed(0)
x <- x2 <- round(runif(10), 2)
# [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
sig <- sample.int(10)
# [1]  1  2  9  5  3  4  8  6  7 10
for (i in seq_along(sig)) x2[i] <- x[sig[i]]
identical(x2, x[sig])
#TRUE

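This has nothing to do with memory block aliasing (a term I had never encountered before). Take a particular permutation as an example and walk through the assignments that would occur regardless of how it is implemented at the C or assembly (or any other) language level. It is intrinsic to how any sequential for loop behaves, as opposed to how a "true" permutation (what you get with x[sig]) is carried out: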
sample(10)
 [1]  3  7  1  5  6  9 10  8  4  2

value at 3 is copied to 1, and now there are two of those values
value at 7 is copied to 2, and now there are two of those values
value at 1 (which came from 3) is copied back to 3, so the values remain unchanged

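... I could go on, but this illustrates that the result will typically not be a "true" permutation and will only very rarely be a complete redistribution of the values. I am guessing that only a completely ordered permutation (of which I think there is only one, namely 10:1) could produce a new x whose values are all unique.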
replicate( 100, {x <- round(runif(10), 2); 
                  sig <- sample.int(10); 
                  for (i in seq_along(sig)){ x[i] <- x[sig[i]]}; 
                  sum(duplicated(x)) } )
 #[1] 4 4 4 5 5 5 4 5 6 5 5 5 4 5 5 6 3 4 2 5 4 4 4 4 3 5 3 5 4 5 5 5 5 5 5 5 4 5 5 5 5 4
 #[43] 5 3 4 6 6 6 3 4 5 3 5 4 6 4 5 5 6 4 4 4 5 3 4 3 4 4 3 6 4 7 6 5 6 6 5 4 7 5 6 3 6 4
 #[85] 8 4 5 5 4 5 5 5 4 5 5 4 4 5 4 5
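The guess about which permutations leave all values unique can be checked directly. This is a quick sketch using a hypothetical helper loop_assign() that wraps the loop from the question; it tests only three permutations and is not a proof.

```r
# Hypothetical wrapper around the loop from the question.
loop_assign <- function(x, sig) {
  for (i in seq_along(sig)) x[i] <- x[sig[i]]
  x
}

x <- 1:10  # ten distinct values

dup_reversed <- sum(duplicated(loop_assign(x, 10:1)))          # reversed order
dup_identity <- sum(duplicated(loop_assign(x, 1:10)))          # identity
dup_shift    <- sum(duplicated(loop_assign(x, c(2:10, 1L))))   # cyclic shift

print(c(reversed = dup_reversed, identity = dup_identity, shift = dup_shift))
```

In this sketch the reversed permutation 10:1 yields a palindrome with five duplicates, the cyclic shift yields one duplicate, and only the identity 1:10 leaves every value unique.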

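It is interesting that although R "vectorization" is different from "SIMD" (as the OP nicely explained), the same logic applies when determining whether a loop is "vectorizable". Here is a demo using the examples from the OP's self-answer (slightly modified).

Example 1 (ex1.c), with a "write-after-read" dependency, is "vectorizable". Example 2 (ex2.c) is reported by gcc as not vectorized because of a data dependence, and ex3.c uses the C99 restrict keyword to tell the compiler that the three arrays do not alias.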
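I agree with some of the points in the first paragraph. What makes the difference is sequential assignment from x to x versus "all at once" assignment, but at no point is a "memory block" being overwritten. R does not assign "in place". Instead, it makes a temporary copy of the original and renames it. I also would not call this a hazard for "vectorization", because when you use a for loop you are not really using what is called vectorization. I would consider the vectorized result to be correct and the for-loop approach to be wrong.

I would be very surprised if that were the case. I will try to find more authoritative documentation.

I don't think you understand how tracemem works. Try a ...

@李哲源 @42- Also keep in mind that R 3.4.0 introduced just-in-time compilation for top-level loops. I imagine the final result could depend on how sophisticated the compiler is. (compiler::enableJIT(0) can turn it off for testing.)

Edited to remove the text I disagreed with. It was: "There is actually a trap here: address aliasing. The loop reads x and writes to x. The memory block it reads from overlaps with the memory block it writes to. This self-reference introduces a loop-carried dependency and is a hazard for 'vectorization'."

What the for loop does wrong (or obscures) is not sampling with replacement.

@42 Sure. What I meant is that what you get could also have been the result of sampling with replacement. You are right that some values are moved more than once, but in sampling with replacement some values can also be picked more than once.

The probability distribution of the for-loop process is different from the probability distribution you get by sampling with replacement.

You are right, the for version does more replacing, so fewer of the original items remain in the result. I also ran a small simulation, sampling from 1:10: the result for the for version was 5.5, whereas for sample(..., replace = TRUE) it was 6.5.

Given your "Remark", you should probably change the title of the question. Your definition of "wrong" is evidently different from mine. It is true that a for loop may be the only way to implement certain stochastic process models.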
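The concern about just-in-time compilation raised in the comments can be tested in a few lines. This sketch runs the question's loop at top level with the JIT switched off and then at its highest level, using the hard-coded x and sig from the question; compiler::enableJIT() returns the previous level, which is used here to restore the setting. The names x_nojit and x_jit are illustrative.

```r
x0  <- c(0.90, 0.27, 0.37, 0.57, 0.91, 0.20, 0.90, 0.94, 0.66, 0.63)
sig <- c(1L, 2L, 9L, 5L, 3L, 4L, 8L, 6L, 7L, 10L)

old_level <- compiler::enableJIT(0)   # JIT off; returns the previous level
x <- x0
for (i in seq_along(sig)) x[i] <- x[sig[i]]
x_nojit <- x

invisible(compiler::enableJIT(3))     # JIT at its highest level
x <- x0
for (i in seq_along(sig)) x[i] <- x[sig[i]]
x_jit <- x

invisible(compiler::enableJIT(old_level))  # restore the original setting
print(identical(x_nojit, x_jit))
```

Byte compilation is meant to preserve the sequential semantics of the loop, so both runs should give the same vector.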

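By the same analogy, the R "vectorization" of the loop in the question, x[] <- x[sig], is something like

tmp <- x[sig]
x[] <- tmp

which, unrolled, is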

tmp[1]  <- x[1]
tmp[2]  <- x[2]
tmp[3]  <- x[9]
tmp[4]  <- x[5]
tmp[5]  <- x[3]
tmp[6]  <- x[4]
tmp[7]  <- x[8]
tmp[8]  <- x[6]
tmp[9]  <- x[7]
tmp[10] <- x[10]

x[1]  <- tmp[1]
x[2]  <- tmp[2]
x[3]  <- tmp[3]
x[4]  <- tmp[4]
x[5]  <- tmp[5]
x[6]  <- tmp[6]
x[7]  <- tmp[7]
x[8]  <- tmp[8]
x[9]  <- tmp[9]
x[10] <- tmp[10]

## the same two phases written as loops
tmp <- numeric(10)  ## allocate the temporary vector first
for (i in 1:10) tmp[i] <- x[sig[i]]
for (i in 1:10) x[i] <- tmp[i]
rm(tmp); gc()

for (i in 1:num) {
  for (j in 1:num) {
    mat[i, j] <- mat[i, mat[j, "rm"]]
  }
}

mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]

for (i in 1:num) {
  for (j in 1:num) {
    mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]
  }
}

set.seed(0)
x <- round(runif(10), 2)
sig <- sample.int(10)
tracemem(x)
#[1] "<0x28f7340>"
for (i in seq_along(sig)) x[i] <- x[sig[i]]
tracemem(x)
#[1] "<0x28f7340>"

## distribution of the number of duplicates over one million random permutations
table( replicate( 1000000, {x <- round(runif(10), 5); 
                            sig <- sample.int(10); 
                            for (i in seq_along(sig)){ x[i] <- x[sig[i]]}; 
                            sum(duplicated(x)) } ) )

     0      1      2      3      4      5      6      7      8 
     1    269  13113 126104 360416 360827 125707  13269    294 
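The table above implies about 4.5 duplicates on average, that is, about 5.5 distinct values out of 10. This appears to match the 5.5 quoted in the comments, assuming that figure is the mean number of distinct values (the source does not say so explicitly). The sketch below compares the for-loop process with plain sampling with replacement on that measure; n, reps and the variable names are illustrative choices.

```r
set.seed(1)
n    <- 10
reps <- 5000

# Number of distinct values left after the sequential loop.
distinct_loop <- replicate(reps, {
  x   <- seq_len(n)
  sig <- sample.int(n)
  for (i in seq_along(sig)) x[i] <- x[sig[i]]
  length(unique(x))
})

# Number of distinct values in a sample drawn with replacement.
distinct_repl <- replicate(reps, length(unique(sample(n, replace = TRUE))))

print(mean(distinct_loop))
print(mean(distinct_repl))
print(n * (1 - (1 - 1 / n)^n))  # exact expectation for sampling with replacement
```

The last line is the standard expectation for the number of distinct values when drawing n times with replacement from n items, roughly 6.51 for n = 10, so the loop process keeps about one distinct value fewer on average.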

// "ex1.c"
#include <stdlib.h>
void ex1 (size_t n, size_t *x) {
  for (size_t i = 1; i < n; i++) x[i - 1] = x[i] + 1;
}

gcc -O2 -c -ftree-vectorize -fopt-info-vec ex1.c
#ex1.c:3:3: note: loop vectorized

// "ex2.c"
#include <stdlib.h>
void ex2 (size_t n, size_t *x) {
  for (size_t i = 1; i < n; i++) x[i] = x[i - 1] + 1;
}

gcc -O2 -c -ftree-vectorize -fopt-info-vec-missed ex2.c
#ex2.c:3:3: note: not vectorized, possible dependence between data-refs
#ex2.c:3:3: note: bad data dependence

// "ex3.c"
#include <stdlib.h>
void ex3 (size_t n, size_t * restrict a, size_t * restrict b, size_t * restrict c) {
  for (size_t i = 0; i < n; i++) a[i] = b[i] + c[i];
}

gcc -O2 -c -ftree-vectorize -fopt-info-vec ex3.c
#ex3.c:3:3: note: loop vectorized