Performance: MATLAB's bsxfun() - what explains the performance differences when expanding along different dimensions?

Tags: performance, matlab, matrix, vectorization, bsxfun

In my line of work (econometrics/statistics), I frequently need to multiply matrices of different sizes and then perform additional operations on the resulting matrix. I have been relying on bsxfun() to vectorize the code, and in general I find it more efficient than repmat(). What I don't understand, though, is why the performance of bsxfun() can sometimes be so different when expanding the matrices along different dimensions.

Consider this specific example:

x      = ones(j, k, m);
beta   = rand(k, m, s);

exp_xBeta   = zeros(j, m, s);
for im = 1 : m
    for is = 1 : s
        xBeta                = x(:, :, im) * beta(:, im, is);
        exp_xBeta(:, im, is) = exp(xBeta);
    end
end

y = mean(exp_xBeta, 3);
Context

We have data from m markets, and in each market we want to compute the expectation of exp(X * beta), where X is a j x k matrix and beta is a k x 1 random vector. We compute this expectation by Monte Carlo integration - take s draws of beta, compute exp(X * beta) for each draw, and then take the mean. Typically we obtain data with m > k > j, and we use a very large s. In this example I simply let X be a matrix of ones.
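Restating the context as a formula (nothing new, just the Monte Carlo estimator that the loop above computes), for each market m we want

$$\hat{y}_m = \frac{1}{s}\sum_{i=1}^{s} \exp\!\left(X_m \,\beta_m^{(i)}\right),$$

where $X_m$ is the $j \times k$ matrix for market m and $\beta_m^{(i)}$ is the i-th of the s draws of the $k \times 1$ random vector.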

I made 3 vectorized versions using bsxfun(); they differ in how X and beta are shaped:

Vectorization 1

x1      = x;                                    % size [ j k m 1 ]
beta1   = permute(beta, [4 1 2 3]);             % size [ 1 k m s ]

tic
xBeta       = bsxfun(@times, x1, beta1);
exp_xBeta   = exp(sum(xBeta, 2));
y1          = permute(mean(exp_xBeta, 4), [1 3 2 4]);   % size [ j m ]
time1       = toc;
Vectorization 2

x2      = permute(x, [4 1 2 3]);                % size [ 1 j k m ]
beta2   = permute(beta, [3 4 1 2]);             % size [ s 1 k m ]

tic
xBeta       = bsxfun(@times, x2, beta2);
exp_xBeta   = exp(sum(xBeta, 3));
y2          = permute(mean(exp_xBeta, 1), [2 4 1 3]);   % size [ j m ]
time2       = toc;
Vectorization 3

x3      = permute(x, [2 1 3 4]);                % size [ k j m 1 ]
beta3   = permute(beta, [1 4 2 3]);             % size [ k 1 m s ]

tic
xBeta       = bsxfun(@times, x3, beta3);
exp_xBeta   = exp(sum(xBeta, 1));
y3          = permute(mean(exp_xBeta, 4), [2 3 1 4]);    % size [ j m ]
time3       = toc;
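As a quick sanity check (a sketch, not part of the original benchmark; it assumes y, y1, y2 and y3 from the code above are already in the workspace), all three vectorized results should match the loop result up to floating-point rounding:

% Sanity check (sketch): vectorized results vs. the loop result
fprintf('max |y1 - y| = %g\n', max(abs(y1(:) - y(:))));
fprintf('max |y2 - y| = %g\n', max(abs(y2(:) - y(:))));
fprintf('max |y3 - y| = %g\n', max(abs(y3(:) - y(:))));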
This is how they performed (typically we obtain data with m > k > j, and we used a very large s):

j = 5, k = 15, m = 100, s = 2000

For-loop version took 0.7286 seconds.
Vectorized version 1 took 0.0735 seconds.
Vectorized version 2 took 0.0369 seconds.
Vectorized version 3 took 0.0503 seconds.

j = 10, k = 15, m = 150, s = 5000

For-loop version took 2.7815 seconds.
Vectorized version 1 took 0.3565 seconds.
Vectorized version 2 took 0.2657 seconds.
Vectorized version 3 took 0.3433 seconds.

j = 15, k = 35, m = 150, s = 5000

For-loop version took 3.4881 seconds.
Vectorized version 1 took 1.0687 seconds.
Vectorized version 2 took 0.8465 seconds.
Vectorized version 3 took 0.9414 seconds.

Why is version 2 consistently the fastest? At first I thought the performance advantage came from s being assigned to dimension 1, which MATLAB might be able to compute faster because it stores data in column-major order. But MATLAB's profiler told me that the time spent computing the mean was negligible and roughly the same across all 3 versions. MATLAB spent most of the time evaluating the lines with bsxfun(), and that is also where the runtime differences among the 3 versions were largest.

Any thoughts on why version 1 is always the slowest and version 2 always the fastest?

I have updated the test code here:

EDIT: An earlier version of this post was incorrect. beta should have size (k, m, s).

bsxfun is certainly one of the good tools for vectorizing things, but if you can somehow bring in matrix multiplication, that would be the best way to go. Here, it seems you could use matrix multiplication to get exp_xBeta like so -

[m1,n1,r1] = size(x);
n2 = size(beta,2);
exp_xBeta_matmult = reshape(exp(reshape(permute(x,[1 3 2]),[],n1)*beta),m1,r1,n2)
Or get y directly, like so -

y_matmult = reshape(mean(exp(reshape(permute(x,[1 3 2]),[],n1)*beta),2),m1,r1)
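As a quick check (a sketch, not from the original answer; it assumes beta is 2-D of size (k, s), as used throughout this answer, and the sizes are only illustrative), the matrix-multiplication result can be compared against a plain loop:

% Sketch: verify the matrix-multiplication approach against a plain loop.
% Assumes beta is 2-D of size (k, s); the sizes below are illustrative only.
j = 5; k = 15; m = 100; s = 200;
x    = rand(j, k, m);
beta = rand(k, s);

[m1, n1, r1] = size(x);
y_matmult = reshape(mean(exp(reshape(permute(x, [1 3 2]), [], n1) * beta), 2), m1, r1);

y_loop = zeros(j, m);
for im = 1:m
    y_loop(:, im) = mean(exp(x(:, :, im) * beta), 2);   % (j x s), averaged over the s draws
end

max(abs(y_matmult(:) - y_loop(:)))   % should be at the level of rounding error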
Explanation

To explain it in more detail, we have the sizes listed next -

x      : (j, k, m)
beta   : (k, s)

Our final goal is to use matrix multiplication to "eliminate" k from x and beta. So, we can use permute to push the k in x to the end and reshape it into a 2-D array of size (j*m, k), then perform matrix multiplication with beta, which is (k, s), to get (j*m, s). The product can then be reshaped into a 3-D array (j, m, s) and exponentiated element-wise, giving exp_xBeta.

Now, if the final goal is y, which is the mean along the third dimension of exp_xBeta, that is equivalent to taking the mean of each row of the (j*m, s) matrix-multiplication product and then reshaping it to (j, m), which gives us y directly.

I did some more experiments this morning. It does seem to be because MATLAB stores data in column-major order after all.

While doing these experiments, I also added a vectorized version 4, which does the same thing but orders the dimensions slightly differently from versions 1-3.


To summarize, here is how x and beta are ordered in all 4 versions (a sketch of version 4's code, which is not listed above, follows this list):

Vectorization 1:

x       :   (j, k, m, 1)
beta    :   (1, k, m, s)
Vectorization 2:

x       :   (1, j, k, m)
beta    :   (s, 1, k, m)
Vectorization 3:

x       :   (k, j, m, 1)
beta    :   (k, 1, m, s)
Vectorization 4:

x       :   (1, k, j, m)
beta    :   (s, k, 1, m)
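Vectorization 4's code is not shown in this post; a minimal sketch consistent with the ordering stated above, assuming it mirrors versions 1-3, would be:

% Sketch of Vectorization 4 (not in the original post), inferred from
% the stated ordering x: (1, k, j, m), beta: (s, k, 1, m)
x4      = permute(x, [4 2 1 3]);                % size [ 1 k j m ]
beta4   = permute(beta, [3 1 4 2]);             % size [ s k 1 m ]

tic
xBeta       = bsxfun(@times, x4, beta4);        % size [ s k j m ]
exp_xBeta   = exp(sum(xBeta, 2));               % sum over k -> [ s 1 j m ]
y4          = permute(mean(exp_xBeta, 1), [3 4 1 2]);    % size [ j m ]
time4       = toc;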
Code


The two most costly operations in this code are:

(a) xBeta = bsxfun(@times, x, beta)

(b) exp_xBeta = exp(sum(xBeta, dimK))

where dimK is the dimension in which k sits.

In (a), bsxfun() has to expand x along the s dimension and beta along the j dimension. When s is much larger than the other dimensions, we should see a performance advantage for vectorizations 2 and 4, because they assign s to dimension 1.

j = 100; k = 100; m = 100; s = 1000;

Vectorized version 1 took 2.4719 seconds.
Vectorized version 2 took 2.1419 seconds.
Vectorized version 3 took 2.5071 seconds.
Vectorized version 4 took 2.0825 seconds.
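For reference, here is one way (a sketch, not part of the original benchmark; the sizes are only illustrative) to isolate just the bsxfun() call in (a) and compare putting s in the last dimension versus the first, using timeit:

% Timing sketch (not from the original post): isolate the bsxfun() call in (a)
% and compare s in the last dimension vs. s in the first dimension.
j = 10; k = 30; m = 100; s = 5000;              % illustrative sizes only
x    = ones(j, k, m);
beta = rand(k, m, s);

x1 = x;                      beta1 = permute(beta, [4 1 2 3]);   % s ends up in dim 4
x2 = permute(x, [4 1 2 3]);  beta2 = permute(beta, [3 4 1 2]);   % s ends up in dim 1

fprintf('s in last dim : %.4f s\n', timeit(@() bsxfun(@times, x1, beta1)));
fprintf('s in first dim: %.4f s\n', timeit(@() bsxfun(@times, x2, beta2)));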
Conversely, if s is trivially small while k is huge, then vectorization 3 should be the fastest, because it puts k in dimension 1:

j = 10; k = 10000; m = 100; s = 1;

Vectorized version 1 took 0.0329 seconds.
Vectorized version 2 took 0.1442 seconds.
Vectorized version 3 took 0.0253 seconds.
Vectorized version 4 took 0.1415 seconds.

If we swap the values of k and j in the last example, vectorization 1 becomes the fastest, because j is assigned to dimension 1:

j = 10000; k = 10; m = 100; s = 1;

Vectorized version 1 took 0.0316 seconds.
Vectorized version 2 took 0.1402 seconds.
Vectorized version 3 took 0.0385 seconds.
Vectorized version 4 took 0.1608 seconds.
But in general, when k and j are close, j > k does not necessarily mean that vectorization 1 is faster than vectorization 3, because the operations performed in (a) and (b) are different.

In practice, I often have to run the computation with s >> m > k > j. In that case, ordering them as in vectorization 2 or 4 seems to give the best results:

    j = 10; k = 30; m = 100; s = 5000;

Vectorized version 1 took 0.4621 seconds.
Vectorized version 2 took 0.3373 seconds.
Vectorized version 3 took 0.3713 seconds.
Vectorized version 4 took 0.3533 seconds.

    j = 15; k = 50; m = 150; s = 5000;

Vectorized version 1 took 1.5416 seconds.
Vectorized version 2 took 1.2143 seconds.
Vectorized version 3 took 1.2842 seconds.
Vectorized version 4 took 1.2684 seconds.

Takeaway: if bsxfun() has to expand along a dimension that is much larger than the other dimensions, assign that dimension to dimension 1.

See also the other answers.

If you want to use bsxfun to work with matrices of different dimensions, make sure the largest dimension of the matrices is kept in the first dimension.

j = 100; k = 100; m = 100; s = 1000;

Vectorized version 1 took 2.4719 seconds.
Vectorized version 2 took 2.1419 seconds.
Vectorized version 3 took 2.5071 seconds.
Vectorized version 4 took 2.0825 seconds.