Matlab: Determine whether matrix A is a subset of matrix B

Given a matrix such as

A = [...
    12 34 67;
    90 78 15;
    10 71 24];
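how can we efficiently determine whether it is a subset of a larger matrix, i.e. whether every row of A also appears as a row of B? For example: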

B = [...
    12 34 67;                        % found
    89 67 45;
    90 78 15;                        % found  
    10 71 24;                        % found, so A is subset of B. 
    54 34 11];
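Conditions:

  • All numbers are integers.
  • The matrices are very large: the number of rows is > 100000, and the number of columns may vary from 1 to 10 (the same for A and B).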
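For readers who want the idea outside MATLAB, here is a minimal C sketch of one standard approach: sort the rows of B lexicographically, then binary-search each row of A. It is not taken from any of the answers. It assumes the data are stored row-major as int (the question says all values are integers), and the names rows_subset and cmp_rows are hypothetical. Because standard C's qsort has no context argument, the column count is passed through a file-scope variable, so this sketch is not thread-safe.

```c
#include <stdlib.h>
#include <string.h>

/* Column count used by the comparator (file-scope because qsort/bsearch
   take no context argument; hypothetical helper, not thread-safe). */
static size_t g_ncol;

/* Lexicographic comparison of two rows stored contiguously (row-major). */
static int cmp_rows(const void *a, const void *b)
{
    const int *ra = a, *rb = b;
    for (size_t j = 0; j < g_ncol; ++j) {
        if (ra[j] < rb[j]) return -1;
        if (ra[j] > rb[j]) return 1;
    }
    return 0;
}

/* Returns 1 if every row of A (ma x ncol) occurs as a row of B (mb x ncol),
   0 if not, and -1 if memory allocation fails. B is copied and sorted, so
   the cost is roughly O(mb log mb + ma log mb) row comparisons. */
int rows_subset(const int *A, size_t ma, const int *B, size_t mb, size_t ncol)
{
    int *sorted;
    int ok = 1;

    if (ma == 0) return 1;     /* the empty set is a subset of anything */
    if (mb == 0) return 0;     /* A has rows but B has none */
    if (ncol == 0) return 1;   /* all rows are empty and therefore equal */

    sorted = malloc(mb * ncol * sizeof *sorted);
    if (sorted == NULL) return -1;
    memcpy(sorted, B, mb * ncol * sizeof *sorted);

    g_ncol = ncol;
    qsort(sorted, mb, ncol * sizeof *sorted, cmp_rows);

    /* Binary-search each row of A; stop at the first missing row. */
    for (size_t i = 0; i < ma && ok; ++i) {
        if (bsearch(A + i * ncol, sorted, mb, ncol * sizeof *sorted, cmp_rows) == NULL)
            ok = 0;
    }
    free(sorted);
    return ok;
}
```

Sorting a copy keeps B unchanged, and the early exit means a missing row is usually detected quickly.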
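Edit: For this problem, ismember works fine as long as it is called only a few times. My initial impression came from earlier experience in which ismember was called many times inside nested loops, which gave the worst performance.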

clear all; clc
n = 200000;
k = 10;
B = randi(n,n,k);
f = randperm(n);
A = B(f(1:1000),:);
tic
assert(sum(ismember(A,B,'rows')) == size(A,1));
toc
tic
assert(all(any(all(bsxfun(@eq,B,permute(A,[3,2,1])),2),1))); %user2999345
toc
The result is:

Elapsed time is 1.088552 seconds.
Elapsed time is 12.154969 seconds.
Here are more benchmarks:

clear all; clc
n = 20000;
f = randperm(n);
k = 10;
t1 = 0;
t2 = 0;
t3 = 0;
for i=1:7
    B = randi(n,n,k);
    A = B(f(1:n/10),:);
    %A(100,2) = 0;                                      % to make A not submat of B
    tic
    b = sum(ismember(A,B,'rows')) == size(A,1);
    t1 = t1+toc;
    assert(b);
    tic
    b = ismember_mex(A,sortrows(B));
    t2 = t2+toc;
    assert(b);
    tic
    b = issubmat(A,B);
    t3 = t3+toc;
    assert(b);
end

Total elapsed time in seconds (tic/toc summed over the loop iterations):

                             George's       skm's
                ismember | ismember_mex | issubmat
n=20000,k=10      0.6326      0.1064      11.6899
n=1000,k=100      0.2652      0.0155       0.0577
n=1000,k=1000     1.1705      0.1582       0.2202
n=1000,k=10000   13.2470      2.0033       2.6367
*issubmat eats RAM when n or k is over 10000!
*In issubmat(A,B), A is checked for being a submatrix of B.

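For small matrices, ismember may be sufficient. Usage:

all(ismember(A,B,'rows'))


I am posting this answer to emphasize the need for a higher-performance solution. I will accept it only if no better solution comes along.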

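If you use ismember(B,A,'rows') and count the matches, a row of A that appears twice in B can make up for another row of A that is missing, so the count may wrongly indicate that A is a subset of B. The following solution works when the rows of A and B do not need to be in the same order. However, I have not tested its performance on large matrices.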

A = [...
34 12 67;
90 78 15;
10 71 24];
B = [...
34 12 67;                        % found
89 67 45;
90 78 15;                        % found  
10 71 24;                        % found, so A is subset of B. 
54 34 11];
A = permute(A,[3 2 1]);
rowIdx = all(bsxfun(@eq,B,A),2);
colIdx = any(rowIdx,1);
isAMemberB = all(colIdx);
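As a rough illustration of what the bsxfun expression computes, and of the duplicate-row pitfall with counting, here is a loop-based C sketch. It assumes column-major double storage to mirror MATLAB (element (r, c) of an m-row matrix is M[r + c*m]); the function names are hypothetical. Like the bsxfun version it is O(m·n·k), but the loops can exit early instead of materializing an n×k×m array.

```c
#include <stddef.h>

/* Does row ra of A (ma rows) equal row rb of B (mb rows)? Both have ncol
   columns and are column-major. Corresponds to all(... == ..., 2). */
static int rows_equal(const double *A, size_t ma, size_t ra,
                      const double *B, size_t mb, size_t rb, size_t ncol)
{
    for (size_t c = 0; c < ncol; ++c)
        if (A[ra + c * ma] != B[rb + c * mb]) return 0;
    return 1;
}

/* Loop form of all(any(all(bsxfun(@eq,B,permute(A,[3 2 1])),2),1)). */
int is_row_subset_bruteforce(const double *A, size_t ma,
                             const double *B, size_t mb, size_t ncol)
{
    for (size_t ra = 0; ra < ma; ++ra) {             /* outer all(): every row of A */
        int found = 0;
        for (size_t rb = 0; rb < mb && !found; ++rb) /* any(..., 1): some row of B */
            found = rows_equal(A, ma, ra, B, mb, rb, ncol);
        if (!found) return 0;
    }
    return 1;
}

/* Counting variant, i.e. sum(ismember(B,A,'rows')): how many rows of B
   occur somewhere in A. Comparing this to ma is NOT a valid subset test
   when B contains duplicate rows. */
size_t count_rows_of_B_in_A(const double *A, size_t ma,
                            const double *B, size_t mb, size_t ncol)
{
    size_t count = 0;
    for (size_t rb = 0; rb < mb; ++rb)
        for (size_t ra = 0; ra < ma; ++ra)
            if (rows_equal(A, ma, ra, B, mb, rb, ncol)) { ++count; break; }
    return count;
}
```

For example, with A = [1 2; 3 4] and B = [1 2; 1 2; 5 6], the count is 2, which equals the number of rows of A, yet [3 4] is not in B; the brute-force test correctly returns 0.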

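You have already stated the number of columns

It seems that ismember is hard to beat, at least with MATLAB code. I created a C implementation that can be compiled with the MEX compiler: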

#include "mex.h"

#if MX_API_VER < 0x07030000
typedef int mwIndex;
typedef int mwSize;
#endif /* MX_API_VER */

#include <math.h>
#include <stdlib.h>
#include <string.h>

int ismember(const double *y, const double *x, int yrow, int xrow, int ncol);

void mexFunction(int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])
{
    mwSize xcol, ycol, xrow, yrow;

    /* arguments */
    const mxArray* y;
    const mxArray* x;

    if (nrhs != 2)
    {
        mexErrMsgTxt("2 inputs required.");
    }

    y = prhs[0];
    x = prhs[1];
    ycol = mxGetN(y);
    yrow = mxGetM(y);
    xcol = mxGetN(x);
    xrow = mxGetM(x);

    /* Both inputs must be full (non-sparse) double matrices. */
    if (!mxIsDouble(y) || !mxIsDouble(x) || mxIsSparse(y) || mxIsSparse(x))
    {
        mexErrMsgTxt("Inputs must be full matrices of type 'double'.");
    }
    if (xcol != ycol)
    {
        mexErrMsgTxt("Inputs must have the same number of columns");
    }

    /* Return the result as a logical scalar. mxLogical is 1 byte, so the
       result must not be written through an int or double pointer. */
    plhs[0] = mxCreateLogicalScalar(ismember(mxGetPr(y), mxGetPr(x), yrow, xrow, ycol));
}

int ismemberinner(const double *y, int idx, const double *x, int yrow, int xrow, int ncol) {
    int from, to, i;
    from = 0;
    to = xrow-1;

    for(i = 0; i < ncol; ++i) {
        // Perform binary search
        double yi = *(y + i * yrow + idx);
        const double *curx = x + i * xrow;
        int l = from;
        int u = to;
        while(l <= u) {
            int mididx = l + (u-l)/2;
            if(yi < curx[mididx]) {
                u = mididx-1;
            }
            else if(yi > curx[mididx]) {
                l = mididx+1;
            }
            else {
                // This can be further optimized by performing additional binary searches
                for(from = mididx; from > l && curx[from-1] == yi; --from);
                for(to = mididx; to < u && curx[to+1] == yi; ++to);
                break;
            }
        }
        if(l > u) {
            return 0;
        }
    }
    return 1;
}

int ismember(const double *y, const double *x, int yrow, int xrow, int ncol) {
    int i;
    for(i = 0; i < yrow; ++i) {
        if(!ismemberinner(y, i, x, yrow, xrow, ncol)) {
            return 0;
        }
    }
    return 1;
}
It can be called as:

ismember_mex(y, sortrows(x))
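First, it assumes that both matrices have the same number of columns. The rows of the larger matrix (x here, the second argument of the function) must be sorted beforehand, which is why sortrows is applied in the call. A form of binary search is then used to determine whether the rows of the smaller matrix (y) are contained in x. This is done separately for each row of y (see the ismember C function). For a given row of y, it starts with the first entry and uses binary search to find the range of indices of x whose first column matches (tracked by the from and to variables). This is repeated for the remaining entries, each time searching only within the range found for the previous column, unless a value is not found, in which case it stops and returns 0.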
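The column-by-column narrowing can be sketched as plain C without the MEX layer, which makes it easier to test on its own. This is a restatement of the idea under stated assumptions, not the author's code: both matrices are column-major double arrays, the rows of x are already sorted lexicographically (as sortrows would leave them), and lower_bound, upper_bound, row_in_sorted and ismember_sorted are hypothetical names. It uses half-open [from, to) ranges with size_t indices, so no index is ever decremented below zero.

```c
#include <stddef.h>

/* First index i in [lo, hi) with col[i] >= v; col must be sorted on [lo, hi). */
static size_t lower_bound(const double *col, size_t lo, size_t hi, double v)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (col[mid] < v) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* First index i in [lo, hi) with col[i] > v. */
static size_t upper_bound(const double *col, size_t lo, size_t hi, double v)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (col[mid] <= v) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Is row `row` of y (nrowy x ncol) one of the rows of x (nrowx x ncol)? */
static int row_in_sorted(const double *y, size_t nrowy, size_t row,
                         const double *x, size_t nrowx, size_t ncol)
{
    size_t from = 0, to = nrowx;               /* candidate rows [from, to) */
    for (size_t c = 0; c < ncol && from < to; ++c) {
        const double v = y[row + c * nrowy];
        const double *col = x + c * nrowx;
        /* Rows in [from, to) agree on columns 0..c-1, so column c is sorted there. */
        size_t lo = lower_bound(col, from, to, v);
        to = upper_bound(col, lo, to, v);
        from = lo;
    }
    return from < to;                          /* non-empty range: row found */
}

/* Returns 1 if every row of y occurs in the row-sorted matrix x, else 0. */
int ismember_sorted(const double *y, size_t nrowy,
                    const double *x, size_t nrowx, size_t ncol)
{
    for (size_t r = 0; r < nrowy; ++r)
        if (!row_in_sorted(y, nrowy, r, x, nrowx, ncol)) return 0;  /* early exit */
    return 1;
}
```

Each row costs about 2·k binary searches, each over a range that can only shrink, so a mismatch in an early column ends the search for that row immediately.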

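I tried implementing this idea in MATLAB, but it did not perform very well. Regarding performance, I found that (a) when there is a mismatch, it is usually much faster than ismember; (b) it is also faster than ismember when the values in x and y span a large range; and (c) when everything matches and the number of distinct possible values in x and y is small (e.g. fewer than 1000), ismember can be faster in some cases. Finally, some parts of the C implementation could be optimized further.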

EDIT 1

I fixed the warnings and further improved the function:

#include "mex.h"
#include <math.h>
#include <stdlib.h>
#include <string.h>

int ismember(const double *y, const double *x, unsigned int nrowy, unsigned int nrowx, unsigned int ncol);

void mexFunction(int nlhs, mxArray *plhs[],
        int nrhs, const mxArray *prhs[])
{
    unsigned int xcol, ycol, nrowx, nrowy;

    /* arguments */
    const mxArray* y;
    const mxArray* x;

    if (nrhs != 2)
    {
        mexErrMsgTxt("2 inputs required.");
    }

    y = prhs[0];
    x = prhs[1];
    ycol = (unsigned int) mxGetN(y);
    nrowy = (unsigned int) mxGetM(y);
    xcol = (unsigned int) mxGetN(x);
    nrowx = (unsigned int) mxGetM(x);

    /* Both inputs must be full (non-sparse) double matrices. */
    if (!mxIsDouble(y) || !mxIsDouble(x) || mxIsSparse(y) || mxIsSparse(x))
    {
        mexErrMsgTxt("Inputs must be full matrices of type 'double'.");
    }
    if (xcol != ycol)
    {
        mexErrMsgTxt("Inputs must have the same number of columns");
    }

    plhs[0] = mxCreateLogicalScalar(ismember(mxGetPr(y), mxGetPr(x), nrowy, nrowx, ycol));
}

int ismemberinner(const double *y, const double *x, unsigned int nrowy, unsigned int nrowx, unsigned int ncol) {
    unsigned int from = 0, to = nrowx-1, i;

    for(i = 0; i < ncol; ++i) {
        // Perform binary search
        const double yi = *(y + i * nrowy);
        const double *curx = x + i * nrowx;
        unsigned int l = from;
        unsigned int u = to;
        while(l <= u) {
            const unsigned int mididx = l + (u-l)/2;
            const double midx = curx[mididx];
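            if(yi < midx) {
                if(mididx == 0) {
                    /* yi is smaller than every candidate in this column; returning
                       here also keeps the unsigned u from wrapping to UINT_MAX */
                    return 0;
                }
                u = mididx-1;
            }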
            else if(yi > midx) {
                l = mididx+1;
            }
            else {
                {
                    // Binary search to identify smallest index of x that equals yi
                    // Equivalent to for(from = mididx; from > l && curx[from-1] == yi; --from)
                    unsigned int limit = mididx;
                    while(curx[from] != yi) {
                        const unsigned int mididx = from + (limit-from)/2;
                        if(curx[mididx] < yi) {
                            from = mididx+1;
                        }
                        else {
                            limit = mididx-1;
                        }
                    }
                }
                {
                    // Binary search to identify largest index of x that equals yi
                    // Equivalent to for(to = mididx; to < u && curx[to+1] == yi; ++to);
                    unsigned int limit = mididx;
                    while(curx[to] != yi) {
                        const unsigned int mididx = limit + (to-limit)/2;
                        if(curx[mididx] > yi) {
                            to = mididx-1;
                        }
                        else {
                            limit = mididx+1;
                        }
                    }
                }
                break;
            }
        }
        if(l > u) {
            return 0;
        }
    }
    return 1;
}

int ismember(const double *y, const double *x, unsigned int nrowy, unsigned int nrowx, unsigned int ncol) {
    unsigned int i;
    if(nrowx == 0) {
        /* Nothing to search; also keeps to = nrowx-1 from wrapping around */
        return nrowy == 0;
    }
    for(i = 0; i < nrowy; ++i) {
        if(!ismemberinner(y + i, x, nrowy, nrowx, ncol)) {
            return 0;
        }
    }
    return 1;
}
It can be compiled with

mex -O ismember_mex.c

and called as

ismember_mex(y, sortrows(x))
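Both MEX versions rely on the second argument having been passed through sortrows; if it was not, the binary searches can return a wrong result without any error. A cheap O(n·k) precondition check, for example for a debug build, might look like the sketch below. The name rows_are_sorted is hypothetical, the matrix is assumed to be column-major double data as MATLAB stores it, and the check assumes finite values (no NaN), for which ascending sortrows order is plain lexicographic order.

```c
#include <stddef.h>

/* Returns 1 if the rows of x (nrow x ncol, column-major) are in
   non-decreasing lexicographic order, 0 otherwise. */
int rows_are_sorted(const double *x, size_t nrow, size_t ncol)
{
    for (size_t r = 1; r < nrow; ++r) {
        for (size_t c = 0; c < ncol; ++c) {
            const double prev = x[(r - 1) + c * nrow];
            const double cur  = x[r + c * nrow];
            if (prev < cur) break;     /* row r-1 < row r: this pair is in order */
            if (prev > cur) return 0;  /* row r-1 > row r: not sorted */
            /* equal so far: compare the next column */
        }
    }
    return 1;
}
```

The check stops at the first out-of-order pair, so on unsorted input it is usually much cheaper than the full scan.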