How to match certain words between two strings (in MATLAB)?


In the following two strings, the words "rabbit" and "tree" match:

str1 = ('rabbit is eating grass near a tree');
str2 = ('rabbit is sleeping under tree');
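Suppose cmp is a variable declared to hold the result of comparing the two. I would like the result to be:

cmp = 2

or something else that indicates that two words match. How can this be done?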

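Use this for a case-insensitive comparison:

CMP = strcmpi(string,string)
And this for a case-sensitive comparison:

CMP = strcmp(string,string)
If CMP is 1 the strings are the same; if it is 0 they are not.

If you want to ignore leading and trailing whitespace, which makes the comparison more robust, trim the strings first and then compare.

To trim: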

newString = strtrim(str)
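As a small illustration of the difference between these calls, here is a minimal sketch. The variable names and sample strings are made up for the example. Note that strcmp and strcmpi compare whole strings, so to use them on individual words the sentence has to be split first (strsplit needs R2013a or later).

```matlab
a = '  Rabbit ';
b = 'rabbit';

r1 = strcmp(a, b);                      % false: case and whitespace both differ
r2 = strcmpi(a, b);                     % false: case is ignored, whitespace is not
r3 = strcmpi(strtrim(a), strtrim(b));   % true: equal once trimmed, ignoring case

% Applied to a cell array of words, strcmpi returns one logical per word.
words1  = strsplit('rabbit is eating grass near a tree');
hasTree = any(strcmpi('Tree', words1)); % true: 'tree' is one of the words
```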

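I am assuming there is no restriction on the position or order in which the words match. First you need to split the sentences into words and remove duplicate words, then check whether each word in the second sentence also appears in the first.

If the ordering really matters, it is not quite that simple, but your question does not state any such restriction.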

str1= ('rabbit is eating grass near a tree');
str2= ('rabbit is sleeping under tree');
split1 = unique(regexp(str1,'\s','Split'));
split2 = unique(regexp(str2,'\s','Split'));

% Storing all words in the first sentence into a map for quick search/access
dict = containers.Map();
for ii = 1:numel(split1)
   dict(split1{ii}) = true; 
end

% create temp holding cell array, then loop through, looking to see if 
% any word in the second sentence is stored in the dictionary made from
% the first sentence. 
matches = {};
for jj = 1:numel(split2)
    if dict.isKey(split2{jj})
        matches = [matches,split2{jj}]; % not best but length initially unknown
    end
end

numMatches = numel(matches) % return the number of matches

The variable matches will contain all the words that match between the two sentences.

As in the other answer, split each string into a cell array of words:

str1= ('rabbit is eating grass near a tree');
str2= ('rabbit is sleeping under tree');

% split each string into a cell array of words (intersect removes duplicates later)
split1 = regexp(str1,'\s','Split');
split2 = regexp(str2,'\s','Split');
Alternatively, later versions of MATLAB (IIRC R2013a) include a strsplit() function, so the split reduces to

split1 = strsplit(str1);
split2 = strsplit(str2);
Then use the intersect() function to get the elements common to the two cell arrays. Wrap it in length() to return an integer count:

cmp = length(intersect(split1,split2));
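Splitting on whitespace keeps punctuation and capitalisation, so 'tree,' and 'Rabbit' would not match 'tree' and 'rabbit'. If that matters, one possible variation (an assumption on my part, not something the question asked for) is to lower-case the strings and extract runs of word characters with regexp before intersecting:

```matlab
str1 = 'Rabbit is eating grass near a tree, an oak tree';
str2 = 'rabbit is sleeping under a tree';

% '\w+' matches runs of letters, digits and underscores, so commas are dropped
words1 = regexp(lower(str1), '\w+', 'match');
words2 = regexp(lower(str2), '\w+', 'match');

common = intersect(words1, words2);   % distinct words present in both sentences
cmp    = numel(common);               % 4: 'a', 'is', 'rabbit', 'tree'
```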

Using ismember, it takes just one line:

str1 = ('rabbit is eating grass near a tree');
str2 = ('rabbit is sleeping under a tree');

result = sum( ismember( strsplit(str1), strsplit(str2) ) )

result =

    4               %// I included also the article "a"
Note that the result is the same for the following sentences:

str1 = ('rabbit is eating grass near a tree, an oak tree');
str2 = ('rabbit is sleeping under a tree and is dreaming about a tree');

result = sum( ismember( strsplit(str1), strsplit(str2) ) )
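Removing duplicates beforehand, as MZimmerman6 suggested, is not necessary here. (Keep in mind that ismember tests every word of str1, so a word repeated in str1 is counted once per occurrence; in this example 'tree,' and 'tree' are different tokens, so nothing is counted twice.)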
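How repeated words affect the count is easy to check with a small sketch. The two short sentences below are invented for the illustration:

```matlab
x = strsplit('tree near a tree');   % 'tree' appears twice in the first sentence
y = strsplit('a tree');

nOccurrences = sum(ismember(x, y));          % 3: 'tree', 'a', 'tree' each count
nDistinct    = numel(intersect(x, y));       % 2: distinct common words 'a', 'tree'
nUniqueFirst = sum(ismember(unique(x), y));  % 2: same count after deduplicating x
```

Which of the two counts is wanted depends on whether repeated words should count once or once per occurrence.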


If you want to filter unwanted strings out of the result, you can introduce another cell array of strings that lists all the exceptions:

str3 = {'is','a'}
unwanted = sum( ismember( intersect( strsplit(str1), strsplit(str2) ), str3 ) )

unwanted =

     2

All in all, it could look like this:

str1 = ('rabbit is eating grass near a tree, an oak tree');
str2 = ('rabbit is sleeping under a tree and is dreaming about a tree');
str3 = {'is','a'}

[x,y,z] = deal( strsplit(str1), strsplit(str2), str3 )
result = sum(ismember(x,y)) - sum(ismember(intersect(x,y),z))
%      =       4            -            2           =        2
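The same stop-word filtering can also be written with setdiff, which is mentioned in the comments. This is only a sketch of one alternative: the name stopWords and the threshold in the if-else branch are illustrative assumptions, not part of the answer above.

```matlab
str1 = 'rabbit is eating grass near a tree, an oak tree';
str2 = 'rabbit is sleeping under a tree and is dreaming about a tree';
stopWords = {'is','a'};                       % words to ignore (illustrative list)

common = intersect(strsplit(str1), strsplit(str2));  % 'a','is','rabbit','tree'
common = setdiff(common, stopWords);                  % 'rabbit','tree'
cmp    = numel(common);                               % 2

% The integer count can then drive an if-else condition.
if cmp >= 2
    disp('At least two meaningful words match.');
else
    disp('Fewer than two meaningful words match.');
end
```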
A "crazy" approach; it could look something like this, though it is untested -

Function -

function out = cell2_matchind(split1,split2)

c1 = char(split1)-'0';
c2 = char(split2)-'0';
if size(c1,2)<size(c2,2)
    c1 = [c1 -16.*ones(size(c1,1),size(c2,2)-size(c1,2))];
else
    c2 = [c2 -16.*ones(size(c2,1),size(c1,2)-size(c2,2))];
end
out = any(squeeze(sum(bsxfun(@eq,permute(c1,[3 2 1]),c2),2))==size(c2,2),2);
Output -

cw_split2 = 
    'rabbit'    'is'    'sleeping'    'tree'    'and'    'will'    'be'    'eating'    'the'

cw_split2_nostopwd = 
    'rabbit'    'sleeping'    'tree'    'eating'

cmp =
     4


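I would suggest exploring MATLAB's string search and comparison functions, having a go at the problem, and then coming back with more specific questions.

I would like an integer value to be returned: if 3 words match in the strings above, the output should be 3. That would let me run an if-else condition, which will help with my future work.

Shouldn't cmp be 3, since the word "is" also matches in the example?

Yes, it would be 3, but I will try to remove the stop words.

Will you remove them yourself, or does the MATLAB code have to remove them? If the MATLAB code has to do it, that means another pass of comparisons.

Thank you for your help, sir, but I would like an integer value to be returned: if 3 words match in the strings above, the output should be 3. That would let me run an if-else condition, which will help with my future work.

@user3416063 Seriously? You could get that with one small extra step. But if you want me to add it, sure, it is fixed.

I ask that, before posting questions or comments, you please look into how the language works. Questions like this can be solved easily and do not need complicated maps and loops. The function intersect returns the common elements of two cell arrays of strings.

Thank you, sir. @MZimmerman6, I am new to MATLAB and did not have time to research it, because I have limited time to finish my work. In future I will definitely do so.

@Adrian I forgot about the intersect function. This was the first thing I could think of, and it is probably similar to what intersect does behind the scenes. Also, if you want to do something similar in another language, this gives a very fast and simple solution that runs in linear time.

Won't this compare the whole strings rather than the individual words?

It should compare the whole strings, why not?

Because they are after the count of matching words - not whether the whole strings match.

Thank you, sir, very helpful. I googled a lot to find out how to match specific words in given strings but could not find any solution. This really helps.

+1 short and clear. I think unique is unnecessary, because intersect automatically considers only unique values. As ismember does, compare with my example.

Thanks - you are right, unique is not needed. That makes it even simpler!

Thank you, sir, this is helpful. Before this I was struggling to learn how to use the setdiff command.

Thank you, sir, very helpful.

@user3416063 Read a bit about Stack Overflow - and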