Union of cell arrays of string cell arrays


string matlab


I am looking for the union of two cell arrays whose elements are themselves cell arrays of strings. For example:

A = {{'one' 'two'};{'three' 'four'};{'five' 'six'}};
B = {{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};

I would like to get something like this:

C = {{'one' 'two'};{'three' 'four'};{'five' 'six'};{'seven' 'eight'};{'nine' 'ten'}};

But when I use

C = union(A, B)

MATLAB returns an error saying:

Input A of class cell and input B of class cell must be cell arrays of strings, unless one is a string.

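Does anyone know a simple way to do this? I would really appreciate it.

Alternatively, any way of building a cell array of separate strings other than a cell array of string cell arrays would also be useful, but as far as I know that is not possible.

Thanks, everyone!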

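The union function does not seem to work with cell arrays of cell arrays, so a workaround is needed.

One approach is to vertically concatenate A and B and extract their contents into a single cell array of strings, one column per word position. Then, within each column, assign every string a unique numeric ID. These IDs form a double array, and unique with the 'rows' option on that array gives the desired output. That is what is implemented here: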

%// Slightly complicated input for safest verification of results
A = {{'three' 'four'};
    {'five' 'six'};
    {'five' 'seven'};
    {'one' 'two'}};

B = {{'seven' 'eight'};
    {'five' 'six'};
    {'nine' 'ten'};
    {'three' 'six'}};

t1 = [A ; B] %// concatenate all cells from A and B vertically
t2 = vertcat(t1{:}) %// Get all the cells of strings from A and B

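t22 = mat2cell(t2,size(t2,1),ones(1,size(t2,2))); %// split t2 into one cell per column
[~,~,row_ind] = cellfun(@(x) unique(x,'stable'),t22,'uni',0) %// per column, map each string to an integer ID
mat1 = horzcat(row_ind{:}) %// N-by-2 matrix of IDs, one row per original cell

[~,ind] = unique(mat1,'rows','stable') %// index of the first occurrence of each distinct row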
out1 = t2(ind,:) %// output as a cell array of strings, used for verification too
out = mat2cell(out1, ones(1,size(out1,1)),size(out1,2)) %//desired output

Output:

out1 = 
    'three'    'four' 
    'five'     'six'  
    'five'     'seven'
    'one'      'two'  
    'seven'    'eight'
    'nine'     'ten'  
    'three'    'six'
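A possibly simpler variant of the same idea, offered as a sketch and not as part of the answer above: assign IDs from one vocabulary shared by all columns by calling unique once on every string, then reshape the IDs back into a matrix. Two rows are equal exactly when their ID rows are equal, so the result should match out1. Like the original, it assumes every inner cell holds the same number of strings.

```matlab
% Same inputs as in the answer above
A = {{'three' 'four'}; {'five' 'six'}; {'five' 'seven'}; {'one' 'two'}};
B = {{'seven' 'eight'}; {'five' 'six'}; {'nine' 'ten'}; {'three' 'six'}};

t2 = vertcat(A{:}, B{:});                 % N-by-2 cell array of strings
[~, ~, id] = unique(t2(:));               % one integer ID per distinct string
id = reshape(id, size(t2));               % N-by-2 ID matrix, same layout as t2
[~, ind] = unique(id, 'rows', 'stable');  % first occurrence of each distinct row
out = mat2cell(t2(ind, :), ones(1, numel(ind)), size(t2, 2));  % column of 1-by-2 cells
```

Calling unique once on t2(:) avoids splitting t2 into columns with mat2cell and cellfun.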

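What my code does: it builds a list of all words, then uses this list to build a matrix that records which words each row contains and in which position (1 = matches the first word, 2 = matches the second word). Finally, unique can be applied to this numeric matrix to obtain the indices.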

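With my update, the code now hard-codes 2 words per cell. To remove this limitation, the anonymous function @(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2})) would need to be replaced by a more general implementation, possibly using cellfun again.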
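One way that generalization might look, as a sketch that uses a plain loop instead of cellfun: give position k the weight 2^(k-1). Positions 1 and 2 get weights 1 and 2, as in the original, and binary weights keep every encoding distinct even when a word repeats, for up to about 50 words per cell. The three-word inputs below are hypothetical and are not from the question. Cells of different lengths also work.

```matlab
% Hypothetical inputs with three words per cell
A = {{'one' 'two' 'three'}; {'four' 'five' 'six'}};
B = {{'four' 'five' 'six'}; {'three' 'two' 'one'}};

C = [A; B];                              % column of cells of words
allWords = unique([C{:}]);               % every distinct word
F = zeros(numel(C), numel(allWords));    % one row per cell, one column per word
for r = 1:numel(C)
    words = C{r};
    [~, loc] = ismember(words, allWords);    % position of each word in allWords
    for k = 1:numel(words)
        % add weight 2^(k-1) meaning "this word appears at position k"
        F(r, loc(k)) = F(r, loc(k)) + 2^(k-1);
    end
end
[~, keep] = unique(F, 'rows', 'stable');     % first occurrence of each encoding
out = C(keep);                               % order-sensitive union
```

Because the weights depend on position, {'one' 'two' 'three'} and {'three' 'two' 'one'} are kept as distinct rows.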

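You could use an m-by-n cell array of strings instead of a cell array of string cell arrays. If the rows do not all have the same number of columns (although in your example they do), you can pad them with empty strings. An m-by-n cell array may simplify what you are trying to do. Even better is one string per row, made of the words joined with a delimiter (a reserved character). Then you can use union directly.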
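A minimal sketch of the delimiter approach. It assumes MATLAB R2013a or later (for strjoin and strsplit) and assumes that '|' never occurs inside a word. Each inner cell is joined into one string, union runs on the resulting cell arrays of strings, and the results are split back. Passing 'CollapseDelimiters', false keeps any empty strings used as padding.

```matlab
A = {{'one' 'two'}; {'three' 'four'}; {'five' 'six'}};
B = {{'five' 'six'}; {'seven' 'eight'}; {'nine' 'ten'}};

delim = '|';                                  % assumed reserved character
joinWords = @(c) strjoin(c, delim);           % {'one' 'two'} -> 'one|two'
As = cellfun(joinWords, A, 'UniformOutput', false);
Bs = cellfun(joinWords, B, 'UniformOutput', false);

Us = union(As, Bs, 'stable');                 % ordinary union of cell arrays of strings
% Split each joined string back into a 1-by-n cell of words
C = cellfun(@(s) strsplit(s, delim, 'CollapseDelimiters', false), ...
            Us, 'UniformOutput', false);
```

With the question's inputs, this should reproduce the desired C, in order of first appearance.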

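I'll have to modify my other functions to work like this, but I think this is actually the best solution! Thanks a lot. After implementing Daniel's code and realizing that it treated {'one' 'two'} as equal to {'two' 'one'}, which does not work for me, I tried this approach. Sorry for not specifying that this should not happen.

Good idea.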

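C=[A;B] %// all cells from A and B
allWords=unique([A{:};B{:}]) %// list of every distinct word
F=cell2mat(cellfun(@(x)(ismember(allWords,x{1})+2*ismember(allWords,x{2}))',C,'uni',false)) %// one row per cell: 1 marks the first word, 2 the second
[~,uniqueindices,~]=unique(F,'rows') %// one index per distinct row of F

C(sort(uniqueindices)) %// keep the original order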