Parsing a file into a cell array in MATLAB


I have a file in the following format that I want to read in MATLAB:

user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....

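So each line has values separated by a colon: the value to the left of the colon is a number representing the user id, and the values to the right are tuples of an item id (also a number) and a rating (an integer, not a float).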

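I would like to read this data into a MATLAB cell array or, better, ultimately convert it into a sparse matrix in which user_id is the row index, item_id is the column index, and the corresponding rating is stored at that position. (This will work because I know the number of users and items in my universe in advance, so no id can be larger than that.)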
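The target layout described above maps directly onto the triplet form of `sparse`. Below is a minimal sketch, assuming hypothetical universe sizes (`num_users`, `num_items`) and a handful of made-up ratings; none of these names or values come from the actual data file.

```matlab
% Hypothetical universe sizes, assumed known in advance
num_users = 10;
num_items = 5;

% One (row, column, value) triplet per rating; values are made up
user_ids = [1 1 1 10 10];
item_ids = [1 2 3 1 2];
ratings  = [3 4 5 1 2];

% Passing the size explicitly keeps the matrix num_users-by-num_items,
% even if the largest ids never occur in the file
R = sparse(user_ids, item_ids, ratings, num_users, num_items);

disp(size(R));        % 10 5
disp(full(R(10, 2))); % 2
```

Note that `sparse` adds together values that share the same (row, column) pair, so a user who rates the same item twice would end up with the sum of both ratings.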

Any help would be appreciated.

So far I have tried the textscan function as follows:

c = textscan(f,'%d %s','delimiter',':')   %this creates two cells one with all the user_ids
                                          %and another with all the remaining string values.

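Now, if I try something like str2mat(c{2}), it works, but it also stores the '(' and ')' characters in the matrix. I would like to store a sparse matrix in the way I described above.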

I am fairly new to MATLAB and would appreciate any help with this.

f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
fclose(f); %// release the file handle once the data is read
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(@(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end); 
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(@(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(@(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(@(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(@(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse

For the example file

1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)

this produces the following sparse matrix:

matrix =
   (1,1)        3
  (10,1)        1
   (1,2)        4
  (10,2)        2
   (1,3)        5
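As a possible alternative to the regexprep and str2num steps above, `sscanf` can read the numbers straight out of one tuple string by putting the parentheses and commas in the format itself. This is a minimal sketch, assuming the tuple string contains no stray spaces; the variable names `s` and `p` are illustrative and not part of the answer.

```matlab
s = '(1,3),(2,4),(3,5)';   % tuple part of the first line of the example file

% The format is reapplied until the input stops matching;
% [2 Inf] fills a 2-row matrix column by column
p = sscanf(s, '(%d,%d),', [2 Inf]);

cols = p(1, :);   % item ids: 1 2 3
vals = p(2, :);   % ratings:  3 4 5
```

Since str2num evaluates its input with eval, a format-based reader like this may be the safer choice when the file contents are not fully trusted.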

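I think newer versions of MATLAB have a string-splitting function (such as strsplit) that makes this approach overkill, but the following works, even if not quickly. It splits the file into user ids and "everything else", as you show, initializes a large empty matrix, and then iterates over the rest, breaking each line apart and placing the values at the correct position in the matrix.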

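(For some reason I did not see the previous answer when I opened this. It is more sophisticated than this one, although this one may be easier to follow at the cost of being slower.) I put \s* in the regular expression in case the spacing is inconsistent, but otherwise the code does little in the way of data sanity checking. The output is a full array, which can be converted to a sparse one if needed.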

% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)

clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}

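% These are stated as known. Note that the sample ids above exceed them,
% so MATLAB grows desired_array on assignment instead of raising an error.
num_users = 3;
num_items = 40;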

desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
    uid = uids(k)
    tokens = regexp(tuples{k}, expression, 'tokens');
    for l = 1:length(tokens)
        item_id = str2num(tokens{l}{1})
        rating = str2num(tokens{l}{2})
        desired_array(uid, item_id) = rating;
    end
end
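The same regular expression can also feed `sparse` directly, which skips the intermediate full array. The sketch below is only one way to do it, under stated assumptions: the lines are held in a hypothetical in-memory cell array (`lines`) rather than read from a file, the sizes `num_users` and `num_items` are made up, and `strsplit` (available in newer MATLAB releases) separates the user id from the tuples. The irregular spacing in the first line is there to exercise the \s* parts of the pattern.

```matlab
% Hypothetical sample lines, held in memory instead of a file
lines = {'1: (1,3), (2 ,4),(3,5)', '10: (1,1),(2,2)'};
num_users = 10;   % assumed known in advance
num_items = 5;    % assumed known in advance

expression = '\((\d+)\s*,\s*(\d+)\)';
rows = []; cols = []; vals = [];
for k = 1:numel(lines)
    parts = strsplit(lines{k}, ':');   % {user id, tuple string}
    uid = str2double(parts{1});
    tokens = regexp(parts{2}, expression, 'tokens');
    for t = 1:numel(tokens)
        % Append one (row, column, value) triplet per tuple
        rows(end+1) = uid;
        cols(end+1) = str2double(tokens{t}{1});
        vals(end+1) = str2double(tokens{t}{2});
    end
end

% Build the sparse matrix in one call from the collected triplets
S = sparse(rows, cols, vals, num_users, num_items);
```

Growing `rows`, `cols` and `vals` inside the loop is simple but not fast; for large files, preallocating or collecting per-line vectors in a cell array would likely be preferable.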


Is the number of (item, rating) pairs per line fixed?

No, it varies, but the tuples are always separated by a comma ','.