MATLAB: probability of each letter being the last letter of a word?
I have a text file with about 9,000 lowercase words. I want to find the probability of each letter being the last letter of a word (frequency of the letter / number of words). This is my first attempt:
function [ prma ] = problast()
counts = zeros(1,26);
%refer to cell index here to get alphabetic number of char
s = regexp('abcdefghijklmnopqrstuvwxyz','.','match');
f = fopen('nouns.txt');
ns = textscan(f,'%s');
fclose(f);
%8960 is the length of the file
for i =1:8960
c = substr(ns(i),-1,1);
num = find(s == c);
counts(num) = num;
end
prma = counts / 8960;
disp(prma);
This gives me an error:
Undefined function 'substr' for input arguments of type 'cell'.
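Any ideas?
The documentation for textscan says that the result is a cell array. If you are not familiar with cell arrays, I strongly recommend reading up on them, but the long and short of it is that your code should look like this: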
c = substr(ns{i},-1,1);
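Note the change from () to {}: curly braces are how you access the contents of a cell array element.
I'm not sure what is causing the problem, but assuming ns{i} contains your string, this should do the trick: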
str = ns{i};
c = str(end);
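If that does not work, it should not be too hard to create the variable str from ns first.
You don't need regexp for your problem. A very simple and efficient solution is: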
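clear;
close;
clc;
counts = zeros(1,26);
f = fopen('nouns.txt');
ns = textscan(f,'%s');
fclose(f);
% ns{1} is the cell array of words read from the file
for i =1:numel(ns{1})
c = ns{1}{i}(end); % last character of the i-th word
% 'a' has character code 97, so c-96 maps 'a'..'z' to 1..26
counts(c-96) = counts(c-96)+1;
end
prma = counts / numel(ns{1});
disp(prma);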
For example, if nouns.txt contains
paris
london
the output would be:
Columns 1 through 8
0 0 0 0 0 0 0 0
Columns 9 through 16
0 0 0 0 0 0.5000 0 0
Columns 17 through 24
0 0 0.5000 0 0 0 0 0
Columns 25 through 26
0 0
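The loop in this answer relies on two indexing details that are easy to miss: textscan with a single '%s' returns a 1-by-1 cell whose only element is itself a cell array of words (hence ns{1}{i}), and subtracting 96 from a lowercase character gives its position in the alphabet. A minimal sketch of both, using a hand-built cell in place of the real file; the names inner, words, w and idx are illustrative and not part of the original answer:

```matlab
% Mimic the shape textscan(f,'%s') returns: a 1x1 cell holding an N-by-1 cell of words.
ns = {{'paris'; 'london'}};

inner = ns(1);        % parentheses return a 1x1 cell, not the words themselves
words = ns{1};        % braces return the contents: the 2x1 cell of words
w     = words{1};     % braces again give the char array 'paris'
c     = w(end);       % last character of the word

% Lowercase 'a'..'z' have character codes 97..122, so c - 96 maps to 1..26.
idx = c - 96;         % equivalent to c - 'a' + 1

fprintf('class(inner) = %s, class(w) = %s, last letter %c -> index %d\n', ...
        class(inner), class(w), c, idx);
```

Passing a cell (as ns(i) does) to a function that expects a string is what produces "for input arguments of type 'cell'" style errors; extracting the char array with braces first avoids that.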
How about this:
f = fopen('nouns.txt');
ns = textscan(f, '%s');
fclose(f);
num = cellfun(@(x)(x(end) - 'a' + 1), ns{:}); %// Convert to 1-26
counts = hist(num, 1:26); %// Count occurrences
prob = counts / numel(ns{:}) %// Compute probabilities
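As far as I know, newer MATLAB documentation discourages hist in favor of histogram and histcounts. The counting step can also be done with accumarray, which is available in older releases too. The following is only a sketch under that assumption, with a hypothetical in-memory word list standing in for nouns.txt:

```matlab
% Hypothetical word list standing in for the contents of nouns.txt.
words = {'paris'; 'london'; 'berlin'; 'athens'};

% Map the last letter of each word to 1..26 ('a' -> 1, ..., 'z' -> 26).
num = cellfun(@(x) x(end) - 'a' + 1, words);

% accumarray adds 1 into bin num(k) for every word; [26 1] fixes the output
% size so letters that never occur still get a zero entry.
counts = accumarray(num(:), 1, [26 1]).';

prob = counts / numel(words);

% Show only the letters that actually occur as a final letter.
letters = 'a':'z';
for k = find(counts)
    fprintf('%c: %.2f\n', letters(k), prob(k));
end
```

Like the hist version, this assumes every word is non-empty and ends in a lowercase letter; any other final character would produce an index outside 1..26 and raise an error.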
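Thanks for the suggestions, everyone. I solved the problem on my own, but I went back and tried the last answer and it works well. Here is what I came up with: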
%Keep track of counts
counts = zeros(1,26);
%Refer to this array to get alphabetic numeric value of character
s = regexp('abcdefghijklmnopqrstuvwxyz','.','match');
f = fopen('nouns.txt');
ns = textscan(f,'%s');
fclose(f);
%8960 = length of nouns.txt
for i =1:8960
%string from ns
str = ns{1}{i};
%last character in that string
c = str(length(str));
%index in s
temp = strfind(s,c);
index = find(not(cellfun('isempty',temp)));
counts(index) = counts(index)+1;
end
%Get probabilities
prma = counts / 8960;
disp(prma);
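The same loop can be written without the hard-coded 8960 and without the regexp lookup table. The sketch below is one possible variant, not the asker's code: it uses a hypothetical in-memory word list, takes the index from the second output of ismember, and skips words that do not end in a lowercase letter (that skipping rule is an assumption added here for illustration).

```matlab
% Hypothetical word list; with the real file this would be ns{1} from textscan.
words = {'paris'; 'london'; 'tokyo'; 'x-ray2'};

alphabet = 'a':'z';
counts   = zeros(1, 26);

for i = 1:numel(words)            % numel adapts to however many words were read
    str = words{i};
    if isempty(str)
        continue;                 % an empty entry has no last character
    end
    % ismember's second output is the position of the character in alphabet,
    % or 0 when the character is not a lowercase letter.
    [found, index] = ismember(str(end), alphabet);
    if found
        counts(index) = counts(index) + 1;
    end
end

% Divide by the number of words read; skipped words make the total fall below 1.
prma = counts / numel(words);
disp(prma(alphabet == 's'));      % share of words ending in 's'
```

Using numel(words) instead of a literal length means the code keeps working if the file grows or shrinks.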
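I upvoted everyone for helping me brainstorm.
I changed the parentheses to braces, but I get the same error as above. Am I doing something wrong?
textscan already tokenizes the words, so why use regexp? Also, I think you need [^a-z]* rather than [^a-z] in the pattern…
Oh, and I think it should be x(end) rather than x(1), since the question asks for the probability of the last letter in a word, not the first. I've taken the liberty of modifying your solution…
@eitantt It was x(1) because I used regexp to grab only the last letter.
I see. But in that case you wouldn't need to index x at all. Anyway, +1 for the histogram solution; I was about to suggest it myself when you posted it. It's the most elegant solution.
One could argue about the efficiency of using a for loop. You could use a histogram instead, as in Shai's solution.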