Octave: How can I compute the probability vector and the observed-count vector for a series of bins?


I want to test the hypothesis that a set of 30 observations follows a Poisson distribution.

#GNU Octave
X = [8 0 0 1 3 4 0 2 12 5 1 8 0 2 0 1 9 3 4 5 3 3 4 7 4 0 1 2 1 2]; #30 observations
bins = {0, 1, [2:3], [4:5], [6:20]}; #each bin can be single value or multiple values

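I am trying to use Pearson's chi-square statistic here and have coded the function below. I want one vector holding the Poisson probability of each bin and another holding the observed count of each bin. The loop feels rather redundant and ugly to me. Could you tell me how to rewrite the function without loops, so that the whole calculation is cleaner and more vectorized?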

function result= poissonGoodnessOfFit(bins, observed)
  
  assert(iscell(bins), "bins should be a cell array");
  assert(all(cellfun("ismatrix", bins)) == 1, "bin entries either scalars or matrices");
  assert(ismatrix(observed) && rows(observed) == 1, "observed data should be a 1xn matrix");
  
  lambda_head = mean(observed); #poisson lambda parameter estimate
  k = length(bins); #number of bin groups
  n = length(observed); #number of observations
  
  poisson_probability = []; #variable for poisson probability for each bin
  observations = []; #variable for observation counts for each bin
  
  for i=1:k
    if isscalar(bins{1,i}) #this bin contains a single value
      poisson_probability(1,i) = poisspdf(bins{1, i}, lambda_head);
      observations(1, i) = histc(observed, bins{1, i});
    else  #this bin contains a range of values
      inner_bins = bins{1, i}; #retrieve the range
      inner_bins_k = length(inner_bins); #number of values inside
      inner_poisson_probability = []; #variable to store individual probability of each value inside this bin
      inner_observations = []; #variable to store observation counts of each value inside this bin
      for j=1:inner_bins_k
        inner_poisson_probability(1,j) = poisspdf(inner_bins(1, j), lambda_head);
        inner_observations(1, j) = histc(observed, inner_bins(1, j)); 
      endfor
      poisson_probability(1, i) = sum(inner_poisson_probability, 2); #assign over the sum of all inner probabilities
      observations(1, i) = sum(inner_observations, 2); #assign over the sum of all inner observation counts
    endif
  endfor
  
  expected = n .* poisson_probability; #expected observations if indeed poisson using lambda_head
  chisq = sum((observations - expected).^2 ./ expected, 2); #Pearson Chi-Square statistics 
  pvalue = 1 - chi2cdf(chisq, k-1-1); 
  result = struct("actual", observations, "expected", expected, "chi2", chisq, "pvalue", pvalue); 
  
  return;
  
endfunction
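For reference, the quantities the function above computes can be written out as follows. The notation is an assumption introduced here, not taken from the post: $x_j$ are the $n$ observations, $B_i$ is the set of values in bin $i$, $O_i$ is the observed count in that bin, and $k$ is the number of bins. One degree of freedom is subtracted for the estimated parameter $\hat\lambda$, which matches the `k-1-1` in the code.

```latex
\hat\lambda = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad
p_i = \sum_{v \in B_i} \frac{e^{-\hat\lambda}\,\hat\lambda^{v}}{v!}, \qquad
E_i = n\,p_i

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad
\mathrm{df} = k - 1 - 1
```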

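A few points about your code are worth noting.

First, the "scalar" case in your if block is really the same as the "range" case, because a scalar is just a range with one element. It therefore needs no special treatment.

Second, you don't need to build explicit sub-ranges like this. Your bin groups are well suited to being used as indices into a larger result (just add 1 to convert from 0-based to 1-based indexing).

So my approach is to compute the expected and observed counts over the whole domain of interest (inferred from the bin groups), then use the bin groups themselves as 1-based indices to pick out the required subgroups, and sum accordingly.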
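The indexing idea can be sketched on a small made-up example. The data and variable names below are hypothetical and chosen only for illustration. With a domain that starts at `RangeMin`, the value `v` sits at position `v - RangeMin + 1`, which reduces to `v + 1` when the domain starts at 0.

```octave
% Hypothetical illustration data; not taken from the original post.
Values    = [0 1 1 2 3 3 3 5];          % made-up observations
BinGroups = {0, 1, 2:3, 4:5};           % each bin lists the values it covers
RangeMin  = min( cellfun( @min, BinGroups ) );
RangeMax  = max( cellfun( @max, BinGroups ) );
Domain    = RangeMin : RangeMax;        % every discrete value covered by the bins

% Count how often each value of the domain occurs.
CountsPerValue = histc( Values, Domain );

% Use the bin contents as indices: value v sits at position v - RangeMin + 1.
CountsPerBin = cellfun( @(c) sum( CountsPerValue(c - RangeMin + 1) ), BinGroups );
disp( CountsPerBin )   % CountsPerBin is [1 2 4 1]
```

A scalar bin such as `0` goes through the same anonymous function as a range such as `2:3`, so no special case is required.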

Here is some example code, written in the Octave/MATLAB-compatible subset of the two languages:

function Result = poissonGoodnessOfFit( BinGroups, Observations )
% POISSONGOODNESSOFFIT( BinGroups, Observations) calculates the [... etc, etc.]

  if exist( 'OCTAVE_VERSION', 'builtin' )
    pkg load statistics; % Octave only; in MATLAB the Statistics Toolbox is needed instead.
  end
  assert( iscell( BinGroups ),   'Bins should be a cell array' );
  assert( all( cellfun( @ismatrix, BinGroups ) ) == 1,   'Bin entries either scalars or matrices' );
  assert( ismatrix( Observations ) && size( Observations, 1 ) == 1,   'Observed data should be a 1xn matrix' );

% Define helpful variables
  RangeMin       = min( cellfun( @min, BinGroups ) );
  RangeMax       = max( cellfun( @max, BinGroups ) );
  Domain         = RangeMin : RangeMax;
  LambdaEstimate = mean( Observations );
  NBinGroups     = length( BinGroups );
  NObservations  = length( Observations );

% Get expected and observed numbers per 'bin' (i.e. discrete value) over the *entire* domain.
  Expected_Domain = NObservations * poisspdf( Domain, LambdaEstimate );
  Observed_Domain = histc( Observations, Domain );

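% Apply BinGroup values as indices (value v sits at position v - RangeMin + 1, i.e. v + 1 when the domain starts at 0)
  Expected_byBinGroup = cellfun( @(c) sum( Expected_Domain(c - RangeMin + 1) ), BinGroups );
  Observed_byBinGroup = cellfun( @(c) sum( Observed_Domain(c - RangeMin + 1) ), BinGroups );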

% Perform a Chi-Square test on the Bin-wise Expected and Observed outputs
  O = Observed_byBinGroup; E = Expected_byBinGroup ; df = NBinGroups - 1 - 1;
  ChiSquareTestStatistic = sum( (O - E) .^ 2 ./ E );

  PValue = 1 - chi2cdf( ChiSquareTestStatistic, df );
  Result = struct( 'actual', O, 'expected', E, 'chi2', ChiSquareTestStatistic, 'pvalue', PValue );
end

Running it on your example gives:

X = [8 0 0 1 3 4 0 2 12 5 1 8 0 2 0 1 9 3 4 5 3 3 4 7 4 0 1 2 1 2]; % 30 observations
bins = {0, 1, [2:3], [4:5], [6:20]}; % each bin can be single value or multiple values
Result = poissonGoodnessOfFit( bins, X )
% Result =
%  scalar structure containing the fields:
%    actual   = 6         5        8          6         5
%    expected = 1.2643    4.0037   13.0304    8.6522    3.0493
%    chi2 =  21.989
%    pvalue =  0.000065574

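A general comment on your code: it is better to write self-explanatory code than code that makes no sense on its own without comments. Comments should normally be used only to explain the "why", not the "how".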

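You are a few milliseconds away from one of the usual suspects showing up and removing the matlab tag from your post. If you do care about MATLAB compatibility, as your post title implies, then please make the effort to write MATLAB-compatible code.

@ABC: Removing the MATLAB tag is meant to help you and to keep potential answerers from bothering you. I have seen it many times: a MATLAB answer is given, and the OP replies "oh, but I'm actually using Octave, and this doesn't work there." Also, the code you posted would produce lots of syntax errors if you tried to run it in MATLAB. Just because Octave is based on MATLAB syntax, and code can be written using only a subset common to both, doesn't mean the two packages are interchangeable.

@ABCAnalytics I too am often annoyed by the removal of MATLAB tags, because if someone asks (implicitly or explicitly) for a MATLAB-compatible answer, then in my view that should be respected, or it should at least be explained why MATLAB compatibility isn't a good idea. Simply removing the matlab tag can be annoying, and it does a disservice to good questions that would be useful to MATLAB users. However, it is equally annoying when the remover is entirely right, as Cris is in this case. If you made no effort to write in a MATLAB-compatible subset, then the question really has nothing to do with MATLAB. If you tag a question "c or c++" and your example then involves code that C does not support, that is very annoying for C users who wasted their time reading it. Even more so if they bother to give you a C-compatible answer and you reply "oh, I need to use C++ classes, thanks".

Thank you. Let me take some time to study your code. Coming from a Java background, I am very focused on element-wise loops. I tried looking for something like R's chisq.test(x, p=p), but the R function uses bins-1 for df, so the df is off by the number of estimated parameters, and I don't know how to use Octave's chisquare_test_homogeneity(x, y, c).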

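cellfun is only slightly more efficient than a for loop; it is not a vectorized operation. You can use a for loop if you prefer; this is just more efficient (but hopefully equally clear) syntax. When cell arrays are involved like this, vectorization is often not possible. The main point to note is that the if block is unnecessary, so you can use a single for block without the inner for block.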
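A minimal sketch of that single-loop variant, under stated assumptions: `PoissonPmf` is a hypothetical stand-in written out here so the snippet runs without the statistics package (`poisspdf` from the post does the same job), and `ismember` replaces `histc` for counting. Neither choice comes from the original answer.

```octave
% Stand-in Poisson pmf (assumption for illustration; poisspdf would be used in practice).
PoissonPmf = @(k, lambda) exp( -lambda ) .* lambda .^ k ./ factorial( k );

Observed  = [8 0 0 1 3 4 0 2 12 5 1 8 0 2 0 1 9 3 4 5 3 3 4 7 4 0 1 2 1 2];
BinGroups = {0, 1, 2:3, 4:5, 6:20};
Lambda    = mean( Observed );            % estimate of the Poisson parameter
NBins     = numel( BinGroups );

Probability = zeros( 1, NBins );
Counts      = zeros( 1, NBins );
for i = 1 : NBins
  Bin            = BinGroups{i};                       % scalar or vector, handled alike
  Probability(i) = sum( PoissonPmf( Bin, Lambda ) );   % pmf summed over the bin's values
  Counts(i)      = sum( ismember( Observed, Bin ) );   % observations that fall in the bin
end
disp( Counts )   % Counts is [6 5 8 6 5]
```

Because `sum` over a one-element result returns that element, scalar bins and range bins go through the same two lines, with no if block and no inner loop.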