Python chi-square analysis: "expected frequencies has a zero element at (0,)" error


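I am working with a dataset and want to check whether two variables are associated, so I used the chi-square test from Python's SciPy package.

Here is the cross-tabulation of the two variables: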

pd.crosstab(data['loan_default'],data['id_proofs'])
Result:

    id_proofs          2      3    4  5
    loan_default
    0             167035  15232  273  3
    1              46354   4202   54  1

If I run a chi-square test on the same data, I get this error: ValueError: The internally computed table of expected frequencies has a zero element at (0,).

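Code:

    from scipy.stats import chi2_contingency
    stat,p,dof,expec = chi2_contingency(data['loan_default'],data['id_proofs'])
    print(stat,p,dof,expec)

Error report: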

    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-154-63c6f49aec48> in <module>()
      1 from scipy.stats import chi2_contingency
----> 2 stat,p,dof,expec = chi2_contingency(data['loan_default'],data['id_proofs'])
      3 print(stat,p,dof,expec)

~/anaconda3/lib/python3.6/site-packages/scipy/stats/contingency.py in chi2_contingency(observed, correction, lambda_)
    251         zeropos = list(zip(*np.where(expected == 0)))[0]
    252         raise ValueError("The internally computed table of expected "
--> 253                          "frequencies has a zero element at %s." % (zeropos,))
    254 
    255     # The degrees of freedom

ValueError: The internally computed table of expected frequencies has a zero element at (0,).

What could be causing this, and how can I fix it?

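Take another look at the docstring of chi2_contingency. Its first argument must be the contingency table itself. In your call, the raw column data['loan_default'] is passed as the table, and data['id_proofs'] ends up in the second parameter, correction. You have to compute the contingency table first (as you already did with pd.crosstab(data['loan_default'], data['id_proofs'])) and pass that table to chi2_contingency.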
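
As a minimal sketch of that fix, the example below rebuilds the table from the counts shown in the question and passes it to chi2_contingency. The variable names `table` and `expected` are illustrative, and the hard-coded counts stand in for the asker's `data` DataFrame, which is not available here. The error in the question presumably arises because a one-dimensional column of 0/1 values is treated as a table of observed counts, so every 0 in loan_default becomes a zero expected frequency.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts copied from the crosstab in the question
# (rows: loan_default 0/1, columns: id_proofs 2..5).
table = pd.DataFrame(
    [[167035, 15232, 273, 3],
     [46354, 4202, 54, 1]],
    index=pd.Index([0, 1], name="loan_default"),
    columns=pd.Index([2, 3, 4, 5], name="id_proofs"),
)

# With the original data, the table would instead be built as:
# table = pd.crosstab(data['loan_default'], data['id_proofs'])

# Pass the contingency table (not the raw columns) as the first argument.
stat, p, dof, expected = chi2_contingency(table)

print("chi2 statistic:", stat)
print("p-value:", p)
print("degrees of freedom:", dof)
print("expected frequencies:")
print(expected)
```

Note that the id_proofs = 5 column holds only four observations, so its expected counts are small. The chi-square approximation is often considered unreliable when expected counts fall below about 5, so it may be worth merging that category with a neighbouring one before interpreting the p-value.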
