Python: computing a correlation with scipy


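I have two variables, one called polarity and the other called sentiment. I want to see whether there is a correlation between these two variables.

polarity can take values between 0 and 1 (continuous); sentiment can take the values -1, 0 and 1. I tried the following: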

from scipy import stats

pearson_coef, p_value = stats.pearsonr(df['polarity'], df['sentiment']) 
print(pearson_coef)

But I get the following error:

TypeError: unsupported operand type(s) for +: 'float' and 'str'

Example values:

polarity      sentiment
 
0.34            -1
0.12            -1
0.85             1
0.76             1
0.5              0
0.21             0

As suggested in the comments, try converting all DataFrame columns to a numeric dtype:

df = df.astype(float)

before calling the pearsonr function.
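If some entries might not parse as numbers, astype(float) will raise an error. A minimal sketch of a more tolerant conversion is shown below, assuming the sentiment column was read in as strings (the question does not say how it became non-numeric). It uses pd.to_numeric with errors="coerce", which turns unparseable values into NaN, and the sample values from the question.

```python
import pandas as pd
from scipy import stats

# Sample values from the question. Storing sentiment as strings is an
# assumption made here to mimic a column with dtype object.
df = pd.DataFrame({
    "polarity": [0.34, 0.12, 0.85, 0.76, 0.5, 0.21],
    "sentiment": ["-1", "-1", "1", "1", "0", "0"],
})

# Convert to a numeric dtype; values that cannot be parsed become NaN.
df["sentiment"] = pd.to_numeric(df["sentiment"], errors="coerce")

# NaN values would propagate into the result, so drop incomplete rows first.
clean = df.dropna(subset=["polarity", "sentiment"])

pearson_coef, p_value = stats.pearsonr(clean["polarity"], clean["sentiment"])
print(pearson_coef)
```

Whether dropping the unparseable rows is acceptable depends on the data; it may be worth inspecting them before discarding.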

Since you are working with a DataFrame, you can do the following to check the columns' dtypes:

>>> df.info() 

 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   polarity   6 non-null      float64
 1   sentiment  6 non-null      object 

>>> df['sentiment'] = df.sentiment.map(float) # or do : df = df.astype(float)

>>> df.info()

 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   polarity   6 non-null      float64
 1   sentiment  6 non-null      float64


>>> pearson_coef, p_value = stats.pearsonr(df['polarity'], df['sentiment']) 
>>> print(pearson_coef)
0.870679269711991

# Moreover, you can use pandas to estimate 'pearsonr' correlation matrix if you want to:
>>> df.corr()

           polarity  sentiment
polarity   1.000000   0.870679
sentiment  0.870679   1.000000
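As a sanity check, the coefficient can also be computed directly from the definition of Pearson's r, the sum of products of the centered values divided by the square root of the product of their sums of squares. The sketch below uses only NumPy and the six sample rows from the question; the variable names are illustrative.

```python
import numpy as np

# Sample values from the question.
polarity = np.array([0.34, 0.12, 0.85, 0.76, 0.5, 0.21])
sentiment = np.array([-1, -1, 1, 1, 0, 0], dtype=float)

# Center both variables around their means.
dp = polarity - polarity.mean()
ds = sentiment - sentiment.mean()

# Pearson's r: covariance term divided by the product of the spreads.
r = (dp * ds).sum() / np.sqrt((dp ** 2).sum() * (ds ** 2).sum())
print(r)
```

On this data the result should agree with the value returned by stats.pearsonr and df.corr() above, up to floating-point rounding.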

Not reproducible with the test data. Fix the dtype of the two columns; one of them is not numeric.
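One way to find which column is not numeric is to ask pandas for the columns whose dtype is not a number. This is a small sketch under the assumption that sentiment holds strings; the DataFrame is rebuilt here from the sample values so the snippet runs on its own.

```python
import pandas as pd

# Sample values from the question, with sentiment stored as strings
# (an assumption, to mimic the non-numeric column).
df = pd.DataFrame({
    "polarity": [0.34, 0.12, 0.85, 0.76, 0.5, 0.21],
    "sentiment": ["-1", "-1", "1", "1", "0", "0"],
})

# Columns whose dtype is not numeric are the ones that need converting.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```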