Z-score normalization of a pandas DataFrame (Python 3.x)


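I am using Python 3 (Spyder) and I have a table of type pandas.core.frame.DataFrame. I want to z-score normalize the values in this table (subtract the row mean from each value and divide by the row's standard deviation), so that every row has mean 0 and sd 1. I tried two approaches.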

First approach

from scipy.stats import zscore
zetascore_table=zscore(table,axis=1)

Second approach

import numpy as np

rows = table.index.values
columns = table.columns
for i in range(len(rows)):
    for j in range(len(columns)):
        table.loc[rows[i], columns[j]] = (table.loc[rows[i], columns[j]] - np.mean(table.loc[rows[i],])) / np.std(table.loc[rows[i],])
table

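Both approaches run without errors, but when I check the mean and sd of each row, they are not the expected 0 and 1 but other floating-point values. I don't know where the problem might be.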

Thanks in advance for your help.
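One likely cause of the unexpected values in the second approach is that the loop overwrites each row while it is still reading that row's mean and sd, so later cells are normalized with statistics that already include z-scored values. The sketch below is only an illustration under assumptions: the small `table` and the names `looped` and `fixed` are made up, not taken from the question. It compares the in-place loop with a version that computes the row statistics once, before changing anything.

```python
import pandas as pd

# Hypothetical sample data (not from the original question).
table = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 6.0, 11.0]], columns=["a", "b", "c"])

# In-place loop, same idea as in the question: the row statistics are
# re-read from a row that has already been partly overwritten.
looped = table.copy()
for r in looped.index:
    for c in looped.columns:
        current_row = looped.loc[r].to_numpy()  # may already contain z-scores
        looped.loc[r, c] = (looped.loc[r, c] - current_row.mean()) / current_row.std()

# Compute mean and sd once from the untouched data, then normalize.
row_mean = table.mean(axis=1)
row_sd = table.std(axis=1, ddof=0)  # ddof=0 is the convention np.std uses
fixed = table.sub(row_mean, axis=0).div(row_sd, axis=0)

print(looped.mean(axis=1))  # row means drift away from 0
print(fixed.mean(axis=1))   # row means are approximately 0
```

Working on a copy, or computing the statistics up front as in `fixed`, keeps the original values available for the whole calculation.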

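Sorry, after thinking about it some more I found another way myself to compute the z-scores that is easier than the for loop (subtract each row's mean and divide the result by that row's sd):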

table = table.T  # transpose so that each original row becomes a column
sd = table.std(axis=0, ddof=0)  # per-column population sd (ddof=0, the np.std convention)
mean = table.mean(axis=0)  # per-column mean; explicit axis avoids version-dependent defaults
numerator = table - mean  # numerator in the formula for z-score
z_score = numerator / sd
z_norm_table = z_score.T  # transpose back: the initial table with all values z-scored by row

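I checked, and now the mean of each row is 0 or very close to 0 and the sd is 1 or very close to 1, so this works for me. Sorry, I don't have much coding experience, and sometimes simple things take a lot of attempts before I find a solution.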
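The same row-wise normalization can probably be written without the two transposes by using `DataFrame.sub` and `DataFrame.div` with `axis=0`. The sketch below uses made-up sample data and row labels (assumptions, not from the post). It also shows one reason a check can report an sd that is not exactly 1: pandas' `std` defaults to `ddof=1`, while `np.std` and `scipy.stats.zscore` default to `ddof=0`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the original post).
table = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 40.0, 50.0]],
    index=["row_a", "row_b"],
    columns=["w", "x", "y", "z"],
)

row_mean = table.mean(axis=1)
row_sd = table.std(axis=1, ddof=0)  # population sd, same convention as np.std

# axis=0 aligns the per-row Series with the DataFrame's index,
# so each row is shifted and scaled by its own statistics.
z_norm_table = table.sub(row_mean, axis=0).div(row_sd, axis=0)

print(z_norm_table.mean(axis=1))         # approximately 0 for every row
print(z_norm_table.std(axis=1, ddof=0))  # approximately 1 for every row
print(z_norm_table.std(axis=1))          # pandas default ddof=1, so not exactly 1
```

When verifying the result, it helps to use the same `ddof` for the check as for the normalization.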

The code below computes the z-score of every value in a column and stores the z-scores in a new column (called "num_1_zscore" here). It is easy to do.

from scipy.stats import zscore
import pandas as pd

# Create a sample df
df = pd.DataFrame({'num_1': [1,2,3,4,5,6,7,8,9,3,4,6,5,7,3,2,9]})

# Calculate the zscores and drop zscores into new column
df['num_1_zscore'] = zscore(df['num_1'])

print(df)  # in a Jupyter/IPython session, display(df) also works
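The column example above could be adapted to the row-wise case from the question by passing `axis=1` to `zscore`. This is a sketch under assumptions: the sample `table` and the names `z_values` and `z_table` are made up. Because `zscore` works on arrays, the values are passed as a NumPy array and the result is wrapped back into a DataFrame so the index and column labels are kept.

```python
import pandas as pd
from scipy.stats import zscore

# Hypothetical sample data (not from the original post).
table = pd.DataFrame({"a": [1.0, 4.0], "b": [2.0, 6.0], "c": [3.0, 11.0]})

# axis=1 standardizes along each row; zscore uses ddof=0 by default.
z_values = zscore(table.to_numpy(), axis=1)

# Rebuild a DataFrame with the original labels.
z_table = pd.DataFrame(z_values, index=table.index, columns=table.columns)

print(z_table)
print(z_table.mean(axis=1))         # approximately 0 for every row
print(z_table.std(axis=1, ddof=0))  # approximately 1 for every row
```

Note that `zscore` returns a new object and does not modify `table`, so any check has to be run on `z_table`.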