Is there a more efficient way to normalize a set of data in sklearn or another Python library?

I am trying to normalize a set of data with the L2 norm.

I have defined a function to demonstrate this (it will be extended to multiple features).
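The function itself isn't shown in the question. Judging from the output below (both normalized arrays fall in [-1, 1]), a plausible reconstruction is that it stacks the two arrays as columns and divides each row by its L2 norm; the `fnormlz` below is my guess at the original, not the asker's actual code:

```python
import numpy as np

def fnormlz(x1, x2):
    # Hypothetical reconstruction: stack the two 1-D arrays as columns,
    # then divide each row (sample pair) by its L2 norm.
    X = np.column_stack((x1, x2))
    norms = np.linalg.norm(X, axis=1, keepdims=True)  # per-row L2 norms
    Xn = X / norms
    return Xn[:, 0], Xn[:, 1]

x1 = np.random.normal(scale=10.0, size=30)
x2 = np.random.normal(scale=100.0, size=30)
n1, n2 = fnormlz(x1, x2)
# Each (n1[i], n2[i]) pair lies on the unit circle, so all values are in [-1, 1].
```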

The function seems to work well:

>>> data1 = np.random.normal(scale=10.0, size = 30)
>>> stats.describe(data1)
DescribeResult(nobs=30, minmax=(-14.480351639879657, 21.694340665659155), mean=1.7693402703870142, variance=70.96823479863615, skewness=0.48446965640611006, kurtosis=0.029201481246492023)
>>> data2 = np.random.normal(scale=100.0, size = 30)
>>> stats.describe(data2)
DescribeResult(nobs=30, minmax=(-131.3594947316083, 198.39728417503383), mean=-7.255658382442095, variance=5255.736619957794, skewness=0.6343298691171217, kurtosis=0.4738823408913704)
>>> data1, data2 = fnormlz(data1, data2)
>>> print(stats.describe(data1))
DescribeResult(nobs=30, minmax=(-0.9973779251196154, 0.9881011078096066), mean=-0.05634450329772703, variance=0.46458361781960184, skewness=0.06081037409100871, kurtosis=-1.4984969471774237)
>>> print(stats.describe(data2))
DescribeResult(nobs=30, minmax=(-0.9896047983762021, 0.9884599298308269), mean=-0.03121868793266298, variance=0.565606751634083, skewness=0.04677252893105364, kurtosis=-1.655597055471202)
The results are as expected. Is there a more efficient way to do this?

Can the variance scaling in sklearn be used for this? If so, how?

You can use sklearn's normalize for this. The fnormlz_v2 below may be what you need, but note that the zscore step carried over from your original code may hide some information in the data:

import numpy as np
from sklearn.preprocessing import normalize
from scipy import stats

def fnormlz_v2(X):
    # Standardize each feature (column) to zero mean and unit variance
    X = stats.zscore(X)
    # Then scale each sample (row) to unit L2 norm
    X_norm, norm = normalize(X, norm='l2', axis=1, copy=True, return_norm=True)
    return X_norm

feature1 = np.random.normal(scale=10.0, size=100)
feature2 = np.random.normal(scale=100.0, size=100)
data = np.concatenate((feature1.reshape(-1, 1), feature2.reshape(-1, 1)), axis=1)

data_norm = fnormlz_v2(data)

for i in [data, data_norm]:
    print(stats.describe(i))
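As a quick sanity check on the zscore step (my own illustration, not part of the answer): stats.zscore standardizes each column independently along its default axis=0, so both features end up with zero mean and unit variance before the row-wise normalization:

```python
import numpy as np
from scipy import stats

X = np.concatenate((np.random.normal(scale=10.0, size=100).reshape(-1, 1),
                    np.random.normal(scale=100.0, size=100).reshape(-1, 1)), axis=1)
Z = stats.zscore(X)  # default axis=0: standardize each column

# Each column now has mean ~0 and (population) standard deviation ~1
print(Z.mean(axis=0))  # close to [0, 0]
print(Z.std(axis=0))   # close to [1, 1]
```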
Alternatively, without the zscore step, you can normalize the stacked features row-wise and use the returned norms to recover the normalized values of each original array:

import numpy as np
from sklearn.preprocessing import normalize
from scipy import stats

a = np.random.normal(scale=10.0, size=30)
b = np.random.normal(scale=100.0, size=30)

# Stack the two features as columns of a single 2-D array
c = np.concatenate((a.reshape(-1, 1), b.reshape(-1, 1)), axis=1)

# Row-wise L2 normalization; norm holds the per-row L2 norms
d, norm = normalize(c, norm='l2', axis=1, copy=True, return_norm=True)

# Dividing each original feature by the row norms reproduces the columns of d
a_n = a / norm
b_n = b / norm

for x in [a, a_n, b, b_n]:
    print(stats.describe(x))
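If the goal is to give each feature (rather than each sample) unit L2 norm, normalize also accepts axis=0; a minimal sketch of that variant:

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.concatenate((np.random.normal(scale=10.0, size=30).reshape(-1, 1),
                    np.random.normal(scale=100.0, size=30).reshape(-1, 1)), axis=1)

# axis=0 scales each column so its L2 norm is exactly 1
Xn = normalize(X, norm='l2', axis=0)
print(np.linalg.norm(Xn, axis=0))  # [1. 1.]
```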
