Unexpected behavior of scikit-learn's normalizer


python scikit-learn


I have a pandas DataFrame and want to normalize one column, here 'col3'.

My data looks like this:

test1['col3']
1     73.506
2     73.403
3     74.038
4     73.980
5     74.295
6     72.864
7     74.013
8     73.748
9     74.536
10    74.926
11    74.355
12    75.577
13    75.563
Name: col3, dtype: float64

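When I use the normalize function (I hope I am just using it incorrectly), every value comes out at roughly 0.27 to 0.28 (the call and its output are the first code listing below).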

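But for normalization (as opposed to standardization), I would usually expect the values to be scaled to the range 0 to 1, right? For example, via the equation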

$X' = \frac{X - X_{min}}{X_{max} - X_{min}}$

(Hmm, somehow LaTeX is not rendering today...)

So when I do it "by hand", I get completely different results, which are the ones I expected (the second code listing below).

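That is not what sklearn.preprocessing.normalize does at all. It scales its input vectors to unit L2 norm (or L1 norm, if requested), i.e. each vector is divided by its own norm. The last listing below shows normalize(x) giving the same result as x / np.linalg.norm(x, axis=1).reshape(-1, 1).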

(normalize computes the norm with a faster method than np.linalg and handles zeros gracefully, but otherwise the two expressions are the same.)
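To make the 0.27 values in the question concrete, here is a minimal sketch. It assumes the thirteen col3 values are copied into a plain NumPy array, and the variable names are illustrative. With axis=0, each column is divided by its own L2 norm, so thirteen nearly equal values all end up close to 1/sqrt(13).

```python
import numpy as np
from sklearn.preprocessing import normalize

# The thirteen col3 values from the question, as a single column.
col3 = np.array([73.506, 73.403, 74.038, 73.980, 74.295, 72.864, 74.013,
                 73.748, 74.536, 74.926, 74.355, 75.577, 75.563])
X = col3[:, np.newaxis]

# L2 normalization along axis=0 divides the column by its Euclidean norm.
l2_scaled = normalize(X, axis=0)
by_hand = X / np.linalg.norm(X, axis=0)
print(np.allclose(l2_scaled, by_hand))

# Nearly equal entries each land near 1 / sqrt(n).
print(1 / np.sqrt(len(col3)), l2_scaled[0, 0])

# With norm="l1" the absolute values of the column sum to 1 instead.
l1_scaled = normalize(X, norm="l1", axis=0)
print(np.abs(l1_scaled).sum())

# An all-zero vector is returned unchanged rather than producing NaN.
zero_row = normalize(np.zeros((1, 3)))
print(zero_row)
```

Neither norm maps the smallest value to 0 and the largest to 1, which is why the output looks nothing like min-max scaling.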

What you were expecting also exists in scikit-learn, as a separate min-max scaler.
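As a hedged illustration, scikit-learn's `MinMaxScaler` performs this kind of rescaling to [0, 1]. The sketch below assumes that is the tool meant here and reuses the col3 values from the question in a plain NumPy array, with illustrative variable names. It checks the scaler against the "by hand" formula.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# The thirteen col3 values from the question, as a single column.
col3 = np.array([73.506, 73.403, 74.038, 73.980, 74.295, 72.864, 74.013,
                 73.748, 74.536, 74.926, 74.355, 75.577, 75.563])
X = col3[:, np.newaxis]

# MinMaxScaler rescales each column to the default feature_range of (0, 1).
scaled = MinMaxScaler().fit_transform(X)

# The manual formula from the question: (X - min) / (max - min).
manual = (X - X.min()) / (X.max() - X.min())

print(np.allclose(scaled, manual))
print(scaled.ravel().round(6))
```

A fitted scaler object can also be reused to apply the same minimum and maximum to new data through its `transform` method, which the one-line manual formula does not give you directly.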

Thanks, that's what I was looking for!

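import numpy as np
from sklearn import preprocessing
# Convert to a NumPy array first; recent pandas versions reject [:, np.newaxis] on a Series.
preprocessing.normalize(test1['col3'].to_numpy()[:, np.newaxis], axis=0)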

array([[ 0.27468327],
       [ 0.27429837],
       [ 0.27667129],
       [ 0.27645455],
       [ 0.27763167],
       [ 0.27228419],
       [ 0.27657787],
       [ 0.27558759],
       [ 0.27853226],
       [ 0.27998964],
       [ 0.27785588],
       [ 0.28242235],
       [ 0.28237003]])

(test1['col3'] - test1['col3'].min()) / (test1['col3'].max() - test1['col3'].min())


1     0.236638
2     0.198673
3     0.432731
4     0.411353
5     0.527460
6     0.000000
7     0.423516
8     0.325839
9     0.616292
10    0.760044
11    0.549576
12    1.000000
13    0.994840
Name: col3, dtype: float64

>>> import numpy as np
>>> from sklearn.preprocessing import normalize
>>> rng = np.random.RandomState(42)
>>> x = rng.randn(2, 5)
>>> x
array([[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986, -0.23415337],
       [-0.23413696,  1.57921282,  0.76743473, -0.46947439,  0.54256004]])
>>> normalize(x)
array([[ 0.28396232, -0.07904315,  0.37027159,  0.87068807, -0.13386116],
       [-0.12251149,  0.82631858,  0.40155802, -0.24565113,  0.28389299]])
>>> x / np.linalg.norm(x, axis=1).reshape(-1, 1)
array([[ 0.28396232, -0.07904315,  0.37027159,  0.87068807, -0.13386116],
       [-0.12251149,  0.82631858,  0.40155802, -0.24565113,  0.28389299]])
>>> np.linalg.norm(normalize(x), axis=1)
array([ 1.,  1.])