How to normalize (min/max) specific columns in Python? (DataFrame)


python pandas numpy dataframe


I have been working on normalizing data with min-max normalization. My dataset is a list of DataFrames stored in df_mols, as shown below:

df_mols[0]:   
         frequency  Molecule0
 0        -326.0   2.604015
 1        -323.0   2.624186
 2        -321.0   2.644598
 3        -318.0   2.665254
 4        -316.0   2.686159
 ...         ...        ...
 1996     4589.0   4.565467
 1997     4591.0   4.512142
 1998     4594.0   4.459744
 1999     4596.0   4.408251
 2000     4598.0   4.357645
 
df_mols[1]:      
          frequency  Molecule1
 0        -357.0   0.368472
 1        -354.0   0.371063
 2        -352.0   0.373683
 3        -350.0   0.376332
 4        -347.0   0.379010
 ...         ...        ...
 1996     4293.0   0.538391
 1997     4295.0   0.532088
 1998     4297.0   0.525894
 1999     4300.0   0.519807
 2000        NaN        NaN

I only want to normalize the Molecule columns. What I have done so far is:

from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler()

for i in df_mols:
  i['frequency']=i['frequency'].apply(np.rint) # This was to make frequency values into int
  i[:,1]=scaler.fit_transform(i[:,1])

and I get the following error:

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    554                     "Reshape your data either using array.reshape(-1, 1) if "
    555                     "your data has a single feature or array.reshape(1, -1) "
--> 556                     "if it contains a single sample.".format(array))
    557 
    558         # in the future np.flexible dtypes will be handled like object dtypes

ValueError: Expected 2D array, got 1D array instead:
array=[2.60401472 2.62418641 2.64459837 ... 4.45974369 4.4082515  4.35764454].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

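I tried to reshape the data but could not get it to work. Should I create a new Series and then update the values? Or is there a way to fix this directly?

Thanks :)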

Before reshaping, you can use the .to_numpy() method to convert your pd.Series to an np.ndarray.
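As a minimal sketch of that reshape step, assuming the column is named Molecule0 as in the question and using only the first four rows shown there: MinMaxScaler expects a 2D array of shape (n_samples, n_features), so a single column is reshaped to (-1, 1) before scaling and flattened back to 1D before it is assigned to the DataFrame.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Small sample taken from the first rows of df_mols[0] in the question
df = pd.DataFrame({
    "frequency": [-326.0, -323.0, -321.0, -318.0],
    "Molecule0": [2.604015, 2.624186, 2.644598, 2.665254],
})

scaler = MinMaxScaler()

# Series -> 1D ndarray -> 2D column vector of shape (4, 1)
column_2d = df["Molecule0"].to_numpy().reshape(-1, 1)

# fit_transform returns a 2D array; ravel() flattens it back to 1D
df["Molecule0"] = scaler.fit_transform(column_2d).ravel()

print(df)
```

Passing `df[["Molecule0"]]` (double brackets) instead of the reshaped array should also work, since that selection is already two-dimensional.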

I'm not sure whether this is what you want, but I think something like this would do it:

import pandas as pd
from sklearn.preprocessing import normalize

data = [[-326.0, 2.604015], [-323.0, 2.624186], [-321.0, 2.644598], [-318.0, 2.665254]]

df = pd.DataFrame(data, columns = ['frequency', 'Molecule0'])

print("Shape of column: ", df['Molecule0'].shape)

normalized_data = normalize(df['Molecule0'].to_numpy().reshape(1, -1), norm='max')[0]

print("Normalized data: ", normalized_data)

df['Molecule0'] = normalized_data
print(df)

When I run this, I get the following output:

Shape of column:  (4,)
Normalized data:  [0.9770232  0.98459134 0.99224989 1.        ]
   frequency  Molecule0
0     -326.0   0.977023
1     -323.0   0.984591
2     -321.0   0.992250
3     -318.0   1.000000
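Note that normalize with norm='max' divides each value by the row's maximum, which is why the smallest value in the output above is about 0.977 rather than 0. If scaling to the range [0, 1] is what is wanted, a sketch with plain pandas arithmetic on the same four sample rows could look like this (the formula is the usual (x - min) / (max - min), and no reshaping is needed):

```python
import pandas as pd

# Same four sample rows as in the answer above
df = pd.DataFrame({
    "frequency": [-326.0, -323.0, -321.0, -318.0],
    "Molecule0": [2.604015, 2.624186, 2.644598, 2.665254],
})

col = df["Molecule0"]

# Min-max scaling: smallest value maps to 0, largest maps to 1
df["Molecule0"] = (col - col.min()) / (col.max() - col.min())

print(df)
```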

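Thanks! But what if there are nearly 70 molecules and I want to do this with a for loop or a generic version? @sopL

@sopL Are all the molecules in the same DataFrame?

No, each molecule is in a different DataFrame, but always in the second column (iloc position 1), e.g. [frequency, Molecule0], [frequency, Molecule1], and so on, and the DataFrames are stored in df_mols (a list). Sorry, I explained it badly. df_mols[0] has the columns "frequency" and "Molecule0", and df_mols[1] has the columns "frequency" and "Molecule1".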
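For the list-of-DataFrames case described in these comments, one possible generic version is sketched below. It assumes every DataFrame in df_mols keeps its molecule values in the second column (position 1), whatever that column is called. The helper name min_max_scale_column is hypothetical, and the two small DataFrames are shortened stand-ins for the data in the question, including a trailing NaN row like the one in df_mols[1].

```python
import numpy as np
import pandas as pd


def min_max_scale_column(df, position=1):
    """Return a copy of df with the column at `position` scaled to [0, 1]."""
    out = df.copy()
    name = out.columns[position]  # e.g. "Molecule0", "Molecule1", ...
    col = out[name]
    # Series.min() and Series.max() skip NaN by default, so NaN rows stay NaN
    out[name] = (col - col.min()) / (col.max() - col.min())
    return out


# Shortened stand-ins for the DataFrames in the question (assumed layout)
df_mols = [
    pd.DataFrame({"frequency": [-326.0, -323.0, -321.0],
                  "Molecule0": [2.604015, 2.624186, 2.644598]}),
    pd.DataFrame({"frequency": [-357.0, -354.0, np.nan],
                  "Molecule1": [0.368472, 0.371063, np.nan]}),
]

# Scale the second column of every DataFrame in the list
df_mols = [min_max_scale_column(df) for df in df_mols]

for df in df_mols:
    print(df)
```

Each DataFrame is scaled against its own minimum and maximum, so the values are not comparable across molecules in absolute terms. A column whose values are all identical would produce NaN here because the denominator becomes zero.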