How do I min-max normalize specific columns in Python? (DataFrame)

I have been working on normalizing data with min-max normalization. My dataset is a list of DataFrames stored in df_mols, as shown below.

df_mols[0]:   
         frequency  Molecule0
 0        -326.0   2.604015
 1        -323.0   2.624186
 2        -321.0   2.644598
 3        -318.0   2.665254
 4        -316.0   2.686159
 ...         ...        ...
 1996     4589.0   4.565467
 1997     4591.0   4.512142
 1998     4594.0   4.459744
 1999     4596.0   4.408251
 2000     4598.0   4.357645
 
df_mols[1]:      
          frequency  Molecule1
 0        -357.0   0.368472
 1        -354.0   0.371063
 2        -352.0   0.373683
 3        -350.0   0.376332
 4        -347.0   0.379010
 ...         ...        ...
 1996     4293.0   0.538391
 1997     4295.0   0.532088
 1998     4297.0   0.525894
 1999     4300.0   0.519807
 2000        NaN        NaN
I only want to normalize the Molecule columns. This is what I have done so far:

from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler()

for i in df_mols:
  i['frequency']=i['frequency'].apply(np.rint) # This was to make frequency values into int
  i[:,1]=scaler.fit_transform(i[:,1])
and I get the following error:

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    554                     "Reshape your data either using array.reshape(-1, 1) if "
    555                     "your data has a single feature or array.reshape(1, -1) "
--> 556                     "if it contains a single sample.".format(array))
    557 
    558         # in the future np.flexible dtypes will be handled like object dtypes

ValueError: Expected 2D array, got 1D array instead:
array=[2.60401472 2.62418641 2.64459837 ... 4.45974369 4.4082515  4.35764454].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
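I tried to reshape the data but could not get it to work. Should I create a new Series and then update the values, or is there a way to fix this directly?
Thanks :)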

Before reshaping, you can use the .to_numpy() method to convert your pd.Series into an np.ndarray.
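The conversion and reshape described above might look like the following. This is a minimal sketch, assuming a single DataFrame built from the first four rows shown in the question; the names column_2d and scaled are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Small sample in the shape of the question's data (first four rows only)
df = pd.DataFrame({
    "frequency": [-326.0, -323.0, -321.0, -318.0],
    "Molecule0": [2.604015, 2.624186, 2.644598, 2.665254],
})

scaler = MinMaxScaler()

# .to_numpy() gives a 1D array of shape (4,); reshape(-1, 1) turns it into
# a single-feature 2D array of shape (4, 1), which fit_transform accepts.
column_2d = df["Molecule0"].to_numpy().reshape(-1, 1)
scaled = scaler.fit_transform(column_2d)

# Flatten back to 1D before assigning the result to the column
df["Molecule0"] = scaled.ravel()
print(df)
```

reshape(-1, 1) is the "single feature" case named in the error message: one column, many samples.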

I'm not sure whether this is what you want, but I think something like this would work:

import pandas as pd
from sklearn.preprocessing import normalize

data = [[-326.0, 2.604015], [-323.0, 2.624186], [-321.0, 2.644598], [-318.0, 2.665254]]

df = pd.DataFrame(data, columns = ['frequency', 'Molecule0'])

print("Shape of column: ", df['Molecule0'].shape)

normalized_data = normalize(df['Molecule0'].to_numpy().reshape(1, -1), norm='max')[0]

print("Normalized data: ", normalized_data)

df['Molecule0'] = normalized_data
print(df)
When I run this, I get the following output:

Shape of column:  (4,)
Normalized data:  [0.9770232  0.98459134 0.99224989 1.        ]

   frequency  Molecule0
0     -326.0   0.977023
1     -323.0   0.984591
2     -321.0   0.992250
3     -318.0   1.000000
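Note that normalize with norm='max' divides each value by the largest absolute value, which is why the smallest value here ends up at about 0.977 rather than 0. If true min-max scaling to the range [0, 1] is what is wanted, the formula (x - min) / (max - min) can be applied directly in pandas. A minimal sketch on the same four rows, swapping in that formula rather than reproducing the answer's approach:

```python
import pandas as pd

# Same four sample rows as in the answer above
df = pd.DataFrame({
    "frequency": [-326.0, -323.0, -321.0, -318.0],
    "Molecule0": [2.604015, 2.624186, 2.644598, 2.665254],
})

col = df["Molecule0"]

# Min-max scaling: the smallest value maps to 0 and the largest to 1
df["Molecule0"] = (col - col.min()) / (col.max() - col.min())

print(df)
```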

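Thanks! But what if there are nearly 70 molecules and I want to do this with a for loop or a generic version?

@sopL Are all the molecules in the same DataFrame?

No, each molecule is in a different DataFrame, but always at iloc[1] (for example: [frequency, Molecule0], [frequency, Molecule1], ...), and the DataFrames are stored in df_mols (a list). Sorry, I explained it badly. df_mols[0] is a series with the columns "frequency" and "Molecule0", and df_mols[1] has the columns "frequency" and "Molecule1".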
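For the layout described in the last comment (a list of DataFrames, each with the molecule in its second column), a generic loop could be sketched as follows. The df_mols list here is a shortened, hypothetical stand-in for the real data, and mol_col is an illustrative name. The sketch relies on MinMaxScaler ignoring NaN values when fitting and leaving them as NaN in the output, which is the behaviour in current scikit-learn releases; older versions may raise an error instead.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical, shortened stand-in for the question's df_mols list
df_mols = [
    pd.DataFrame({"frequency": [-326.0, -323.0, -321.0],
                  "Molecule0": [2.604015, 2.624186, 2.644598]}),
    pd.DataFrame({"frequency": [-357.0, -354.0, np.nan],
                  "Molecule1": [0.368472, 0.371063, np.nan]}),
]

for df in df_mols:
    mol_col = df.columns[1]   # second column: Molecule0, Molecule1, ...
    scaler = MinMaxScaler()   # fresh scaler, so each molecule is scaled on its own
    # df[[mol_col]] (double brackets) is a one-column DataFrame, i.e. already 2D,
    # so no reshape is needed before fit_transform
    df[mol_col] = scaler.fit_transform(df[[mol_col]]).ravel()

print(df_mols[0])
print(df_mols[1])
```

Selecting the column by position through df.columns[1] avoids hard-coding the 70 molecule names, and the frequency column is left untouched.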