How to normalize (min/max) specific columns in Python? (DataFrame)
I have been working on normalizing data with min-max scaling. My dataset is a list of DataFrames stored in df_mols, as shown below.
df_mols[0]:
frequency Molecule0
0 -326.0 2.604015
1 -323.0 2.624186
2 -321.0 2.644598
3 -318.0 2.665254
4 -316.0 2.686159
... ... ...
1996 4589.0 4.565467
1997 4591.0 4.512142
1998 4594.0 4.459744
1999 4596.0 4.408251
2000 4598.0 4.357645
df_mols[1]:
frequency Molecule1
0 -357.0 0.368472
1 -354.0 0.371063
2 -352.0 0.373683
3 -350.0 0.376332
4 -347.0 0.379010
... ... ...
1996 4293.0 0.538391
1997 4295.0 0.532088
1998 4297.0 0.525894
1999 4300.0 0.519807
2000 NaN NaN
I only want to normalize the Molecule columns.
What I have done so far is:
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler()
for i in df_mols:
    i['frequency']=i['frequency'].apply(np.rint) # This was to make frequency values into int
    i[:,1]=scaler.fit_transform(i[:,1])
and I got the following error:
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
554 "Reshape your data either using array.reshape(-1, 1) if "
555 "your data has a single feature or array.reshape(1, -1) "
--> 556 "if it contains a single sample.".format(array))
557
558 # in the future np.flexible dtypes will be handled like object dtypes
ValueError: Expected 2D array, got 1D array instead:
array=[2.60401472 2.62418641 2.64459837 ... 4.45974369 4.4082515 4.35764454].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
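I tried to reshape the data but couldn't get it to work. Should I create a new Series and then update the values? Or how should I fix this?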
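Thanks :)

Before reshaping, you can use the .to_numpy() method to convert your pd.Series to an np.ndarray. I'm not sure this is exactly what you want, but I think something like the following would work: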
import pandas as pd
from sklearn.preprocessing import normalize
data = [[-326.0, 2.604015], [-323.0, 2.624186], [-321.0, 2.644598], [-318.0, 2.665254]]
df = pd.DataFrame(data, columns = ['frequency', 'Molecule0'])
print("Shape of column: ", df['Molecule0'].shape)
normalized_data = normalize(df['Molecule0'].to_numpy().reshape(1, -1), norm='max')[0]
print("Normalized data: ", normalized_data)
df['Molecule0'] = normalized_data
print(df)
When I run this, I get the following output:
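Shape of column:  (4,)
Normalized data:  [0.9770232 0.98459134 0.99224989 1.]
   frequency  Molecule0
0     -326.0   0.977023
1     -323.0   0.984591
2     -321.0   0.992250
3     -318.0   1.000000

Thanks! But what if there are almost 70 molecules and I want to do this with a for loop, or some generic version?

@sopL Are all the molecules in the same DataFrame?

No, each molecule is in a different DataFrame, always in the second column (e.g. [frequency, Molecule0], [frequency, Molecule1], ...), and the DataFrames are stored in df_mols (a list). Sorry, I explained it badly. df_mols[0] has the columns 'frequency' and 'Molecule0', and df_mols[1] has the columns 'frequency' and 'Molecule1'.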
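
For the generic case asked about in the comments, here is a minimal sketch rather than a definitive answer. It assumes that every DataFrame in df_mols keeps its molecule column in the second position, and that each molecule should be scaled with its own minimum and maximum. The two short frames below are hypothetical stand-ins for the real data. Note that normalize(..., norm='max') in the answer above divides by the maximum, so the smallest value does not map to 0; MinMaxScaler is used here because the question asks for min-max scaling. Selecting the column with double brackets yields a 2D one-column DataFrame, which avoids the "Expected 2D array" error. Recent scikit-learn versions ignore NaN when fitting and keep it in the output; older versions may need the NaN rows dropped first.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the df_mols list from the question.
df_mols = [
    pd.DataFrame({"frequency": [-326.0, -323.0, -321.0, -318.0],
                  "Molecule0": [2.604015, 2.624186, 2.644598, 2.665254]}),
    pd.DataFrame({"frequency": [-357.0, -354.0, -352.0, np.nan],
                  "Molecule1": [0.368472, 0.371063, 0.373683, np.nan]}),
]

for df in df_mols:
    # Assumption: the molecule column is always the second column.
    mol_col = df.columns[1]
    # A fresh scaler per frame, so each molecule uses its own min and max.
    scaler = MinMaxScaler()
    # df[[mol_col]] is 2D (n_rows, 1); ravel() flattens the result back to 1D.
    df[mol_col] = scaler.fit_transform(df[[mol_col]]).ravel()

print(df_mols[0])
print(df_mols[1])
```

The frequency column is left untouched. If pulling in scikit-learn is not desired, the same scaling can be written in plain pandas as (s - s.min()) / (s.max() - s.min()) for each molecule column s.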