Python sort_values sorting incorrectly
I don't know what is wrong with my code:
import pandas as pd
import numpy as np
woe = [1.1147295474833758,0.364043491078754,-0.05525053172192353,-0.3950007109750665,-0.6784658191115104,-0.9522135140050229,-1.1441658353033486]
iv = [0.29078213954085946,0.29078213954085946,0.29078213954085946,0.29078213954085946,0.29078213954085946,0.29078213954085946,0.29078213954085946]
lis = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
fin = [lis,woe,iv]
fin = np.array(fin).T
df_disc = pd.DataFrame(fin,columns=['Label','WoE','IV'])
print(df_disc)
df_disc = df_disc.sort_values(by=['WoE'])
df_disc = df_disc.reset_index(drop=True)
print(df_disc)
Result:
Label WoE IV
0 A 1.1147295474833758 0.29078213954085946
1 B 0.364043491078754 0.29078213954085946
2 C -0.05525053172192353 0.29078213954085946
3 D -0.3950007109750665 0.29078213954085946
4 E -0.6784658191115104 0.29078213954085946
5 F -0.9522135140050229 0.29078213954085946
6 G -1.1441658353033486 0.29078213954085946
Label WoE IV
0 C -0.05525053172192353 0.29078213954085946
1 D -0.3950007109750665 0.29078213954085946
2 E -0.6784658191115104 0.29078213954085946
3 F -0.9522135140050229 0.29078213954085946
4 G -1.1441658353033486 0.29078213954085946
5 B 0.364043491078754 0.29078213954085946
6 A 1.1147295474833758 0.29078213954085946
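I think the correct label order should be G, F, E, D, C, B, A, but the result seems to be wrong.
The problem is that the DataFrame columns are filled with objects (strings), not numbers. In your code, building a single NumPy array from a mix of strings and numeric values converts every value to a string, so all columns end up with dtype object: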
fin = np.array(fin).T
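A minimal sketch of why this matters, using shortened illustrative values rather than the question's full data: the mixed array gets a string dtype, and strings are compared character by character, so '-0.05' sorts before '-0.39' and every negative value sorts before the positive ones.

```python
import numpy as np

labels = ['A', 'B', 'C']
woe = [1.11, 0.36, -0.05]

# Mixing strings and floats in one array forces a common string dtype.
arr = np.array([labels, woe]).T
print(arr.dtype.kind)  # 'U' is NumPy's code for a unicode string dtype

# Strings compare character by character: '-' sorts before digits,
# and '-0.05' sorts before '-0.39' because '0' < '3'.
values = ['1.11', '0.36', '-0.05', '-0.39']
lexical = sorted(values)
numeric = sorted(values, key=float)
print(lexical)  # ['-0.05', '-0.39', '0.36', '1.11']
print(numeric)  # ['-0.39', '-0.05', '0.36', '1.11']
```

The lexical order matches the pattern in the question's output (C, D, E, F, G, B, A).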
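A solution is to build a dictionary keyed by column name and pass it to the DataFrame constructor. Each column then keeps its own dtype, which avoids the conversion to strings: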
df_disc = pd.DataFrame({'Label':lis,'WoE':woe,'IV':iv})
print(df_disc)
df_disc = df_disc.sort_values(by=['WoE'], ignore_index=True)
print(df_disc)
Label WoE IV
0 G -1.144166 0.290782
1 F -0.952214 0.290782
2 E -0.678466 0.290782
3 D -0.395001 0.290782
4 C -0.055251 0.290782
5 B 0.364043 0.290782
6 A 1.114730 0.290782
Your columns WoE and IV are of dtype object. They need to be converted to float so that sorting works correctly:
In [2723]: df_disc.dtypes
Out[2723]:
Label object
WoE object
IV object
dtype: object
In [2725]: df_disc.WoE = df_disc.WoE.astype(float)
In [2726]: df_disc.sort_values(by=['WoE'])
Out[2726]:
Label WoE IV
6 G -1.144166 0.29078213954085946
5 F -0.952214 0.29078213954085946
4 E -0.678466 0.29078213954085946
3 D -0.395001 0.29078213954085946
2 C -0.055251 0.29078213954085946
1 B 0.364043 0.29078213954085946
0 A 1.114730 0.29078213954085946
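If the all-string frame already exists, both numeric columns can be converted in one step. The following is a sketch using pd.to_numeric on a small frame with made-up values; the variable names df and num_cols are illustrative and not from the question.

```python
import numpy as np
import pandas as pd

# Rebuild a small all-string frame the same way the question does.
fin = np.array([['A', 'B', 'C'], [1.5, 0.25, -0.5], [0.29, 0.29, 0.29]]).T
df = pd.DataFrame(fin, columns=['Label', 'WoE', 'IV'])

# Convert every numeric-looking column to a numeric dtype.
num_cols = ['WoE', 'IV']
df[num_cols] = df[num_cols].apply(pd.to_numeric)

# Sorting now compares numbers instead of strings.
df = df.sort_values(by='WoE').reset_index(drop=True)
print(df.dtypes)
print(df)
```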
As mentioned above, the column contains strings. To keep every digit exactly as written, convert the Series to Decimal:
from decimal import Decimal
# ...
df_disc['WoE'] = df_disc['WoE'].apply(Decimal)
df_disc = df_disc.sort_values(by='WoE')
print(df_disc)
Prints:
Label WoE IV
6 G -1.1441658353033486 0.29078213954085946
5 F -0.9522135140050229 0.29078213954085946
4 E -0.6784658191115104 0.29078213954085946
3 D -0.3950007109750665 0.29078213954085946
2 C -0.05525053172192353 0.29078213954085946
1 B 0.364043491078754 0.29078213954085946
0 A 1.1147295474833758 0.29078213954085946
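For comparison, here is a small sketch (the Series s and its two values are taken from the question's data purely for illustration) suggesting that the six-digit output of a float column is a display setting: for these particular values the stored float still round-trips to the original text. Decimal keeps the text exactly by construction.

```python
import pandas as pd
from decimal import Decimal

s = pd.Series(['1.1147295474833758', '-0.05525053172192353'])

as_float = s.astype(float)
as_decimal = s.apply(Decimal)

# pandas prints floats with 6 significant digits by default,
# but the underlying value is not truncated.
print(as_float)
print(float(as_float.iloc[0]))

# Decimal stores the digits exactly as given in the string.
print(as_decimal.iloc[0])

# Both representations sort numerically.
print(as_float.sort_values().tolist())
print(as_decimal.sort_values().tolist())
```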