Python: wrong output for a column with scientific-notation values after applying MinMaxScaler

I have a dataframe as shown below. I applied MinMaxScaler on the Price column and used the inverse_transform function to get back the original price values, but it gives the wrong results. Please advise. Dataframe:
Date Customer Price
1/6/2019 A 142404534.13
1/7/2019 A 989.34
1/8/2019 A 45444.57
1/9/2019 A 574343.10
1/10/2019 A 23232.34
1/1/2019 A 923423.00
1/2/2019 A 332.00
1/3/2019 A 2342323.24
1/4/2019 A 232.00
1/5/2019 A 65.70
1/6/2019 B 875.46
1/7/2019 B 142466027340.03
1/8/2019 B 25.17
1/9/2019 B 1.01
1/10/2019 B 1.00
1/10/2019 B 57.61
1/6/2019 B 232232.78
1/7/2019 B 15.20
1/8/2019 B 44.56
1/9/2019 B 2323254.45
1/10/2019 B 395.45
1/10/2019 B 23423454.92
1/6/2019 C 34.12
1/7/2019 C 89.34
1/8/2019 C 44.57
1/9/2019 C 343.10
1/10/2019 C 232.34
MinMaxScaler code on df:

from sklearn.preprocessing import MinMaxScaler
df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x))
scaler = MinMaxScaler()
dff = df.groupby('Customer').Price.transform(lambda s: scaler.fit_transform(s.values.reshape(-1, 1)).ravel())
dff = pd.DataFrame(dff)
dff['Price'] = dff['Price'].apply(lambda x: '{:.2f}'.format(x))
dff = pd.concat([dff['Price'], df['Customer']], axis=1)
dff output:
Price Customer
0 1.00 A
1 0.00 A
2 0.00 A
3 0.00 A
4 0.00 A
5 0.01 A
6 0.00 A
7 0.02 A
8 0.00 A
9 0.00 A
10 0.00 B
11 1.00 B
12 0.00 B
.
.
.
.
20 0.00 B
21 0.00 B
22 0.00 C
23 0.18 C
24 0.03 C
25 1.00 C
26 0.64 C
inverse_transform code to get the actual price values back:

dd = dff.groupby('Customer').Price.transform(lambda s: scaler.inverse_transform(s.values.reshape(-1, 1)).ravel())
dd = pd.DataFrame(dd)
dd['Price'] = dd['Price'].apply(lambda x: '{:.2f}'.format(x))
dd = pd.concat([dd['Price'], df['Customer']], axis=1)
dd output:
Price Customer
0 343.10 A
1 34.12 A
2 34.12 A
3 34.12 A
4 34.12 A
5 37.21 A
6 34.12 A
7 40.30 A
8 34.12 A
9 34.12 A
10 34.12 B
11 343.10 B
12 34.12 B
13 34.12 B
.
.
.
.
.
18 34.12 B
19 34.12 B
20 34.12 B
21 34.12 B
22 34.12 C
23 89.74 C
24 43.39 C
25 343.10 C
26 231.87 C
Please help and advise on this.

You have several problems in your script:

df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x))

converts the Price column to strings, so numeric operations no longer work on that dtype. If you only want to control the float precision on display, use pd.set_option('display.float_format', lambda x: '%0.2f' % x) right after the import line instead. You also reuse a single scaler inside the groupby transform, so after fitting it only holds the parameters of the last group (Customer C); inverse_transform then maps every customer into C's range, which is why the recovered values cluster around 34.12 and 343.10 (C's min and max).
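A quick illustration of the dtype point, using a standalone toy column rather than the question's data: formatting turns the column into strings, while the display option keeps the underlying dtype numeric.

```python
import pandas as pd

df = pd.DataFrame({'Price': [142404534.13, 989.34, 65.70]})

# Formatting converts the column to strings ('object' dtype),
# so any later numeric operation works on text, not numbers:
formatted = df['Price'].apply(lambda x: '{:.2f}'.format(x))
print(formatted.dtype)   # object

# The display option only changes how floats are printed;
# the column stays float64 and math keeps working:
pd.set_option('display.float_format', lambda x: '%0.2f' % x)
print(df['Price'].dtype)  # float64
```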
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.7f' % x)
from sklearn.preprocessing import MinMaxScaler

scalers_dict = {}  # Prepare place to hold one scaler per Customer
df = pd.read_csv('stack_data.csv', parse_dates=['Date'])
df['Price_scaled'] = None
df['Price_inverse_scaled'] = None

# Loop over Customers, scale Price and save the scaler for further use
for customer in pd.unique(df['Customer']):
    scalers_dict[customer] = MinMaxScaler()
    scaled_price = scalers_dict[customer].fit_transform(df[df['Customer']==customer].Price.values.reshape(-1, 1))
    df.loc[df['Customer']==customer, 'Price_scaled'] = scaled_price

# Loop over Customers and invert the scaled values
for customer in pd.unique(df['Customer']):
    inverse_scale = scalers_dict[customer].inverse_transform(df[df['Customer']==customer].Price_scaled.values.reshape(-1, 1))
    df.loc[df['Customer']==customer, 'Price_inverse_scaled'] = inverse_scale
Output (I set the precision to .7f because the maximum is so large for many customers that most scaled values would display as 0.00 at two decimals):
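As a sanity check, the per-customer approach round-trips exactly. A minimal sketch with made-up prices (no `stack_data.csv` needed):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the question's data
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Price': [989.34, 65.70, 2342323.24, 875.46, 25.17, 23423454.92],
})

scalers_dict = {}
for customer in pd.unique(df['Customer']):
    mask = df['Customer'] == customer
    scalers_dict[customer] = MinMaxScaler()
    df.loc[mask, 'Price_scaled'] = scalers_dict[customer].fit_transform(
        df.loc[mask, 'Price'].values.reshape(-1, 1)).ravel()

for customer in pd.unique(df['Customer']):
    mask = df['Customer'] == customer
    df.loc[mask, 'Price_back'] = scalers_dict[customer].inverse_transform(
        df.loc[mask, 'Price_scaled'].values.reshape(-1, 1)).ravel()

# Each customer is inverted with its own fitted scaler,
# so the original prices come back (up to float rounding)
print((df['Price'] - df['Price_back']).abs().max())
```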
Edit:

Multi-column version:
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.7f' % x)
from sklearn.preprocessing import MinMaxScaler

scalers_dict = {}  # Prepare place to hold one scaler per Customer
df = pd.read_csv('stack_data.csv', parse_dates=['Date'])
df['Price2'] = range(df.shape[0])
df['Price3'] = df['Price2'] * 2
df = df.join(pd.DataFrame(0, df.index, ['Price1_scaled', 'Price2_scaled', 'Price3_scaled', 'Price1_inverse_scaled', 'Price2_inverse_scaled', 'Price3_inverse_scaled']))

# Loop over Customers, scale the Price columns and save the scaler for further use
for customer in pd.unique(df['Customer']):
    scalers_dict[customer] = MinMaxScaler()
    scaled_price = scalers_dict[customer].fit_transform(df.loc[df['Customer']==customer, 'Price':'Price3'])
    df.loc[df['Customer']==customer, 'Price1_scaled':'Price3_scaled'] = scaled_price

# Loop over Customers and invert the scaled values
for customer in pd.unique(df['Customer']):
    inverse_scale = scalers_dict[customer].inverse_transform(df.loc[df['Customer']==customer, 'Price1_scaled':'Price3_scaled'])
    df.loc[df['Customer']==customer, 'Price1_inverse_scaled':'Price3_inverse_scaled'] = inverse_scale
Output:
Date Customer Price Price2 Price3 Price1_scaled Price2_scaled Price3_scaled Price1_inverse_scaled Price2_inverse_scaled Price3_inverse_scaled
0 2019-01-06 A 142404534.1300000 0 0 1.0000000 0.0000000 0.0000000 142404534.1300000 0.0000000 0.0000000
1 2019-01-07 A 989.3400000 1 2 0.0000065 0.1111111 0.1111111 989.3400000 1.0000000 2.0000000
2 2019-01-08 A 45444.5700000 2 4 0.0003187 0.2222222 0.2222222 45444.5700000 2.0000000 4.0000000
3 2019-01-09 A 574343.1000000 3 6 0.0040327 0.3333333 0.3333333 574343.1000000 3.0000000 6.0000000
4 2019-01-10 A 23232.3400000 4 8 0.0001627 0.4444444 0.4444444 23232.3400000 4.0000000 8.0000000
5 2019-01-01 A 923423.0000000 5 10 0.0064840 0.5555556 0.5555556 923423.0000000 5.0000000 10.0000000
6 2019-01-02 A 332.0000000 6 12 0.0000019 0.6666667 0.6666667 332.0000000 6.0000000 12.0000000
7 2019-01-03 A 2342323.2400000 7 14 0.0164479 0.7777778 0.7777778 2342323.2400000 7.0000000 14.0000000
8 2019-01-04 A 232.0000000 8 16 0.0000012 0.8888889 0.8888889 232.0000000 8.0000000 16.0000000
9 2019-01-05 A 65.7000000 9 18 0.0000000 1.0000000 1.0000000 65.7000000 9.0000000 18.0000000
10 2019-01-06 B 875.4600000 10 20 0.0000000 0.0000000 0.0000000 875.4600000 10.0000000 20.0000000
11 2019-01-07 B 142466027340.0299988 11 22 1.0000000 0.0909091 0.0909091 142466027340.0299988 11.0000000 22.0000000
12 2019-01-08 B 25.1700000 12 24 0.0000000 0.1818182 0.1818182 25.1700000 12.0000000 24.0000000
13 2019-01-09 B 1.0100000 13 26 0.0000000 0.2727273 0.2727273 1.0100000 13.0000000 26.0000000
14 2019-01-10 B 1.0000000 14 28 0.0000000 0.3636364 0.3636364 1.0000000 14.0000000 28.0000000
15 2019-01-10 B 57.6100000 15 30 0.0000000 0.4545455 0.4545455 57.6100000 15.0000000 30.0000000
16 2019-01-06 B 232232.7800000 16 32 0.0000016 0.5454545 0.5454545 232232.7800000 16.0000000 32.0000000
17 2019-01-07 B 15.2000000 17 34 0.0000000 0.6363636 0.6363636 15.2000000 17.0000000 34.0000000
18 2019-01-08 B 44.5600000 18 36 0.0000000 0.7272727 0.7272727 44.5600000 18.0000000 36.0000000
19 2019-01-09 B 2323254.4500000 19 38 0.0000163 0.8181818 0.8181818 2323254.4500000 19.0000000 38.0000000
20 2019-01-10 B 395.4500000 20 40 0.0000000 0.9090909 0.9090909 395.4500000 20.0000000 40.0000000
21 2019-01-10 B 23423454.9200000 21 42 0.0001644 1.0000000 1.0000000 23423454.9200000 21.0000000 42.0000000
22 2019-01-06 C 34.1200000 22 44 0.0000000 0.0000000 0.0000000 34.1200000 22.0000000 44.0000000
23 2019-01-07 C 89.3400000 23 46 0.1787171 0.2500000 0.2500000 89.3400000 23.0000000 46.0000000
24 2019-01-08 C 44.5700000 24 48 0.0338210 0.5000000 0.5000000 44.5700000 24.0000000 48.0000000
25 2019-01-09 C 343.1000000 25 50 1.0000000 0.7500000 0.7500000 343.1000000 25.0000000 50.0000000
26 2019-01-10 C 232.3400000 26 52 0.6415302 1.0000000 1.0000000 232.3400000 26.0000000 52.0000000
Thanks for the reply. If I want to run the above logic on multiple columns such as price, weight and height, how would I do that?

Just add more lines in the loop for the other columns: scaled_price = scalers_dict[customer].fit_transform(df[df['Customer']==customer].another_column.values.reshape(-1, 1))

If the dataframe is similar to this link and I have 50 columns, continuing to add lines is not a good idea. Can you suggest something?

Then feed a whole block of columns per customer into the scaler: scaled_price = scalers_dict[customer].fit_transform(df.loc[df['Customer']==customer, 'Price':'Price50'])
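For many columns, an explicit column list avoids both per-column lines and positional 'Price':'Price50' label slices. A sketch with hypothetical Weight/Height columns (names and values are made up for illustration):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical frame: several value columns per customer
df = pd.DataFrame({
    'Customer': ['A', 'A', 'B', 'B'],
    'Price': [989.34, 65.70, 875.46, 25.17],
    'Weight': [1.5, 2.5, 3.0, 4.0],
    'Height': [10.0, 20.0, 30.0, 40.0],
})
value_cols = ['Price', 'Weight', 'Height']   # could just as well be 50 columns
scaled_cols = [c + '_scaled' for c in value_cols]
for c in scaled_cols:                        # pre-create the target columns
    df[c] = 0.0

scalers_dict = {}
for customer in pd.unique(df['Customer']):
    mask = df['Customer'] == customer
    scalers_dict[customer] = MinMaxScaler()
    # One scaler fits the whole block of columns for this customer at once
    df.loc[mask, scaled_cols] = scalers_dict[customer].fit_transform(df.loc[mask, value_cols])

print(df[scaled_cols])
```

Each customer's block is scaled column-wise, so within a group the per-column min maps to 0 and the per-column max to 1.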