
Python: scientific-notation values in a column give wrong output after applying MinMaxScaler


I have a dataframe as shown below. I applied MinMaxScaler on the Price column and used the inverse_transform function to get the original price values back, but it gives wrong results. Please advise me on this.

Dataframe:

Date    Customer    Price
1/6/2019    A   142404534.13
1/7/2019    A   989.34
1/8/2019    A   45444.57
1/9/2019    A   574343.10
1/10/2019   A   23232.34
1/1/2019    A   923423.00
1/2/2019    A   332.00
1/3/2019    A   2342323.24
1/4/2019    A   232.00
1/5/2019    A   65.70
1/6/2019    B   875.46
1/7/2019    B   142466027340.03
1/8/2019    B   25.17
1/9/2019    B   1.01
1/10/2019   B   1.00
1/10/2019   B   57.61
1/6/2019    B   232232.78
1/7/2019    B   15.20
1/8/2019    B   44.56
1/9/2019    B   2323254.45
1/10/2019   B   395.45
1/10/2019   B   23423454.92
1/6/2019    C   34.12
1/7/2019    C   89.34
1/8/2019    C   44.57
1/9/2019    C   343.10
1/10/2019   C   232.34
MinMaxScaler code on df:

from sklearn.preprocessing import MinMaxScaler

df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x))
scaler = MinMaxScaler()
dff = df.groupby('Customer').Price.transform(lambda s: scaler.fit_transform(s.values.reshape(-1, 1)).ravel())
dff = pd.DataFrame(dff)
dff['Price'] = dff['Price'].apply(lambda x: '{:.2f}'.format(x))

dff = pd.concat([dff['Price'], df['Customer']], axis=1)
dff output:

Price   Customer
0   1.00    A
1   0.00    A
2   0.00    A
3   0.00    A
4   0.00    A
5   0.01    A
6   0.00    A
7   0.02    A
8   0.00    A
9   0.00    A
10  0.00    B
11  1.00    B
12  0.00    B
.
.
.
.

20  0.00    B
21  0.00    B
22  0.00    C
23  0.18    C
24  0.03    C
25  1.00    C
26  0.64    C
Inverse-transform code to get the actual price values back:

dd = dff.groupby('Customer').Price.transform(lambda s: scaler.inverse_transform(s.values.reshape(-1,1)).ravel())

dd = pd.DataFrame(dd)
dd['Price'] = dd['Price'].apply(lambda x: '{:.2f}'.format(x))
dd = pd.concat([dd['Price'] , df['Customer']] , axis=1)
dd output:

Price   Customer
0   343.10  A
1   34.12   A
2   34.12   A
3   34.12   A
4   34.12   A
5   37.21   A
6   34.12   A
7   40.30   A
8   34.12   A
9   34.12   A
10  34.12   B
11  343.10  B
12  34.12   B
13  34.12   B
.
.
.
.
.
18  34.12   B
19  34.12   B
20  34.12   B
21  34.12   B
22  34.12   C
23  89.74   C
24  43.39   C
25  343.10  C
26  231.87  C

Any help and suggestions on this are appreciated.

You have several problems in your script:

  • Using
    df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x))
    converts the Price column to strings, so no numeric operation will work on that dtype. If you only want to set the float precision for display, use
    pd.set_option('display.float_format', lambda x: '%0.2f' % x)
    after the import lines.
  • If I understand well, you want to scale the price per Customer to the 0-1 range. If that is the case, you cannot do it in one line like:

    df.groupby('Customer').Price.transform(lambda s: scaler.fit_transform(s.values.reshape(-1, 1)).ravel())

    because each per-group call to fit overwrites the scaler object's min/max parameters from the previous group. So when you call scaler.inverse_transform() at the end, you only have the parameters of the last group.

  • Solution:

    import pandas as pd
    pd.set_option('display.float_format', lambda x: '%.7f' % x)
    from sklearn.preprocessing import MinMaxScaler
    
    scalers_dict = {}  # Prepare place to hold scalers per Customer
    
    df = pd.read_csv('stack_data.csv', parse_dates=['Date'])
    
    df['Price_scaled'] = None
    df['Price_inverse_scaled'] = None
    
    # Loop over Customers, scale Price and save scaler for further use
    for customer in pd.unique(df['Customer']):
        scalers_dict[customer] = MinMaxScaler()
        scaled_price = scalers_dict[customer].fit_transform(df[df['Customer']==customer].Price.values.reshape(-1, 1))
        df.loc[df['Customer']==customer, 'Price_scaled'] = scaled_price
    
    # Loop over Customers and inverse scaled values
    for customer in pd.unique(df['Customer']):
        inverse_scale = scalers_dict[customer].inverse_transform(df[df['Customer']==customer].Price_scaled.values.reshape(-1, 1))
        df.loc[df['Customer']==customer, 'Price_inverse_scaled'] = inverse_scale
    
    Output (I set the precision to .7f because the maxima are so large that, at two decimal places, most scaled values would display as 0.00):
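The pitfall of sharing one scaler across groups can be demonstrated with a minimal sketch (hypothetical two-customer data, not the question's dataframe): after the groupby transform, the scaler only remembers the last group's min/max, so inverting the other groups' values gives wrong prices.

```python
# Sketch: one shared MinMaxScaler cannot round-trip grouped data,
# because each fit_transform call overwrites its min/max parameters.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'Customer': ['A', 'A', 'B', 'B'],
    'Price': [10.0, 110.0, 1.0, 5.0],
})

scaler = MinMaxScaler()
scaled = df.groupby('Customer').Price.transform(
    lambda s: scaler.fit_transform(s.values.reshape(-1, 1)).ravel()
)

# The scaler now holds only group B's range (1.0 to 5.0), so
# inverting group A's rows does NOT recover 10.0 and 110.0.
recovered = scaler.inverse_transform(scaled.values.reshape(-1, 1)).ravel()
print(recovered)  # → [1. 5. 1. 5.]
```

Keeping one fitted scaler per customer, as in the loop above, is what makes the inverse transform recoverable.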

    Edit: multi-column version

    import pandas as pd
    pd.set_option('display.float_format', lambda x: '%.7f' % x)
    from sklearn.preprocessing import MinMaxScaler
    
    scalers_dict = {}  # Prepare place to hold scalers per Customer
    
    df = pd.read_csv('stack_data.csv', parse_dates=['Date'])
    df['Price2'] = range(df.shape[0])
    df['Price3'] = df['Price2'] * 2
    
    
    df = df.join(pd.DataFrame(0, df.index, ['Price1_scaled', 'Price2_scaled', 'Price3_scaled', 'Price1_inverse_scaled', 'Price2_inverse_scaled', 'Price3_inverse_scaled']))
    
    # Loop over Customers, scale Price and save scaler for further use
    for customer in pd.unique(df['Customer']):
        scalers_dict[customer] = MinMaxScaler()
        scaled_price = scalers_dict[customer].fit_transform(df.loc[df['Customer']==customer, 'Price':'Price3'])
        df.loc[df['Customer']==customer, 'Price1_scaled':'Price3_scaled'] = scaled_price
    
    # Loop over Customers and inverse scaled values
    for customer in pd.unique(df['Customer']):
        inverse_scale = scalers_dict[customer].inverse_transform(df.loc[df['Customer']==customer, 'Price1_scaled':'Price3_scaled'])
        df.loc[df['Customer']==customer, 'Price1_inverse_scaled':'Price3_inverse_scaled'] = inverse_scale
    
    Output:

             Date Customer                Price  Price2  Price3  Price1_scaled  Price2_scaled  Price3_scaled  Price1_inverse_scaled  Price2_inverse_scaled  Price3_inverse_scaled
    0  2019-01-06        A    142404534.1300000       0       0      1.0000000      0.0000000      0.0000000      142404534.1300000              0.0000000              0.0000000
    1  2019-01-07        A          989.3400000       1       2      0.0000065      0.1111111      0.1111111            989.3400000              1.0000000              2.0000000
    2  2019-01-08        A        45444.5700000       2       4      0.0003187      0.2222222      0.2222222          45444.5700000              2.0000000              4.0000000
    3  2019-01-09        A       574343.1000000       3       6      0.0040327      0.3333333      0.3333333         574343.1000000              3.0000000              6.0000000
    4  2019-01-10        A        23232.3400000       4       8      0.0001627      0.4444444      0.4444444          23232.3400000              4.0000000              8.0000000
    5  2019-01-01        A       923423.0000000       5      10      0.0064840      0.5555556      0.5555556         923423.0000000              5.0000000             10.0000000
    6  2019-01-02        A          332.0000000       6      12      0.0000019      0.6666667      0.6666667            332.0000000              6.0000000             12.0000000
    7  2019-01-03        A      2342323.2400000       7      14      0.0164479      0.7777778      0.7777778        2342323.2400000              7.0000000             14.0000000
    8  2019-01-04        A          232.0000000       8      16      0.0000012      0.8888889      0.8888889            232.0000000              8.0000000             16.0000000
    9  2019-01-05        A           65.7000000       9      18      0.0000000      1.0000000      1.0000000             65.7000000              9.0000000             18.0000000
    10 2019-01-06        B          875.4600000      10      20      0.0000000      0.0000000      0.0000000            875.4600000             10.0000000             20.0000000
    11 2019-01-07        B 142466027340.0299988      11      22      1.0000000      0.0909091      0.0909091   142466027340.0299988             11.0000000             22.0000000
    12 2019-01-08        B           25.1700000      12      24      0.0000000      0.1818182      0.1818182             25.1700000             12.0000000             24.0000000
    13 2019-01-09        B            1.0100000      13      26      0.0000000      0.2727273      0.2727273              1.0100000             13.0000000             26.0000000
    14 2019-01-10        B            1.0000000      14      28      0.0000000      0.3636364      0.3636364              1.0000000             14.0000000             28.0000000
    15 2019-01-10        B           57.6100000      15      30      0.0000000      0.4545455      0.4545455             57.6100000             15.0000000             30.0000000
    16 2019-01-06        B       232232.7800000      16      32      0.0000016      0.5454545      0.5454545         232232.7800000             16.0000000             32.0000000
    17 2019-01-07        B           15.2000000      17      34      0.0000000      0.6363636      0.6363636             15.2000000             17.0000000             34.0000000
    18 2019-01-08        B           44.5600000      18      36      0.0000000      0.7272727      0.7272727             44.5600000             18.0000000             36.0000000
    19 2019-01-09        B      2323254.4500000      19      38      0.0000163      0.8181818      0.8181818        2323254.4500000             19.0000000             38.0000000
    20 2019-01-10        B          395.4500000      20      40      0.0000000      0.9090909      0.9090909            395.4500000             20.0000000             40.0000000
    21 2019-01-10        B     23423454.9200000      21      42      0.0001644      1.0000000      1.0000000       23423454.9200000             21.0000000             42.0000000
    22 2019-01-06        C           34.1200000      22      44      0.0000000      0.0000000      0.0000000             34.1200000             22.0000000             44.0000000
    23 2019-01-07        C           89.3400000      23      46      0.1787171      0.2500000      0.2500000             89.3400000             23.0000000             46.0000000
    24 2019-01-08        C           44.5700000      24      48      0.0338210      0.5000000      0.5000000             44.5700000             24.0000000             48.0000000
    25 2019-01-09        C          343.1000000      25      50      1.0000000      0.7500000      0.7500000            343.1000000             25.0000000             50.0000000
    26 2019-01-10        C          232.3400000      26      52      0.6415302      1.0000000      1.0000000            232.3400000             26.0000000             52.0000000
    

    Comments:

    • Thanks for the reply. If I want to run the above logic on multiple columns such as price, weight and height, how can I do that?
    • Just add more lines for the other columns in the loop: scaled_price = scalers_dict[customer].fit_transform(df[df['Customer']==customer].another_column.values.reshape(-1, 1))
    • If the dataframe is similar to this link and I have 50 columns, keeping adding lines is not a good idea. Can you suggest a better way?
    • Then feed a whole block per customer into the scaler: scaled_price = scalers_dict[customer].fit_transform(df.loc[df['Customer']==customer, 'Price':'Price50'])
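For the many-columns case, an alternative to keeping a dictionary of fitted scalers is plain pandas arithmetic: the per-group min and max are themselves columns, so the 0-1 scaling and its exact inverse work for any number of columns at once. A sketch with hypothetical data and a hypothetical Weight column (not from the question):

```python
# Sketch: per-Customer min-max scaling of several columns with pandas
# only, keeping the group min/max so the inverse is simple arithmetic.
import pandas as pd

df = pd.DataFrame({
    'Customer': ['A', 'A', 'B', 'B'],
    'Price':  [10.0, 110.0, 1.0, 5.0],
    'Weight': [2.0, 4.0, 8.0, 10.0],   # hypothetical extra column
})

cols = ['Price', 'Weight']
grp = df.groupby('Customer')[cols]
gmin, gmax = grp.transform('min'), grp.transform('max')

scaled = (df[cols] - gmin) / (gmax - gmin)   # per-group 0-1 scaling
recovered = scaled * (gmax - gmin) + gmin    # inverse of the scaling
```

This avoids both the loop and the scaler dictionary, at the cost of breaking if a group has a constant column (gmax - gmin would be zero there).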