Python Sklearn inverse_transform: return only one column when the scaler was fit on multiple columns

python scikit-learn

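Is there a way to use sklearn to inverse-transform a single column when the transformer was originally fit on the whole dataset? Below is an example of what I am trying to get.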

import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Setting up a dummy pipeline
pipes = []
pipes.append(('scaler', MinMaxScaler()))
transformation_pipeline = Pipeline(pipes)

# Random data.
df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]
    }
)

# Fitting the transformation pipeline
test = transformation_pipeline.fit_transform(df)

# Pulling the scaler function from the pipeline.
scaler = transformation_pipeline.named_steps['scaler']

# This is what I thought may work.
predicted_transformed = scaler.inverse_transform(test['Y'])

# The output would look something like this,
# essentially overlooking that the scaler was fit on 3 variables and
# inverse-transforming only the last one, or whichever one I need.
predicted_transformed = [1, 4, 1, 2, 2, 2]

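As part of the data preparation process, I need to be able to fit on the entire dataset. Later, however, I will load the scaler into another instance using joblib from sklearn.externals. In that new instance, the predicted values are the only thing that exists, so I need to extract just the inverse scaling for the Y column to get the original values back.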

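I know I could fit one transformer for the X variables and another for the Y variable, but I would like to avoid that. That approach adds the complexity of moving the scalers around and maintaining both of them in future projects.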
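For the hand-off described in the question, here is a minimal sketch of saving the fitted scaler and reloading it elsewhere. It assumes the standalone joblib package (sklearn.externals.joblib was removed in later scikit-learn releases). The file name scaler.joblib and the temporary directory are placeholders, not something from the question.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]}
)

# Fit on the whole dataset, as in the question.
scaler = MinMaxScaler().fit(df)

# Hypothetical location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'scaler.joblib')
joblib.dump(scaler, path)

# In the other instance: reload and inspect what the scaler remembers.
loaded = joblib.load(path)
print(loaded.n_features_in_)               # number of columns it was fit on
print(loaded.data_min_, loaded.data_max_)  # per-column minimum and maximum
```

The reloaded object still expects all three columns when inverse_transform is called, which is the crux of the question.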

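Similar problem here. I have a multidimensional time series as input (one quantity plus "exogenous" variables) and a single dimension (the quantity) as output. I cannot invert the scaling to compare the predictions with the original test set, because the scaler expects multidimensional input.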

One solution I can think of is to use separate scalers for the quantity and for the exogenous columns.
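A minimal sketch of the separate-scalers idea, reusing the DataFrame from the question rather than a real time series. The variable names x_scaler and y_scaler are made up for illustration, and the "predictions" are just the scaled targets standing in for model output.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]}
)

feature_cols = ['data1', 'data2']
target_col = ['Y']  # kept as a list so the selection stays 2-D

# One scaler per group of columns.
x_scaler = MinMaxScaler().fit(df[feature_cols])
y_scaler = MinMaxScaler().fit(df[target_col])

X_scaled = x_scaler.transform(df[feature_cols])
y_scaled = y_scaler.transform(df[target_col])

# Stand-in for model output: pretend the model predicted the scaled targets.
y_pred_scaled = y_scaled.copy()

# The target scaler was fit on one column, so a single column can be inverted.
y_pred = y_scaler.inverse_transform(y_pred_scaled).ravel()
print(y_pred)
```

The trade-off is the one the question already points out: two scaler objects have to be saved, moved and kept in sync.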

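Another solution I can think of is to give the scaler enough "junk" columns to pad the array to be unscaled out to the expected number of dimensions, and then look only at the first column of the output.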

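Then, once I have made predictions, I can invert the scaling on the predictions to obtain values that can be compared with the test set.

A bit late, but I think this code does what you want: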

import numpy as np
import pandas as pd

# - scaler   = the scaler object (it needs an inverse_transform method)
# - data     = the data to be inverse transformed as a Series, ndarray, ...
#              (a 1d object you can assign to a df column)
# - colName  = the name of the column to which the data belongs
# - colNames = all column names of the data on which scaler was fit
#              (necessary because scaler will only accept a df of the same
#              shape as the one it was fit on)
def invTransform(scaler, data, colName, colNames):
    dummy = pd.DataFrame(np.zeros((len(data), len(colNames))), columns=colNames)
    dummy[colName] = data
    dummy = pd.DataFrame(scaler.inverse_transform(dummy), columns=colNames)
    return dummy[colName].values
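A possible way to call this helper on the data from the question, as a sketch. The function is repeated so the snippet runs on its own, and the scaled Y column stands in for predictions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def invTransform(scaler, data, colName, colNames):
    # Same helper as in the answer, repeated so this snippet is self-contained.
    dummy = pd.DataFrame(np.zeros((len(data), len(colNames))), columns=colNames)
    dummy[colName] = data
    dummy = pd.DataFrame(scaler.inverse_transform(dummy), columns=colNames)
    return dummy[colName].values

df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]}
)

scaler = MinMaxScaler().fit(df)
scaled = pd.DataFrame(scaler.transform(df), columns=df.columns)

# Only the scaled Y values are passed in; the other columns are zero-filled.
restored_y = invTransform(scaler, scaled['Y'], 'Y', df.columns)
print(restored_y)
```

One thing to watch: if data is a Series whose index is not the default 0..n-1, assigning it to the dummy frame aligns on the index and can introduce NaN values, so passing data.values is safer in that case.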

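Note that you need to supply enough information for the scaler object's inverse_transform method to run behind the scenes.

Improving on what Willem said, this will work with less input:

def invTransform(scaler, data):
    dummy = pd.DataFrame(np.zeros((len(data), scaler.n_features_in_)))
    dummy[0] = data
    dummy = pd.DataFrame(scaler.inverse_transform(dummy), columns=dummy.columns)
    return dummy[0].values

Then you can use the appropriate values from the scaler object's scale_ and min_ attributes.

@VivekKumar Could you give an example in code of what you mean?

This is completely wrong. It puts the data in the first column every time, so if the original data was not in the first column, it will very likely be unscaled with the wrong column's scaling parameters.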
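Since a code example of the scale_ and min_ suggestion was asked for, here is a sketch. It assumes the scaler is a MinMaxScaler, whose forward transform is X * scale_ + min_ for each column. The helper name inverse_single_column and its col_index parameter are hypothetical. Passing the column position explicitly also avoids the first-column problem raised in the last comment.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def inverse_single_column(scaler, scaled_values, col_index):
    """Undo MinMaxScaler scaling for one column (hypothetical helper).

    MinMaxScaler computes X_scaled = X * scale_ + min_ column by column,
    so the inverse for column j is (X_scaled - min_[j]) / scale_[j].
    """
    scaled_values = np.asarray(scaled_values, dtype=float)
    return (scaled_values - scaler.min_[col_index]) / scaler.scale_[col_index]

df = pd.DataFrame(
    {'data1': [1, 2, 3, 1, 2, 3],
     'data2': [1, 1, 1, 2, 2, 2],
     'Y': [1, 4, 1, 2, 2, 2]}
)

scaler = MinMaxScaler().fit(df)
scaled = scaler.transform(df)

# Look up the position of 'Y' among the columns the scaler was fit on.
y_index = list(df.columns).index('Y')
restored_y = inverse_single_column(scaler, scaled[:, y_index], y_index)
print(restored_y)
```

No dummy columns are needed this way, but the formula is specific to MinMaxScaler. Other scalers store different attributes (StandardScaler, for instance, uses mean_ and scale_), so the arithmetic would have to be adapted.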