Python: error when transforming multiple columns at once

python pandas dataframe scikit-learn

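I am using Python with pandas and sklearn, and I am trying out the new and very convenient sklearn-pandas.

I have a large data frame and need to transform several columns in a similar way.

The names of those columns are stored in another variable, `other`. The documentation states explicitly that multiple columns can be transformed with the same transformation, but the following code does not behave as expected: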

from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn_pandas import DataFrameMapper

mapper = DataFrameMapper([[other[0],other[1]],LabelEncoder()])
mapper.fit_transform(df.copy())

I get the following error:

raise ValueError("bad input shape {0}".format(shape))
ValueError: ['EFW', 'BPD']: bad input shape (154, 2)

When I use the following code instead, it works fine:

cols = [(other[i], LabelEncoder()) for i,col in enumerate(other)]
mapper = DataFrameMapper(cols)
mapper.fit_transform(df.copy())

As far as I understand, both should work and produce the same result. What am I doing wrong?

Thanks

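The problem here is that the two snippets build completely different data structures.

cols = [(other[i], LabelEncoder()) for i, col in enumerate(other)] builds a list of tuples. Note that you can shorten this line to: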

cols = [(col, LabelEncoder()) for col in other]

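The first snippet, [[other[0], other[1]], LabelEncoder()], on the other hand, produces a list with two elements: a list and a LabelEncoder instance. The documentation does say that you can transform multiple columns, by specifying them like this:

Transformations may require multiple input columns. In these cases, the column names can be specified in a list:

mapper2 = DataFrameMapper([(['children', 'salary'], sklearn.decomposition.PCA(1))])

Note that this is a list whose elements are (list, object) tuples, not a list of the form [list, object].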
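The structural difference can be checked in plain Python, without sklearn-pandas installed. The sketch below is only an illustration: `bad_entries` is a hypothetical helper that mirrors the description in the docstring (each feature is a tuple of two or three items); it is not part of the library and does not claim to reproduce what the library actually enforces. The column names and the stand-in transformer are assumptions.

```python
other = ["EFW", "BPD"]   # assumed column names, taken from the error message
transformer = object()   # stand-in for LabelEncoder(), so no sklearn is needed

# Structure from the question: a list holding a list and a transformer.
wrong = [[other[0], other[1]], transformer]

# Structures matching the docstring: a list of (columns, transformer) tuples.
right_multi = [([other[0], other[1]], transformer)]
right_single = [(col, transformer) for col in other]


def bad_entries(features):
    """Hypothetical check: return entries that are not 2- or 3-item tuples."""
    return [e for e in features
            if not (isinstance(e, tuple) and len(e) in (2, 3))]


print(len(bad_entries(wrong)))         # both entries fail the check
print(len(bad_entries(right_multi)))   # no offending entries
print(len(bad_entries(right_single)))  # no offending entries
```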

If we look at the source code itself:

class DataFrameMapper(BaseEstimator, TransformerMixin):
    """
    Map Pandas data frame column subsets to their own
    sklearn transformation.
    """

    def __init__(self, features, default=False, sparse=False, df_out=False,
                 input_df=False):
        """
        Params:
        features    a list of tuples with features definitions.
                    The first element is the pandas column selector. This can
                    be a string (for one column) or a list of strings.
                    The second element is an object that supports
                    sklearn's transform interface, or a list of such objects.
                    The third element is optional and, if present, must be
                    a dictionary with the options to apply to the
                    transformation. Example: {'alias': 'day_of_week'}

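The class definition also states explicitly that the features parameter of DataFrameMapper must be a list of tuples, where an element of a tuple may itself be a list.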

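A final note on why you actually get this error message: LabelEncoder in sklearn is meant for encoding labels, that is, a 1D array. It therefore cannot handle 2 columns at once and raises an exception. So if you want to use LabelEncoder, you have to build N tuples, each with one column name and its own transformer, where N is the number of columns you want to transform.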
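The 1D restriction can be seen with LabelEncoder alone, independent of DataFrameMapper. The following is a minimal sketch on made-up data: the column names come from the error message, but the values are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data; the values are assumptions, only the column names are from the question.
df = pd.DataFrame({"EFW": ["a", "b", "a"], "BPD": ["x", "y", "y"]})

# Passing two columns at once hands LabelEncoder a 2D array, which it rejects.
try:
    LabelEncoder().fit_transform(df[["EFW", "BPD"]].values)
    two_d_failed = False
except ValueError:
    two_d_failed = True

# One fresh LabelEncoder per column works, since each column is 1D.
encoded = {col: LabelEncoder().fit_transform(df[col]) for col in df.columns}
```

This is the same idea as building one (column, LabelEncoder()) tuple per column for the mapper.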

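If you use MinMaxScaler instead of LabelEncoder, does the error go away? It seems LabelEncoder cannot handle multiple columns at once, or rather, it explicitly checks for 1D data.

@captainshai Yes. LabelEncoder is meant for labels and only handles 1D arrays. You need a separate LabelEncoder for each column you want to transform.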
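For contrast, MinMaxScaler is designed for 2D input of shape (n_samples, n_features), so it accepts several columns in one call. A small sketch on invented numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two numeric columns, three rows; the values are made up for illustration.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Each column is scaled independently to the range [0, 1].
scaled = MinMaxScaler().fit_transform(X)
```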