Imputer on some DataFrame columns in Python
Tags: python, scikit-learn, missing-data, imputation
I am learning how to use the Imputer in Python. This is my code:
df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])
df.columns=["size", "price", "color", "class", "boh"]
from sklearn.preprocessing import Imputer
imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df["price"])
df["price"]=imp.transform(df["price"])
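However, this raises the following error:
ValueError: Length of values does not match length of index
What is wrong with my code?
Thanks for your help.

I think you should specify the imputer's axis and then transpose the array it returns: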
import pandas as pd
import numpy as np
df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])
df.columns=["size", "price", "color", "class", "boh"]
from sklearn.preprocessing import Imputer
imp=Imputer(missing_values="NaN", strategy="mean",axis=1 ) #specify axis
q = imp.fit_transform(df["price"]).T #perform a transpose operation
df["price"]=q
print(df)

This is because Imputer is usually used with DataFrames rather than Series. One possible solution is:
imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])
df["price"]=imp.transform(df[["price"]]).ravel()
# Or even
imp=Imputer(missing_values="NaN", strategy="mean" )
df["price"]=imp.fit_transform(df[["price"]]).ravel()
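Note for newer scikit-learn versions: `sklearn.preprocessing.Imputer` was deprecated in 0.20 and removed in 0.22 in favour of `sklearn.impute.SimpleImputer`. Below is a minimal sketch of the same approach with the newer class, assuming a recent scikit-learn and a trimmed-down one-column DataFrame used only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # successor of preprocessing.Imputer

# Illustrative one-column frame holding the "price" values from the question
df = pd.DataFrame({"price": [8, np.nan, 10, np.nan, 11, 7]})

imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# df[["price"]] is a one-column DataFrame (2D); df["price"] would be a 1D Series
filled = imp.fit_transform(df[["price"]])  # ndarray of shape (6, 1)

# Flatten to shape (6,) so the values line up with the column's index
df["price"] = filled.ravel()
print(df)
```

Unlike the old `Imputer`, `SimpleImputer` has no `axis` argument and always works column by column.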

A simple solution is to pass a 2D array:
df=pd.DataFrame([["XXL", 8, "black", "class 1", 22],
["L", np.nan, "gray", "class 2", 20],
["XL", 10, "blue", "class 2", 19],
["M", np.nan, "orange", "class 1", 17],
["M", 11, "green", "class 3", np.nan],
["M", 7, "red", "class 1", 22]])
df.columns=["size", "price", "color", "class", "boh"]
from sklearn.preprocessing import Imputer
imp=Imputer(missing_values="NaN", strategy="mean" )
imp.fit(df[["price"]])
df["price"]=imp.transform(df[["price"]])
df['boh'] = imp.fit_transform(df[['boh']])
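The same idea extends to several numeric columns in one call, because the imputer computes one mean per column of its 2D input. This is a sketch only, assuming `SimpleImputer` from a recent scikit-learn (the successor of `Imputer`); the `num_cols` list is a hypothetical helper name:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame([["XXL", 8, "black", "class 1", 22],
                   ["L", np.nan, "gray", "class 2", 20],
                   ["XL", 10, "blue", "class 2", 19],
                   ["M", np.nan, "orange", "class 1", 17],
                   ["M", 11, "green", "class 3", np.nan],
                   ["M", 7, "red", "class 1", 22]],
                  columns=["size", "price", "color", "class", "boh"])

num_cols = ["price", "boh"]  # hypothetical name: the numeric columns to impute

imp = SimpleImputer(strategy="mean")
# A list of column names selects a 2D DataFrame, which is what the imputer expects
df[num_cols] = imp.fit_transform(df[num_cols])

print(imp.statistics_)  # per-column means used as fill values
print(df)
```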
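
This is your DataFrame.
According to the documentation of the fit method, it takes an array-like or sparse matrix as its input parameter. You can try the following: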
imp.fit(df.iloc[:,1:2])
df['price']=imp.transform(df.iloc[:,1:2])
Pass the column by index position to the fit method, then apply the transform:
>>> df
  size  price   color    class   boh
0  XXL    8.0   black  class 1  22.0
1    L    9.0    gray  class 2  20.0
2   XL   10.0    blue  class 2  19.0
3    M    9.0  orange  class 1  17.0
4    M   11.0   green  class 3   NaN
5    M    7.0     red  class 1  22.0
For boh:
imp.fit(df.iloc[:,4:5])
df['boh']=imp.transform(df.iloc[:,4:5])
>>> df
  size  price   color    class   boh
0  XXL    8.0   black  class 1  22.0
1    L    9.0    gray  class 2  20.0
2   XL   10.0    blue  class 2  19.0
3    M    9.0  orange  class 1  17.0
4    M   11.0   green  class 3  20.0
5    M    7.0     red  class 1  22.0
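The slices `iloc[:, 1:2]` and `iloc[:, 4:5]` work because a slice keeps the result two-dimensional, whereas a single integer position returns a 1D Series. A small pandas-only check of that difference, using the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["XXL", 8, "black", "class 1", 22],
                   ["L", np.nan, "gray", "class 2", 20],
                   ["XL", 10, "blue", "class 2", 19],
                   ["M", np.nan, "orange", "class 1", 17],
                   ["M", 11, "green", "class 3", np.nan],
                   ["M", 7, "red", "class 1", 22]],
                  columns=["size", "price", "color", "class", "boh"])

as_series = df.iloc[:, 1]    # integer position: 1D Series
as_frame = df.iloc[:, 1:2]   # slice: one-column DataFrame (2D)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
print(as_frame.columns.tolist())
```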
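If I am wrong, please correct me. Your suggestions are welcome.

Thank you, Ryan. Really useful.
Unfortunately, this does not work for me :( ValueError: Expected 2D array, got 1D array
Why is ravel() needed here? It seems to return the correct type without it.
1. If you pass the 2D df[["price"]], ravel() is not needed. For the imputer's fit and transform to work, the input only has to be 2D, and df[["price"]] gives the data in 2D form (number of rows, 1).
2. If you pass the 1D df["price"], the following still does not work and raises ValueError: Expected 2D array, got 1D array instead:
df["price"]=imp.fit_transform(df["price"]).ravel()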
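To make the 1D versus 2D point concrete, here is a hedged sketch, assuming `SimpleImputer` from a recent scikit-learn (the successor of `Imputer`). It shows that a 1D Series is rejected, while the same values reshaped to a single column with `reshape(-1, 1)` are accepted. The variable names are illustrative only:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

price = pd.Series([8, np.nan, 10, np.nan, 11, 7], name="price")
imp = SimpleImputer(strategy="mean")

# 1D input: the imputer refuses it with a ValueError
rejected = False
try:
    imp.fit_transform(price)
except ValueError:
    rejected = True
print("1D input rejected:", rejected)

# 2D input: reshape the values into one column of shape (6, 1)
column = price.to_numpy().reshape(-1, 1)
filled = imp.fit_transform(column)

print(filled.shape)          # 2D result, one column
print(filled.ravel().shape)  # flattened back to 1D
```

Whether `ravel()` is needed on the way back depends only on what is being assigned to: it turns the one-column result into a flat array.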