Python: Linear regression on the Black Friday dataset with Anaconda

python, dataset, anaconda, linear-regression

I am trying to predict the purchase amount using Anaconda and the Black Friday dataset. Here is my code:

    import numpy as np
    import pandas as pd
    train=pd.read_csv("C:\\Users\\User\\Documents\\data sets\\train.csv")
    test=pd.read_csv("C:\\Users\\User\\Documents\\data sets\\test.csv")
    frames=[train,test]
    data=pd.concat(frames)
    print(data.shape)
    data.head()
    data.isnull().any()
    data.fillna(999,inplace=True)
    data.head(20)
    # replace each age bracket with a representative age
    data.Age[data["Age"]=="0-17"]="15"
    data["Age"].head(10)
    data.Age[data["Age"]=="18-25"]="21"
    data.Age[data["Age"]=="26-35"]="30"
    data.Age[data["Age"]=="36-45"]="40"
    data.Age[data["Age"]=="46-50"]="48"
    data.Age[data["Age"]=="51-55"]="53"
    data.Age[data["Age"]=="55+"]="60"
    data.Stay_In_Current_City_Years[data["Stay_In_Current_City_Years"]=="4+"]="4"
    data["Age"]=data["Age"].astype(int)
    data["Stay_In_Current_City_Years"]=data["Stay_In_Current_City_Years"].astype(int)
    data.dtypes
    data["Marital_Status"]=data["Marital_Status"].astype(int)
    data["Occupation"]=data["Occupation"].astype(int)
    data["Product_Category_1"]=data["Product_Category_1"].astype(int)
    data["Product_Category_2"]=data["Product_Category_2"].astype(float)
    data["Product_Category_3"]=data["Product_Category_3"].astype(float)
    data["Purchase"]=data["Purchase"].astype(float)
    # one-hot encode Gender and City_Category, dropping the first level of each
    sex=pd.get_dummies(data["Gender"]).iloc[:,1:]
    data1=pd.concat([data,sex],axis=1)
    city=pd.get_dummies(data["City_Category"]).iloc[:,1:]
    data1=pd.concat([data,sex,city],axis=1)
    # cross validation and creating the features and the target variable
    # (sklearn.cross_validation was removed in scikit-learn 0.20;
    # newer versions provide train_test_split in sklearn.model_selection)
    from sklearn.cross_validation import train_test_split
    y=data1["Purchase"]
    x=data1[["Age","City_Category","Gender","Marital_Status","Occupation",
             "Product_Category_1","Product_Category_2","Product_Category_3",
             "Product_ID","Stay_In_Current_City_Years","User_ID","M","B","C"]]
    x_train,x_test,y_train,y_test=train_test_split(x,y)
    # building the regression
    from sklearn import linear_model
    reg=linear_model.LinearRegression()
    reg.fit(x_train,y_train)
But I keep getting:

    ValueError: could not convert string to float: 'P00100642'
What does this mean? Do I also need to convert the other features to integers in order to run the regression? How can I fix it?
Thanks :)

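Machine learning algorithms accept only numeric data. The column `Product_ID` does not contain numeric data, because its values start with "P". You are passing that column to the model, which is why you get the error.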
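
One way to see which columns would trigger this kind of error is to list the columns whose dtype is not numeric. The sketch below runs on a small made-up frame (the values are illustrative, not taken from the real dataset), and the helper name `non_numeric_columns` is hypothetical:

```python
import pandas as pd

def non_numeric_columns(df):
    """Return the names of the columns that do not have a numeric dtype."""
    return df.select_dtypes(exclude="number").columns.tolist()

# Illustrative sample only; the column names follow the question's code.
sample = pd.DataFrame({
    "Product_ID": ["P00100642", "P00069042"],
    "Gender": ["F", "M"],
    "Occupation": [10, 7],
    "Purchase": [8370.0, 15200.0],
})

print(non_numeric_columns(sample))
```

In the question's code, `City_Category` and `Gender` are also still strings inside `x`, so they would probably raise the same error once `Product_ID` is handled. Since they are already encoded as the dummy columns `M`, `B` and `C`, leaving the original two columns out of `x` should be enough.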

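Look at the pattern in these values and you will see that every entry starts with "P00". Since each value is a string, you can strip that prefix by replacing it with an empty string.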

Try this:

    data['Product_ID'] = data['Product_ID'].str.replace('P00', '')
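
The replacement leaves the column as strings of digits, so a cast to an integer type is a natural follow-up. A minimal sketch on made-up IDs, assuming every value starts with `P00` as described above:

```python
import pandas as pd

# Illustrative IDs in the same format as the one in the error message.
ids = pd.Series(["P00100642", "P00069042", "P00248942"], name="Product_ID")

# Strip the literal "P00" prefix, then convert the remaining digits to integers.
numeric_ids = ids.str.replace("P00", "", regex=False).astype(int)

print(numeric_ids.tolist())
```

Passing `regex=False` makes the call treat `"P00"` as a literal string. The result is still an identifier rather than a quantity, so a linear model may not learn much from it.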

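After that, you can use `StandardScaler` to scale the values down.

Which line of your code raises the error? I would guess it is your `astype(float)`, so you should check what data you are trying to convert, why you expect it to be a float (i.e. a string representing a float), and why it is not what you expect...

This line: `reg.fit(x_train,y_train)`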
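
The scaling step suggested in the answer could be combined with the regression roughly as follows. This is only a sketch on made-up numbers, assuming all features are already numeric; chaining the steps with `make_pipeline` is a choice made here, not something from the answer:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up numeric features: age, occupation code, numeric product id.
X = np.array([
    [15, 10, 100642],
    [30, 7, 69042],
    [40, 1, 248942],
    [53, 4, 85442],
    [21, 12, 193542],
], dtype=float)
y = np.array([8370.0, 15200.0, 1422.0, 7969.0, 15227.0])

# Scale each feature to zero mean and unit variance, then fit the regression.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)

# Inspect the scaled features and make predictions for the first two rows.
scaled = model.named_steps["standardscaler"].transform(X)
predictions = model.predict(X[:2])
print(scaled.mean(axis=0).round(6))
print(predictions.shape)
```

Keeping the scaler inside the pipeline means the same scaling is applied automatically whenever `model.predict` is called on new rows.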