Python SVM model prediction doesn't change


python machine-learning scikit-learn


SVM model predictions do not change when I change the level of a categorical factor

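Suppose factor A has levels A1 and A2, and factor B has levels B1 and B2. When I set A to A1, I get output O1, but when I change it to A2, I still get the same result. Likewise, when I set B to B1, I get output P1, and when I change it to B2, I still get P1.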

What could be the reason?
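One way to check the behaviour described above is to compare the SVM's decision score, not just the predicted label, for the same row with only one factor's level changed: `predict` returns a class, and a level change can shift the score without crossing the decision boundary. The following is a minimal reproducible case under stated assumptions: the data, the factors `A` and `B`, the numeric column `x` and the helper `score_with_level` are all hypothetical placeholders, not taken from the question's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical placeholder data: two binary factors A and B and one numeric column x.
n = 200
raw = pd.DataFrame({
    "A": rng.choice(["A1", "A2"], size=n),
    "B": rng.choice(["B1", "B2"], size=n),
    "x": rng.normal(size=n),
})
# Hypothetical target: depends on A and x, but not on B.
y = (((raw["A"] == "A2").astype(int) + raw["x"]) > 0.5).astype(int)

# One-hot encode the factors the same way as the question (prefix_sep="__").
X = pd.get_dummies(raw, prefix_sep="__", columns=["A", "B"], dtype=float)
scaler = StandardScaler().fit(X)
model = SVC(kernel="linear").fit(scaler.transform(X), y)


def score_with_level(row, factor, level):
    """Decision score for one encoded row after forcing `factor` to `level` (hypothetical helper)."""
    probe = row.copy()
    for col in X.columns:
        if col.startswith(factor + "__"):
            probe[col] = 1.0 if col == f"{factor}__{level}" else 0.0
    # Reuse the scaler fitted on the training data; do not refit it on the probe.
    probe_frame = probe.to_frame().T
    return model.decision_function(scaler.transform(probe_frame))[0]


row = X.iloc[0]
for factor, levels in [("A", ["A1", "A2"]), ("B", ["B1", "B2"])]:
    for level in levels:
        print(factor, "=", level, "score:", score_with_level(row, factor, level))
```

If the two scores for a factor differ but `predict` returns the same label, the level change matters to the model but not enough to flip the class; if the scores are identical, the two inputs reaching the model are identical.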

My model basically has 19 parameters, which expand to 193 after one-hot encoding. There are about 1,600 data points.
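Each categorical factor with k levels becomes k indicator columns under `pd.get_dummies`, which is how 19 parameters can grow to 193. New data that happens to contain only some of the levels produces fewer columns, so it has to be aligned to the training layout before prediction. A minimal sketch with hypothetical data (the columns `x`, `A` and `B` are placeholders); `DataFrame.reindex` with `fill_value=0` is shown here as one compact alternative to adding and removing dummy columns in loops.

```python
import pandas as pd

# Hypothetical training data: one numeric column and two categorical factors.
train = pd.DataFrame({
    "x": [4.1, 4.5, 3.9, 4.8],
    "A": ["A1", "A2", "A1", "A3"],
    "B": ["B1", "B2", "B2", "B1"],
})
train_onehot = pd.get_dummies(train, prefix_sep="__", columns=["A", "B"])
print(list(train_onehot.columns))
# ['x', 'A__A1', 'A__A2', 'A__A3', 'B__B1', 'B__B2']  (1 + 3 + 2 = 6 columns)

# Hypothetical new data in which A only takes the level A2, so A__A1 and A__A3 are not created.
new = pd.DataFrame({"x": [4.2, 4.4], "A": ["A2", "A2"], "B": ["B1", "B2"]})
new_onehot = pd.get_dummies(new, prefix_sep="__", columns=["A", "B"])
print(list(new_onehot.columns))
# ['x', 'A__A2', 'B__B1', 'B__B2']

# Align with the training layout: missing dummies are added as 0, unseen ones are dropped,
# and the columns come out in the training order.
aligned = new_onehot.reindex(columns=train_onehot.columns, fill_value=0)
print(list(aligned.columns))
```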

    import pandas as pd

    # Pre-processing
    df = pd.read_csv("01.METADATA.csv")
    cat_prams = ["LGTCOND", "PRECREVbin", "FIRSTCRA", "TRAFFLDT", "PREVEH", "CRITPRE", "AVOIDMAN", "GAD1", "MANUSE",
                 "SEATPOS", "PARTNERCLASSe"]
    cont_prams = ["OAL", "OAW", "DIRDAMW", "MAX", "VC_1ST", "DV_1ST", "LANEOPP"]
    HIS = ["HISPID"]  # target column
    all_param = HIS + cont_prams + cat_prams
    df = df[all_param].copy()
    df["HISPID"] = df["HISPID"].astype("category")
    for cat_pram in cat_prams:
        df[cat_pram] = df[cat_pram].astype("category")

    df = df.dropna()

    # One-hot encoding
    cat_columns = cat_prams
    df_onehot = pd.get_dummies(df, prefix_sep="__", columns=cat_prams)
    cat_dummies = [col for col in df_onehot if "__" in col and col.split("__")[0] in cat_columns]
    processed_columns = list(df_onehot.columns)

    df_onehot.to_csv("02_SVM_model_Input.csv")

    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Train/test split
    y = df_onehot["HISPID"]
    X = df_onehot[df_onehot.columns[1:]]  # every column except HISPID, which is the first column
    order = X.columns
    print(order)

    # Standardize the features
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)

    linearmodel = SVC(kernel="linear", probability=True).fit(X_train, y_train)
    lin_predictions = linearmodel.predict(X_test)
    print(classification_report(y_test, lin_predictions))
    error_shift(lin_predictions, y_test)  # user-defined helper, not shown in the question

    # Create an "impulse" in the actual data and predict the outcome
    print("\nShape of training: ", X_train.shape)
    print("No. of parameters used in training: ", X_train.shape[1], "\n")

    dft = pd.read_csv("01.METADATA.csv")
    dft = dft[all_param].copy()

    print(dft["DV_1ST"].head())
    dft["DV_1ST"] = dft["DV_1ST"] * 2  # double DV_1ST in every row
    print(dft["DV_1ST"].head())


    for cat_pram in cat_prams:
        dft[cat_pram] = dft[cat_pram].astype("category")
    dft = dft.dropna()
    print("dft shape (after dropping rows with nulls):", dft.shape)

    dft_onehot = pd.get_dummies(dft, prefix_sep="__", columns=cat_prams)
    print("dft_onehot shape (after creating dummies):", dft_onehot.shape)
    cat_dummies = [col for col in dft_onehot if "__" in col and col.split("__")[0] in cat_columns]

    # Remove dummy columns that did not exist in the training data
    print("---------------------")
    for col in list(dft_onehot.columns):
        if ("__" in col) and (col.split("__")[0] in cat_columns) and col not in processed_columns:
            print("Removing feature {}".format(col))
            dft_onehot.drop(col, axis=1, inplace=True)
    print("---------------------")
    print("dft_onehot shape (after removing extra params):", dft_onehot.shape)


    # Add dummy columns that exist in the training data but are missing from dft
    print("---------------------")
    for col in processed_columns:
        if col not in dft_onehot.columns:
            print("Adding feature {}".format(col))
            dft_onehot[col] = 0
    print("---------------------")
    print("dft_onehot shape (after adding missing params):", dft_onehot.shape)


    yt = dft_onehot["HISPID"]
    print("y shape:", yt.shape)
    Xt = dft_onehot[dft_onehot.columns[1:]]
    print("X shape:", Xt.shape)
    Xt = Xt[order]  # same column order as the training matrix
    Xt.to_csv("Xt.csv")
    print(Xt.head())
    print(Xt.columns)

    # Reuse the scaler fitted on the training data. Calling fit_transform here would
    # re-estimate the mean and standard deviation from the altered data and undo the change.
    Xt = scaler.transform(Xt)
    new_predict = linearmodel.predict(Xt)
    impulse_result = pd.DataFrame({"Actual": yt, "Altered": new_predict})
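One plausible cause worth checking: if the altered data is standardized with a freshly fitted `StandardScaler` (that is, `fit_transform` rather than `transform` with the scaler fitted on the training data), the mean and standard deviation are re-estimated from the altered data. A change applied uniformly to every row, such as doubling `DV_1ST` or forcing a factor to one level, is then removed: a doubled column standardizes to the same values, and a constant dummy column standardizes to all zeros whichever level it holds. The sketch below uses a small hypothetical matrix to illustrate this; it is not the question's data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical training matrix: column 0 is continuous, column 1 is a 0/1 dummy.
X_train = np.column_stack([rng.normal(5.0, 2.0, size=100),
                           rng.integers(0, 2, size=100)]).astype(float)
scaler = StandardScaler().fit(X_train)

# Uniform "impulse" 1: double the continuous feature in every row.
X_doubled = X_train.copy()
X_doubled[:, 0] *= 2

# Uniform "impulse" 2: force the dummy to one level in every row.
X_level0 = X_train.copy()
X_level0[:, 1] = 0.0
X_level1 = X_train.copy()
X_level1[:, 1] = 1.0

# Refitting a new scaler on each altered matrix re-centres and re-scales every column,
# so the uniform change disappears from the standardized input.
refit_same_doubling = np.allclose(StandardScaler().fit_transform(X_train),
                                  StandardScaler().fit_transform(X_doubled))
refit_same_levels = np.allclose(StandardScaler().fit_transform(X_level0),
                                StandardScaler().fit_transform(X_level1))

# Reusing the scaler fitted on the training data keeps the change visible.
reuse_same_doubling = np.allclose(scaler.transform(X_train), scaler.transform(X_doubled))
reuse_same_levels = np.allclose(scaler.transform(X_level0), scaler.transform(X_level1))

print("refit:", refit_same_doubling, refit_same_levels)  # refit: True True
print("reuse:", reuse_same_doubling, reuse_same_levels)  # reuse: False False
```

Keeping the scaler and the SVC together in a `sklearn.pipeline.Pipeline` fitted once on the training data is one way to make this mistake harder, since `Pipeline.predict` only calls `transform` on the new data.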

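Can you put together a simple reproducible case, with code and perhaps placeholder data? To get people's attention and make the problem easier to understand, I think the question should include some highlighted sections, sample code, output and so on, not just plain text.

Before selecting the best variables, did you compare the means of the two classes with a t-test? Which SVM parameters are you changing? Do you have imbalanced classes?

I have added the script I wrote.

Oh, so you mean one of the categorical features is irrelevant to the prediction? Why is that a problem? Are you sure the feature should be relevant?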
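Two of the comments ask whether the classes are imbalanced and whether a categorical feature might simply be irrelevant. For a linear-kernel `SVC`, `coef_` holds one weight per input column, so summing the absolute weights of a factor's dummy columns gives a rough measure of how much that factor can move the decision score. A sketch with hypothetical data in which factor `A` drives the label and factor `B` is noise; the names and the 10% label-flip rate are assumptions for illustration. Note that `coef_` is only available with `kernel="linear"`, and the weights are only comparable across columns when the columns are on similar scales.

```python
from collections import defaultdict

import numpy as np
import pandas as pd
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Hypothetical data: factor A determines the label, factor B is unrelated noise.
n = 300
raw = pd.DataFrame({
    "A": rng.choice(["A1", "A2"], size=n),
    "B": rng.choice(["B1", "B2"], size=n),
})
y = np.where(raw["A"] == "A2", "pos", "neg")
# Flip about 10% of the labels so the classes are not perfectly separable.
flip = rng.random(n) < 0.1
y = np.where(flip, np.where(y == "pos", "neg", "pos"), y)

X = pd.get_dummies(raw, prefix_sep="__", dtype=float)

# 1) Class balance: a strongly skewed count can push the model toward one label.
print(pd.Series(y).value_counts())

# 2) Total absolute linear-SVM weight per original factor
#    (summed over its dummy columns and, for multi-class problems, over class pairs).
model = SVC(kernel="linear").fit(X, y)
weight = defaultdict(float)
for col, w in zip(X.columns, np.abs(model.coef_).sum(axis=0)):
    weight[col.split("__")[0]] += w
print(dict(weight))
```

A factor whose summed weight is close to zero barely changes the decision score when its level changes, which would be consistent with getting the same prediction for every level.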