Python SKlearn fit method not working


python machine-learning scikit-learn

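I'm working on a project using Python (3.6) and Sklearn. I have done the classification part, but when I try to reshape the data for use with Sklearn's fit method, it returns an error.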

Here's what I have tried:

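from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, classification_report
from sklearn.neighbors import LocalOutlierFactor

# `data` (the DataFrame), `outlier_fraction` and `Fraud` are defined
# earlier in the script and are not shown in the question

# Get all the columns from the dataframe
columns = data.columns.tolist()

# Filter the columns to remove data we don't want
columns = [c for c in columns if c not in ["Class"]]

# Store the variable we want to predict on
target = "Class"
X = data.drop(columns=target)
Y = data[target]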

# Print the shapes of X & Y
print(X.shape)
print(Y.shape)

# define a random state
state = 1

# define the outlier detection method
classifiers = {
    "Isolation Forest": IsolationForest(max_samples=len(X),
                                       contamination=outlier_fraction,
                                       random_state=state),
    "Local Outlier Factor": LocalOutlierFactor(
    n_neighbors = 20,
    contamination = outlier_fraction)
}



# Fit the model
n_outliers = len(Fraud)

for i, (clf_name, clf) in enumerate(classifiers.items()):

    # Fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)

    # Map the predictions: 1 (inlier) -> 0 for valid, -1 (outlier) -> 1 for fraudulent
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1

    n_errors = (y_pred != Y).sum()

    # Run classification metrics
    print('{}:{}'.format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred))
    print(classification_report(Y, y_pred))

Then it returns the following error:

ValueError: could not convert string to float: '301.48 Change: $0.00'
It points to the `clf.fit(X)` line.

What have I configured wrong?
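A minimal sketch that reproduces the error, assuming a hypothetical toy frame with one numeric column and one text column modeled on the value from the traceback (the column names `amount` and `note` are made up). scikit-learn converts X to a float array inside `fit`, so any cell holding non-numeric text raises `ValueError`.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical toy data: `note` holds free text like the value in the traceback
toy = pd.DataFrame({
    "amount": [301.48, 12.00, 99.99],
    "note": ["301.48 Change: $0.00", "ok", "ok"],
})

def try_fit(frame):
    """Return None if IsolationForest fits, otherwise the ValueError message."""
    try:
        IsolationForest(random_state=0).fit(frame)
        return None
    except ValueError as exc:
        return str(exc)

print(try_fit(toy))              # prints a ValueError message: the text column cannot become float
print(try_fit(toy[["amount"]]))  # numeric columns only, so fit succeeds
```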

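You can convert the data to numeric values based on the unique values in each column, and you can also drop unnecessary columns from the dataset.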
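To list the columns that would trigger this error, and to drop them when they are not needed, here is a sketch using pandas' `select_dtypes`. The frame and its column names are hypothetical, loosely modeled on the columns mentioned in this thread.

```python
import pandas as pd

# Hypothetical frame mixing numeric and text columns
df = pd.DataFrame({
    "amount": [301.48, 12.0],
    "paymentdetails": ["301.48 Change: $0.00", "ok"],
    "Class": [0, 1],
})

# Columns that fit() cannot convert to float
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Keep only the numeric columns
numeric_only = df.drop(columns=non_numeric)
print(numeric_only.dtypes)
```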

Here's what you can try:

import numpy as np
import pandas as pd

df_full = pd.read_excel('input/samp.xlsx', sheet_name=0)
# Keep only columns whose names do not start with "Unnamed" (blank header cells)
df_full = df_full[df_full.filter(regex='^(?!Unnamed)').columns]
df_full.drop(columns=['paymentdetails'], inplace=True)
df_full.drop(columns=['timestamp'], inplace=True)

# Handle non-numeric data
def handle_non_numeric_data(df_full):
    columns = df_full.columns.values

    for column in columns:
        text_digit_vals = {}
        def convert_to_int(val):
            return text_digit_vals[val]

        # Replace each distinct value in a non-numeric column with an integer code
        if df_full[column].dtype != np.int64 and df_full[column].dtype != np.float64:
            column_contents = df_full[column].values.tolist()
            unique_elements = set(column_contents)
            x = 0
            for unique in unique_elements:
                if unique not in text_digit_vals:
                    text_digit_vals[unique] = x
                    x += 1

            df_full[column] = list(map(convert_to_int, df_full[column]))

    return df_full

df_full = handle_non_numeric_data(df_full)
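The loop above gives each distinct value in a non-numeric column its own integer code. A shorter sketch of the same idea uses `pandas.factorize`; this is a swapped-in technique, not the answer's own code, and the frame and the helper name `encode_non_numeric` are hypothetical. As with the loop, the codes carry an arbitrary order, which matters most for distance-based methods such as LocalOutlierFactor.

```python
import pandas as pd

def encode_non_numeric(df):
    """Replace every non-numeric column with integer codes (one code per distinct value)."""
    out = df.copy()
    for column in out.select_dtypes(exclude="number").columns:
        # factorize returns (codes, uniques); codes follow order of first appearance,
        # and missing values get the code -1
        codes, _ = pd.factorize(out[column])
        out[column] = codes
    return out

# Hypothetical example frame
df = pd.DataFrame({"amount": [1.5, 2.0, 3.0], "merchant": ["a", "b", "a"]})
encoded = encode_non_numeric(df)
print(encoded)
```

Unlike the answer's function, this version works on a copy, so the original frame is left unchanged.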

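The data you are passing in X is wrong and contains this phrase.

Hi @VivekKumar, I have a large dataframe. Can I ignore such values?

No. The only options are to not pass that entire column, or to convert it to numbers beforehand. In any case, what does "301.48 Change: $0.00" represent? Are you sure it belongs to one column and isn't two columns mixed together?

Yes, it is correct, but we can convert it to the first float/int value. For example, in this case we can use 301.48 and ignore the rest of the string.

@AbdulRehman If you only want to extract the float/int part, parse it out of the column and use that. `fit` only accepts floats, so if the string (including the numbers) really carries meaning, you may need to use TF-IDF or BOW.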
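For the case raised in the comments, keeping only the leading number of a value like "301.48 Change: $0.00", here is a sketch using pandas' `Series.str.extract` with a regular expression. The sample strings are hypothetical. Rows without a leading number become NaN, and those still have to be filled or dropped before calling `fit`.

```python
import pandas as pd

# Hypothetical column values modeled on the one from the traceback
s = pd.Series(["301.48 Change: $0.00", "12 Change: $1.00", "no number here"])

# Capture an optional minus sign, digits and an optional decimal part at the start
leading = s.str.extract(r"^\s*(-?\d+(?:\.\d+)?)", expand=False).astype(float)
print(leading)
```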