Problem with an unsupervised clustering approach on a Python DataFrame


python pandas dataframe scikit-learn


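I am working through some Python ML exercises and I am stuck on one of them. I have a DataFrame with 7 columns and nearly 10k rows. Six of the columns are of dtype object and one is a float. The 7 variables are: company, job, technologies, degree, experience (a float, in years), city, and experience level.

I want to run an unsupervised clustering and visualize it on the two variables I consider important.

The code I have been testing does not work, and the problem seems to come from my mix of variable types.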

    x = df
    y = x.pop('Metier')

    y.unique()

    OneHotEncoder().fit(df.dropna()).categories_

    x.values, y

    for weights in ['uniform', 'distance']:
        # we create an instance of Neighbours Classifier and fit the data.
        clf = KNN.KNeighborsClassifier(5, weights=weights)

        clf.fit(x.values, y.values)

        # Plot the decision boundary. For that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].
        x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
        y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
        xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                             np.arange(y_min, y_max, h))
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        plt.figure()
        plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

        # Plot also the training points
        plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold,
                    edgecolor='k', s=20)
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())
        plt.title("3-Class classification (k = %i, weights = '%s')"
                  % (n_neighbors, weights))

    plt.show()

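By the way, this is exercise 8, so all my imports and the DataFrame loading were done at the start.

The error I keep getting is

    ValueError: could not convert string to float: 'Sanofi'

(a company name)

I am doing my best to practice and improve my Python skills, and I hope I have provided enough information. Is there a better way to achieve my goal? I can only use the following libraries: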

    import pandas as pd
    import numpy as np
    import re
    import sklearn as sk
    import sklearn.neighbors as KNN
    from sklearn.preprocessing import OneHotEncoder
    import seaborn as sb
    from matplotlib import pyplot as plt

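I hope I can figure out this tricky exercise; any help would be greatly appreciated! Thank you in advance :) I am really happy to keep learning more and more Python.

Here is my df: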

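You seem to have text data, which needs to be converted to numbers with the help of OneHotEncoder, CountVectorizer, or TfidfVectorizer.

On my fourth line I tried to use OneHotEncoder... Is there a better way to implement it?

1. You need to separate the float and string columns. 2. Run fit_transform() on the text data. 3. Assign the result to a new variable.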
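
The three steps above can be sketched roughly as follows. This is only a minimal illustration under several assumptions, not the asker's actual solution. The toy DataFrame and every column name except 'Metier' are hypothetical. KMeans from sklearn.cluster is swapped in because KNeighborsClassifier is a supervised classifier rather than a clustering method (this assumes the rest of scikit-learn is allowed, since sklearn itself is in the permitted list). The number of clusters is arbitrary. The ValueError in the question most likely arises because the estimator tries to convert the raw string columns to a float array, which is why the text columns are encoded first.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the real DataFrame. Every column name except 'Metier'
# is hypothetical, and so are the values.
df = pd.DataFrame({
    "Entreprise": ["Sanofi", "Sanofi", "Acme", "Acme", "Globex", "Globex"],
    "Metier": ["Data scientist", "Data engineer", "Data scientist",
               "Data engineer", "Data scientist", "Data engineer"],
    "Experience": [1.0, 8.0, 2.0, 9.0, 1.5, 10.0],
})
df = df.dropna()

# 1. Separate the float column(s) from the string columns.
numeric = df.select_dtypes(include="number")
text = df.select_dtypes(exclude="number")

# 2. Run fit_transform() on the text data only. The encoder returns a
#    sparse matrix by default, so convert it to a dense array.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(text).toarray()

# Standardize the numeric part so that years of experience do not dominate
# the 0/1 dummy columns (a design choice, not something from the question).
scaled = ((numeric - numeric.mean()) / numeric.std()).to_numpy()

# 3. Assign the combined, fully numeric result to a new variable.
features = np.hstack([scaled, encoded])

# Cluster the numeric feature matrix; n_clusters=2 is an arbitrary choice.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

df["cluster"] = labels
print(df)
```

The resulting cluster column can then be used to color a scatter plot of whichever two variables matter most, for example through the hue argument of seaborn's scatterplot.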