Python scikit-learn: clustering mixed data (numeric and categorical)

Can someone help me modify the working example below so that it creates clusters from the shared data?

The example uses scikit-learn's mean shift clustering to identify patches of similar or co-located plant species in an agronomic facility.

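Similar questions have been asked about using categorical values alongside numeric values for this kind of problem, but I think this example is different: the non-numeric values here cannot simply be encoded as one/zero dummy values. For example, we cannot one-hot encode values such as 'Aristolochia macrophylla' and 'Aristolochia durior', because species with this kind of name similarity need to be grouped together by genus, in addition to the geographic proximity given by the X and Y values. When creating the clusters, name similarity is as important as location.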
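To make the objection to one-hot encoding concrete, here is a minimal sketch that encodes three names from the data below and measures plain Euclidean distance between the encoded rows (the choice of Euclidean distance is an assumption, matching what MeanShift uses on feature vectors):

```python
import numpy as np
import pandas as pd

# Three names from the question's data: two Aristolochia species and one Cornus species
names = pd.Series(["Aristolochia macrophylla", "Aristolochia durior", "Cornus alba"])

# Plain one-hot encoding: one 0/1 column per distinct name
onehot = pd.get_dummies(names).to_numpy(dtype=float)

# Euclidean distance between every pair of encoded rows
dist = np.sqrt(((onehot[:, None, :] - onehot[None, :, :]) ** 2).sum(axis=-1))
print(dist.round(3))
```

Every off-diagonal entry is √2 (about 1.414), so 'Aristolochia macrophylla' ends up exactly as far from 'Aristolochia durior' as from 'Cornus alba'; the encoding carries no notion of shared spelling or genus.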

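I have tried two things. First, I assigned arbitrary numeric values to the letters in the species names, so that similarly spelled names would sit closer together on a number line. I planned to auto-scale these values and feed them into the script alongside the X and Y coordinates. This did not work, because quite different names ended up with very similar numbers.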
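The post does not say exactly which letter values were used, so the sketch below assumes a hypothetical scheme (each letter mapped to its alphabet position, a=1 ... z=26, and the positions summed per name). It reproduces the failure described: an unrelated pair of names can land close together while two species of the same genus land far apart.

```python
def letter_sum(name: str) -> int:
    """Hypothetical encoding: sum of alphabet positions (a=1 ... z=26), ignoring spaces."""
    return sum(ord(c) - ord("a") + 1 for c in name.lower() if c.isalpha())


for name in ["Cornus alba", "Cornus canadensis", "Lamium maculatum"]:
    print(name, letter_sum(name))
# Cornus alba 106
# Cornus canadensis 179
# Lamium maculatum 174
```

The two Cornus species differ by 73, while Cornus canadensis and the unrelated Lamium maculatum differ by only 5. Any single number per name collapses spelling into one axis, so collisions like this are hard to avoid.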

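My other attempt was to incorporate the categorical values using Levenshtein distance. But the distance is computed by comparing only two values at a time. If I produce the distance between each string and every other string, how can I use that result as input to the MeanShift algorithm?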
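One possible answer, sketched under assumptions: scikit-learn's MeanShift works on feature vectors and has no option for a precomputed distance matrix, so the full pairwise Levenshtein matrix can instead be given to an estimator that accepts `metric="precomputed"`, such as DBSCAN. The sketch normalizes each edit distance by the longer name's length, rescales the X/Y distance to [0, 1], and blends the two 50/50 to reflect "name similarity is as important as location". The helper `combined_distance`, the normalization, the weight and the `eps` value are illustrative choices, not a tuned configuration for this dataset.

```python
from functools import lru_cache

import numpy as np
from sklearn.cluster import DBSCAN


@lru_cache(maxsize=None)  # names repeat many times in the data, so cache each pair
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming; insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def combined_distance(names, xy, name_weight=0.5):
    """Hypothetical helper: blend of normalized edit distance and normalized X/Y distance."""
    n = len(names)
    name_d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(names[i]), len(names[j]), 1)
            name_d[i, j] = name_d[j, i] = levenshtein(names[i], names[j]) / longest
    xy = np.asarray(xy, dtype=float)
    geo_d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    if geo_d.max() > 0:
        geo_d = geo_d / geo_d.max()  # rescale to [0, 1] like name_d
    return name_weight * name_d + (1 - name_weight) * geo_d


# Toy example: two nearby Aristolochia plants and two nearby Cornus plants
names = ["Aristolochia macrophylla", "Aristolochia durior", "Cornus alba", "Cornus albernifolia"]
xy = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 1.1]]
D = combined_distance(names, xy)
labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)  # [0 0 1 1]
```

For the question's data this would be called as `combined_distance(df["Category"].tolist(), df[["POINT_X", "POINT_Y"]].to_numpy())`; `eps`, `min_samples` and `name_weight` then need tuning by eye against the plot.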

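Anyway, here are the data and the working script, which currently uses only the numeric values. I would really appreciate any example of how to cluster this data using the similarity of the categorical values as well.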
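If you would rather keep MeanShift, one alternative (a swapped-in technique, not something from the post) is to turn each name into a numeric vector in which similar spellings are close, then append those vectors to the standardized coordinates. The sketch uses scikit-learn's `TfidfVectorizer` over character 2- and 3-grams; `mixed_features` and `name_weight` are hypothetical names, and the bandwidth would have to be tuned on the real data.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler


def mixed_features(names, xy, name_weight=1.0):
    """Hypothetical helper: standardized X/Y columns followed by weighted name n-gram columns."""
    # char_wb builds n-grams within word boundaries, so names sharing a genus share many n-grams
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
    name_feats = vec.fit_transform(names).toarray()  # each row has unit L2 norm by default
    xy_scaled = StandardScaler().fit_transform(np.asarray(xy, dtype=float))
    return np.hstack([xy_scaled, name_weight * name_feats])


# Toy data: two species planted at the same spot, so only the names can separate them
names = ["Aristolochia macrophylla"] * 3 + ["Cornus alba"] * 3
xy = [[-75.5, 38.75]] * 6
features = mixed_features(names, xy)
ms = MeanShift(bandwidth=0.5).fit(features)
print(ms.labels_)
```

Raising `name_weight` makes spelling matter more relative to location; with the real data, `estimate_bandwidth(features, ...)` from the original script is a reasonable starting point for the bandwidth.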

Thanks, everyone.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn.cluster import MeanShift, estimate_bandwidth

df=pd.DataFrame()

df["POINT_X"]=[-75.933169765,-75.932900302,-75.933060039,-75.932456135,-75.932334122,-75.933383845,-75.933378563,-75.933290334,-75.933302506,-75.932024669,-75.931803297,-75.931777655,-75.9317845,-75.931807731,
               -75.931794839,-75.932045113,-75.932165473,-75.932763574,-75.93216276,-75.932066326,-75.931934871,-75.932294115,-75.931852284,-75.93187799,-75.932063549,-75.932377939,-75.932466697,-75.9324484,-75.932523695,
               -75.932484492,-75.931882652,-75.932006344,-75.932228988,-75.932702486,-75.933245229,-75.933165385,-75.932990797,-75.932741398,-75.932519195,-75.932336262,-75.932264764,-75.932953569,-75.932938167,-75.933098289,
               -75.932503985,-75.932597591,-75.932551382,-75.932541384,-75.932575066,-75.932751274,-75.932869969,-75.932086405,-75.932125915,-75.932089623,-75.932229816,-75.932356252,-75.93221234,-75.932505964,-75.932455199,
               -75.932672148,-75.932823439,-75.93266258,-75.932722695,-75.93262497,-75.932613958,-75.932726832,-75.933179618,-75.933413275,-75.932911947,-75.93293013,-75.933129681,-75.933348106,-75.933328068,-75.9333501,
               -75.933133529,-75.93306104,-75.933020824,-75.933056158,-75.933261164,-75.933157803,-75.933320158,-75.93306193,-75.932935915,-75.933125758,-75.933088069,-75.933158642,-75.9331282,-75.933096121,-75.933250109,
               -75.933325084,-75.933336448,-75.934785616,-75.934843128,-75.93387422,-75.933996517,-75.934114484,-75.934560855,-75.935138185,-75.935228902,-75.935550248,-75.935326059,-75.935167468,-75.935038326,-75.934937151,
               -75.934476218,-75.934576771,-75.934556169,-75.934324709,-75.934215059,-75.934185509,-75.933996183,-75.938853557,-75.937435702,-75.93755249,-75.93709863,-75.937584727,-75.937080786,-75.93717527,-75.937158245,
               -75.937153622,-75.937255458,-75.937291351,-75.937463492,-75.937508635,-75.937568922,-75.937604,-75.937643152,-75.937538299,-75.936224493,-75.936538213,-75.936653234,-75.936672687,-75.936781092,-75.936765158,
               -75.936775048,-75.93680606,-75.936808197,-75.936753824,-75.936637658,-75.936923553,-75.936872045,-75.936871187,-75.936735385,-75.936800934,-75.936504657,-75.936528774,-75.936462867,-75.936301988,-75.936248282,
               -75.936192436,-75.935933385,-75.93679036,-75.936984567,-75.937178376,-75.937072594,-75.936212479,-75.937100912,-75.937075027,-75.93703418,-75.936553923,-75.936563813,-75.936750108,-75.935328068,-75.93329076,
               -75.933274837,-75.932816577,-75.932958943,-75.932872736,-75.933039998,-75.932930987,-75.932975423,-75.932987859,-75.932944342,-75.932984985,-75.933102016,-75.933042959,-75.935432474,-75.93539475,-75.935456177,
               -75.935413297,-75.935564812,-75.936518316,-75.935680005,-75.936558194,-75.935736741,-75.935754977,-75.935809,-75.935866569,-75.936134435,-75.936272398,-75.936252114,-75.936497277,-75.936178069,-75.933545359,
               -75.933462287,-75.933528848,-75.933456247,-75.933508043,-75.933443108,-75.933436682,-75.933293086,-75.933458306,-75.932948828,-75.933541322,-75.933719067,-75.933560447,-75.934586709,-75.934531055,-75.93416494,
               -75.933882234,-75.934830229,-75.934978045,-75.934357619,-75.934605828,-75.934754661,-75.934743056,-75.934130125,-75.935928887,-75.936286533,-75.936425628,-75.936477105,-75.935622798,-75.935607342,-75.936576534,
               -75.936823941,-75.936664385,-75.936985859,-75.936927641,-75.937655315,-75.93754798,-75.937409554,-75.937780814,-75.936920843,-75.93724831,-75.937473965,-75.937712006,-75.935331673,-75.936250622,-75.934986449,
               -75.938144151,-75.938287148,-75.938572438,-75.938677207,-75.938737192,-75.936696505,-75.9379094,-75.937601482,-75.931082221,-75.931152233,-75.931929379,-75.931886037,-75.931539305,-75.93145414,-75.931517537,
               -75.93206476,-75.931104594,-75.930886831,-75.930796839,-75.930770692,-75.934395391,-75.933485857,-75.935094793,-75.935243938,-75.934978751,-75.935325475,-75.935361712,-75.933975927,-75.933883586,-75.936299827,
               -75.934936738,-75.935015301,-75.934930658,-75.935287011,-75.935294894,-75.937784172,-75.937770775,-75.938253481,-75.93826076,-75.937784726,-75.93717805,-75.938872368,-75.938875092,-75.939336652,-75.940266037,
               -75.940331239,-75.940421181,-75.940331999,-75.940177713,-75.939332917,-75.938994759,-75.939607395,-75.939598636,-75.939560673,-75.939534037,-75.939555948,-75.939015855,-75.939243491,-75.938789939,-75.933198497,
               -75.93296926,-75.933132717,-75.932772368,-75.932419051,-75.93293841,-75.932798596,-75.932208745,-75.93206523,-75.931983351,-75.932410373,-75.931891975,-75.931568921,-75.931771254,-75.932397243,-75.931396196,
               -75.931519619,-75.932093909,-75.931942073,-75.934429867,-75.934438719,-75.93453334,-75.934266886,-75.934183909,-75.93452075,-75.933856314,-75.933881074,-75.933901224,-75.933751983,-75.933594864,-75.93358154,
               -75.93347677,-75.933895768,-75.933917682,-75.933687372,-75.933927415,-75.933739282,-75.933891053,-75.933712267,-75.93361711,-75.933901067,-75.934161321,-75.934305249,-75.934239461,-75.934211658,-75.933980238,
               -75.934018133,-75.93397582,-75.933918536,-75.933971179,-75.933877169]

df["POINT_Y"]=[38.95259201,38.952468493,38.952585964,38.952220643,38.952172451,38.952978948,38.952611101,38.952620123,38.952527583,38.952013642,38.951971095,38.951950598,38.951878617,38.951867573,38.952051039,38.952319899,
               38.952751776,38.952261808,38.951645828,38.951591344,38.951583443,38.951660428,38.951750197,38.951752666,38.951776696,38.951792968,38.951787078,38.951862848,38.951800999,38.951744805,38.951870508,38.951889649,
               38.951936158,38.95170948,38.951751749,38.951735386,38.951742727,38.951588575,38.951528477,38.951520106,38.951519453,38.951936698,38.952010261,38.952013956,38.952102079,38.952165877,38.952146088,38.952089106,
               38.952117254,38.952151545,38.949969545,38.951201998,38.951159228,38.951123753,38.950778391,38.950531943,38.950989092,38.950097211,38.950208568,38.950065183,38.950071356,38.949923603,38.9498474,38.949809668,
               38.949757376,38.949571133,38.951447294,38.95147755,38.950581745,38.950733667,38.951069352,38.951237478,38.95107276,38.95096753,38.9508122,38.950734862,38.950688169,38.950514372,38.950075351,38.950010511,38.949960875,
               38.949992064,38.95007398,38.950101272,38.950295815,38.950227769,38.950211517,38.950441255,38.950335632,38.95024686,38.950307666,38.950528546,38.950513096,38.950187972,38.950217841,38.950263645,38.950510523,
               38.950755399,38.950708302,38.950286311,38.950229957,38.950164615,38.950045229,38.949970825,38.949877169,38.949993101,38.949660647,38.949543522,38.949625589,38.949412861,38.949487811,38.949880172,38.951839048,
               38.952063455,38.949880835,38.951913953,38.949897842,38.949754481,38.949913573,38.951052934,38.951134326,38.951215119,38.951281057,38.951294341,38.951397886,38.951533389,38.951672146,38.949658462,38.950068808,
               38.949883166,38.949852263,38.949919533,38.950057898,38.950028999,38.950188832,38.950304129,38.950435138,38.950514515,38.950622084,38.950381874,38.949994828,38.950052327,38.949830647,38.949824853,38.949732702,
               38.949761675,38.949791427,38.949879419,38.949914074,38.949955099,38.951691376,38.951766177,38.951785811,38.951832242,38.951733008,38.950873805,38.951440038,38.951405074,38.951254936,38.951212584,38.951201821,
               38.951198089,38.951901959,38.94884403,38.948941748,38.949353979,38.949035993,38.949016785,38.94887402,38.948802413,38.948722997,38.94868013,38.948698153,38.948609493,38.948407937,38.948413538,38.94884251,
               38.948821237,38.948818421,38.948795076,38.949678178,38.949281509,38.949751466,38.949261269,38.949715525,38.949652229,38.949566304,38.949532396,38.949542936,38.949567821,38.94953658,38.949563742,38.948735942,
               38.952147575,38.952155751,38.951912912,38.951985954,38.952728799,38.952622921,38.952451597,38.952436249,38.95231594,38.952313127,38.951745893,38.952390373,38.952286187,38.952708734,38.951839413,38.952030386,
               38.951616852,38.951420298,38.951608998,38.952554863,38.9520134,38.951292914,38.951667791,38.952112184,38.954031241,38.953799626,38.953837241,38.953853864,38.953692287,38.953686947,38.953751245,38.953616457,
               38.95369262,38.953694331,38.953744736,38.953742862,38.953858308,38.953767308,38.953659111,38.953499777,38.953494864,38.953676808,38.953570088,38.953574927,38.953146008,38.953138966,38.953219752,38.953218684,
               38.953196026,38.953217491,38.953260642,38.953365184,38.953343071,38.953392347,38.95584336,38.955799692,38.956182326,38.95621302,38.956049617,38.957470088,38.957171152,38.956453402,38.956649954,38.956791692,
               38.957180989,38.957521592,38.955754158,38.95553646,38.955953035,38.956405511,38.956660878,38.957086511,38.957423389,38.957793854,38.957835976,38.955448024,38.955021013,38.954934154,38.954927544,38.954598007,
               38.954570833,38.954367294,38.954343,38.954497793,38.954471,38.954821256,38.954369125,38.955348715,38.955333171,38.955343991,38.955489753,38.955493927,38.955516735,38.955049181,38.955110383,38.954724398,38.954521524,
               38.954517463,38.954512208,38.954493542,38.954434212,38.954117479,38.95435162,38.954310712,38.954277052,38.954161078,38.954580606,38.954197375,38.955451505,38.955596079,38.955045523,38.955097295,38.955970146,
               38.954232335,38.95411988,38.953505553,38.955288869,38.955759644,38.955647996,38.955040953,38.954949777,38.95485026,38.954643337,38.954546745,38.953547289,38.953542137,38.953995634,38.954146947,38.954862356,
               38.953287566,38.954523419,38.954915863,38.955002144,38.954945777,38.955006524,38.95507815,38.955120243,38.953067979,38.953073084,38.953453648,38.953640022,38.953641026,38.954062633,38.954027667,38.954110137,
               38.954249401,38.953874232,38.953529725,38.953628972,38.953476826,38.95351151,38.953498365,38.953491846,38.953767787,38.953843351,38.953849161]

#Must incorporate these identifiers and cluster by similarity of species in addition to their proximity.
df["Category"]=['Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla',
                'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla',
                'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia macrophylla', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior',
                'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior',
                'Aristolochia durior', 'Aristolochia durior', 'Aristolochia durior', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa',
                'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa',
                'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Aristolochia tomentosa', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii',
                'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii', 'Buddleia davidii',
                'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Buddleia x weyeriana',
                'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Buddleia x weyeriana', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa',
                'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyparis obtusa', 'Chamaecyfoccia gracilis',
                'Chamaecyfoccia gracilis', 'Chamaecyfoccia gracilis', 'Chamaecyfoccia gracilis', 'Chamaecyfoccia gracilis', 'Chamaecyfoccia gracilis', 'Chamaecyfoccia gracilis', 'Chamaecyfoccia gracilis', 'Chamaecyparis pisifera',
                'Chamaecyparis pisifera', 'Chamaecyparis pisifera', 'Chamaecyparis pisifera', 'Chamaecyparis pisifera', 'Chamaecyparis pisifera', 'Chamaecyparis pisifera', 'Chamaecyparis pisifera', 'Chamaecyparis pisifera',
                'Chamaecyparis pisifera', 'Chamaecyparis pisifera', 'Chamaecyparis pisifera', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba',
                'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus alba', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia',
                'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia',
                'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia', 'Cornus albernifolia',
                'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis',
                'Cornus canadensis', 'Cornus canadensis', 'Cornus canadensis', 'Euonymus alata', 'Euonymus alata', 'Euonymus alata', 'Euonymus alata', 'Euonymus alata', 'Euonymus alata', 'Euonymus alata', 'Euonymus alata',
                'Euonymus alata', 'Euonymus alata', 'Euonymus alata', 'Euonymus alata', 'Euonymus alata', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima',
                'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima',
                'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Euphorbia pulcherrima', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis',
                'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalis', 'Galanthus nivalisodoratum',
                'Galanthus nivalisodoratum', 'Galanthus nivalisodoratum', 'Galanthus nivalisodoratum', 'Galanthus nivalisodoratum', 'Galanthus nivalisodoratum', 'Galanthus nivalisodoratum', 'Galanthus nivalisodoratum',
                'Galanthus nivalisodoratum', 'Galanthus nivalisodoratum', 'Galanthus nivalisodoratum', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra',
                'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra',
                'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa macra', 'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra',
                'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra', 'Hakonechloa aureola-macra', 'Ilex crenata Hetzii',
                'Ilex crenata Hetzii', 'Ilex crenata Hetzii', 'Ilex crenata Hetzii', 'Ilex crenata Hetzii', 'Ilex crenata Hetzii', 'Ilex crenata Hetzii', 'Ilex crenata Hetzii', 'Ilex crenata Hetzii', 'Ilex crenata Hetzii',
                'Ilex crenata Hetzii', 'Ilex crenata Hetzii', 'Iberis sempervirens', 'Iberis sempervirens', 'Iberis sempervirens', 'Iberis sempervirens', 'Iberis sempervirens', 'Iberis sempervirens', 'Iberis sempervirens',
                'Iberis sempervirens', 'Iberis sempervirens', 'Lamium maculatum', 'Lamium maculatum', 'Lamium maculatum', 'Lamium maculatum', 'Lamium maculatum', 'Lamium maculatum', 'Lamium maculatum', 'Lamium maculatum',
                'Lamium maculatum', 'Lamium maculatum', 'Lamium maculatum', 'Lamium maculatum', 'Mertensia virginica', 'Mertensia virginica', 'Mertensia virginica', 'Mertensia virginica', 'Mertensia virginica', 'Mertensia virginica',
                'Mertensia virginica', 'Mertensia virginica', 'Aristolochata pseudophilus', 'Aristolochata pseudophilus', 'Aristolochata pseudophilus', 'Aristolochata pseudophilus', 'Aristolochata pseudophilus',
                'Aristolochata pseudophilus', 'Aristolochata pseudophilus', 'Aristolochata pseudophilus', 'Aristolochata pseudophilus', 'Aristolochata pseudophilus', 'Chamaecyparis duplicatus', 'Chamaecyparis duplicatus',
                'Chamaecyparis duplicatus', 'Chamaecyparis duplicatus', 'Chamaecyparis duplicatus', 'Chamaecyparis duplicatus', 'Chamaecyparis duplicatus', 'Chamaecyparis duplicatus', 'Chamaecyparis crenata Hetzii',
                'Chamaecyparis crenata Hetzii', 'Chamaecyparis crenata Hetzii', 'Chamaecyparis crenata Hetzii', 'Chamaecyparis crenata Hetzii', 'Chamaecyparis crenata Hetzii', 'Chamaecyparis crenata Hetzii', 'Chamaecyparis',
                'Chamaecyparis', 'Chamaecyparis', 'Chamaecyparis', 'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum',
                'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum', 'Veronicastrum virginicum',
                'Veronicastrum vulgaris', 'Veronicastrum vulgaris', 'Veronicastrum vulgaris', 'Veronicastrum vulgaris', 'Veronicastrum vulgaris', 'Veronicastrum vulgaris', 'Veronicastrum vulgaris', 'Veronicastrum vulgaris',
                'Veronicastrum vulgaris', 'Veronicastrum pulchra', 'Veronicastrum pulchra', 'Veronicastrum pulchra', 'Veronicastrum pulchra', 'Veronicastrum pulchra', 'Veronicastrum pulchra', 'Veronicastrum pulchra',
                'Veronicastrum pulchra', 'Veronicastrum pulchra', 'Veronicastrum pulchra']



#Get clusters with MeanShift
X = df[["POINT_X", "POINT_Y"]].to_numpy()  # Only using numeric values for now
bandwidth = estimate_bandwidth(X, quantile=0.0595, n_samples=15000)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)
print("Estimated number of clusters: %d" % n_clusters_)

#Make plot
plt.figure(1)
plt.clf()
colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')

for k, col in zip(range(n_clusters_), colors):
    my_members = labels == k
    cluster_center = cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
plt.title('Clusters found by X/Y proximity (before using categorical values): %d' % n_clusters_)
plt.show()

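You could split the Category column into two or more columns (genus, species, and so on) and then one-hot encode them. Why not? When two names share a genus, they will share the same one-hot column, e.g. 'Aristolochia'.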
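The suggestion above can be sketched as follows, assuming the genus is always the first word of the Category string. `genus_features` and `genus_weight` are hypothetical names; the weight sets how much a genus mismatch counts against spatial distance once the coordinates are standardized.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MeanShift
from sklearn.preprocessing import StandardScaler


def genus_features(df: pd.DataFrame, genus_weight: float = 1.0) -> np.ndarray:
    """Hypothetical helper: standardized POINT_X/POINT_Y plus one weighted one-hot column per genus."""
    genus = df["Category"].str.split().str[0]  # "Cornus alba" -> "Cornus"
    genus_onehot = pd.get_dummies(genus).to_numpy(dtype=float)
    xy = StandardScaler().fit_transform(df[["POINT_X", "POINT_Y"]].to_numpy(dtype=float))
    return np.hstack([xy, genus_weight * genus_onehot])


# Toy frame in the question's column layout; all four plants share one location
toy = pd.DataFrame({
    "POINT_X": [-75.5] * 4,
    "POINT_Y": [38.75] * 4,
    "Category": ["Cornus alba", "Cornus canadensis", "Euonymus alata", "Euonymus alata"],
})
labels = MeanShift(bandwidth=0.5).fit(genus_features(toy)).labels_
print(labels)
```

Because the genus match is exact, near-miss spellings in the data such as 'Aristolochata' and 'Aristolochia' get unrelated columns; treating those as related needs a string-similarity measure rather than exact matching.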
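I see what you mean, but I really need a solution based on the similarity of the strings in that column. In zoology, the same word can be repeated in the genus and the species, and sometimes even in the family, which would cluster unrelated species together. I do think your solution would work if the data didn't have that quirk, though. Thanks a lot.
Can you give an example of what you describe above? As far as I know, binomial names should be unique within a kingdom, and even across kingdoms such duplicates should be rare.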