Python: comparing dictionary values with a pandas column to update the dictionary


I am creating my own labels for a classification model:

labels = {
    0:['replaced scanner', 'scanner has been replaced', 'replaced the scanner', 'scanner was replaced', 'replaced scanner and tested', 'i replaced the scanner', 'deployed replacement scanner', 'replaced scanner with a new one', 'replaced scanner with', 'replaced damaged scanner', 'replaced scanner and synced to station', 'replaced scanner with asset', 'scanner replaced', 'replaced missing scanner', 'replaced scanner at station', 'replaced broken scanner', 'replaced defective scanner', 'replaced scanner for this station we are glad that we were able to assist you today ill go ahead and mark this ticket as resolved if this issue requires further attention from please let me know have a great day', 'replaced defective scanner with new'],
    1:['station has been rebooted', 'rebooted station', 'performed a remote reboot of the ar station', 'rebooted station remotely', 'remotely reset station', 'restarted station', 'station has been rebooted resolving ticket', 'reboot station verified station is fully operational', 'station rebooted verified that the station is back online', 'rebooted station operational', 'remotely rebooted station', 'reboot of station', 'station has been remotely rebooted', 'station rebooted', 'rebooted the station', 'station was rebooted', 'remotely rebooted the station', 'station was successfully rebooted resolving ticket', 'station has been rebooted verified that the station is back online', 'rebooted the station and all is well', 'station rebooted remotely', 'ar station remotely rebooted', 'rebooted station  issue resolved', 'station rebooted and verified up', 'after rebooting station works good', 'rebooted station issue resolved', 'station rebooted successfully no further issues reported', 'station remotely rebooted', 'successfully rebooted the station as requested confirmed the aa was able to log back into the station and get to work', 'rebooted station and tested', 'reboot station', 'sshd into station and reboot it then used sasd to verify that station came back up successfully', 'station rebooted verified station functionality issue resolved'],
    2:['password reset', 'password has been reset', 'reset password', 'reset users password', 'password rotated', 'peap password rotated', 'the password has been changed', 'reset password for associate', 'i assisted the user with a password reset', 'reset password resolving this tt', 'assisted with password reset at it hub kiosk', 'password successfully reset', 'password rotated on the peap portal', 'password rotation is completed', 'password resetunlock was performed after validating user identity  on resolution please refer over to the correspondence tab', 'successfully reset password', 'password reset successfully', 'reset user password', 'verified user assisted requester with password reset', 'password was reset', 'password changed', 'assisted with password reset', 'reset password for user', 'password reseted', 'password reset done', 'reset the pwd using the password tool hence resolving this tt', 'the password was updated', 'helped aa to reset their password via password tool with admin rights', 'assisted associate with password reset', 'sopno password has been reset', 'reset password for aa', 'password has been reset successfully', 'performed an inperson password reset via password tool', 'password reset for user'],
    3:['replaced printer', 'replaced the printer', 'printer replaced', 'printer has been replaced', 'printer was replaced', 'replaced printer and tested'],
    4:['rebooted thin client', 'tc has been remotely rebooted', 'rebooted the thin client', 'rebooted tc', 'rebooted thinclient', 'tc have been rebooted functionality has been reestablished'],
    5:['printer reconfigured', 'reconfigured printer with zebra tool successfully for ib destinations', 'reconfigured printer', 'printer configured', 'recalibrated printer', 'reconfigured the printer', 'recalibrated the printer', 'calibrated printer', 'configured printertested its working now', 'printer reconfigured its working now', 'printer has been reconfigured', 'configured and tested printer', 'printer has been configured', 'configured printer', 'pushed correct configuration to printer verified everything works resolving'],
    6:['laptop returned to it', 'laptop has been returned', 'laptop returned', 'the laptop has been returned', 'loaner laptop has been returned', 'loaned the user a laptop and made sure it returned to it', 'laptop returned closing', 'laptop replaced', 'laptop was returned', 'loaner laptop received from user', 'laptop returned closing the ticket'],
    7:['replaced keyboard', 'keyboard replaced', 'keyboard has been replaced', 'replaced the keyboard', 'keyboard was replaced', 'replaced defective keyboard'],
    8:['replaced scanner cable', 'scanner cable replaced', 'replaced the scanner cable', 'scanner cable has been replaced'],
    9:['replaced thin client', 'thin client replaced', 'replaced the thin client', 'thinclient replaced'],
    10:['scanner reconfigured', 'reconfigured scanner', 'scanner was reconfigured', 'reconfigured the scanner', 'scanner configured'],
    11:['replaced monitor', 'monitor replaced'],
    12:['reinstalled printer and drivers'],
    13:['replaced mouse', 'mouse replaced', 'mouse has been replaced', 'mouse was replaced', 'replaced the mouse', 'replaced defective mouse'],
    14:['stopstart spooler reconfigured printer up and running'],
    15:['restarted thin client', 'thin client rebooted', 'restarted the thin client', 'reboot thin client', 'rebooted the thinclient', 'performed hard reboot of thin client', 'thinclient rebooted'],
    16:['deployed scanner to station', 'deployed scanner', 'scanner deployed', 'deployed new scanner', 'deployed a scanner', 'scanners deployed', 'deployed new scanner to station'],
    17:['pslip cable was unplugged reconnected pslip cable then ran test print to verify that issue is resolved'],
    18:['cable replaced', 'replaced cable', 'cable has been replaced', 'replaced the cable'],
    19:['replaced battery', 'replaced the battery', 'battery replaced'],
    20:['unlocked account'],
    21:['reimaged laptop', 'reimaged the laptop', 'laptop reimaged'],
    22:['rollbacked mcm root cause should be found in master tt'],
    23:['reassigned ports and tested', 'issue resolved printer port reassigned', 'printer port was reassigned', 'reassigned printer ports verified slim', 'reassigned printer port'],
    24:['replaced laptop'],
    25:['resynced scanner to base'],
    26:['camera removed from proxemics', 'removed cameras', 'cameras removed as requested', 'cameras removed from proxemics', 'camera removed as requested', 'camera has been removed from proxemics', 'cameras have been removed from proxemics'],
    27:['the account has been unlocked'],
    28:['replaced screen','screen replaced'],
    29:['confirmed images are uploading correctly to'],
    30:['replaced laptop screen'],
    31:['replaced print head'],
    32:['monitor has been replaced','replaced the monitor'],
    33:['rebooted server per cm resolving to see if any alerts refire'],
    34:['replaced usb cable','usb cable replaced'],
    35:['reconnected usb cable','usb cable was disconnected connected back tested working ok now resolving'],
    36:['replaced power cable'],
    37:['wifi card replaced'],
    38:['reassigned printer ports','printer port reassigned issue resolved','reassigned printer ports slim updated'],
    39:['advised to reach out to global it'],
    40:['replaced pslip printer'],
    41:['moved cameras to server'],
    42:['restarted print spooler'],
    43:['replaced hand scanner'],
    44:['resynced scanner to station'],
    45:['upgraded ios version on switch verified all connections to uplinks are restored'],
    46:['printer installed', 'printer deployed'],
    47:['xterm replaced']
}
I built a function that compares each value of each key against my DataFrame:

from fuzzywuzzy import fuzz
def cluster_resolution(df, cluster, cluster_no):
    for res_string in df['resolution'].unique():
        a = []
        for val in cluster[cluster_no]:
            if fuzz.partial_ratio(res_string, val) >= 90:
                a.append(res_string)
        cluster[cluster_no].extend(a)
    return cluster
I tested my function on this sample DataFrame:

import pandas as pd

d = {'resolution': [
    'replaced scanner boy',
    'replaced the scanner for the user with a properly working one from the cage replaced the wire on the damaged one and stored it for later use',
    'tc reimage',
    'updated pc',
    'deploying replacement scanner',
    'upgraded and rebooted station',
    'printer has been reconfigured',
    'cleared linux print queue and now it is working',
    'user reset her password successfully closing tt',
    'have reset the printer to get it to print again',
    'i plugged usb cable into port and scanner works',
    'reconfigured hand scanner and linked to station',
    'replaced the scanner with station is functional',
    'laptops battery needed to be reset asset serial',
    'reconfigured scanner confirmed that it scans as intended',
    'reimaging laptop corrected the anyconnect software issue',
    'printer was unplugged from usb port working properly now',
    'reconnected usb cable and reassign printer ports on port',
    'reconfigured scanner to base and tested with aa all fine',
    'replaced the defective device with a fresh imaged laptop',
    'reconfigured the printer and the media to print properly',
    'tested printer at station connected and working resolved',
    'red scanner reconfigured and base rebooted via usb joint',
    'station scanner was synced to base and station and is now working',
    'printer offlineswitched usb portprinter is now online and working',
    'replaced the barcode label with one reflecting the tcs ip address',
    'restarted the thin client by using ssh to run the restart command',
    'printer reconfigured and test they are functioning normally again',
    'removed old printer for service installed replacement tested good',
    'tc required reboot rebooted tc had aa signin dp is now functional',
    'resetting the printer to factory settings and then reconfigure it',
    'updated windows os forced update and the laptop operated normally',
    'printer settings are set correct and printer is working correctly',
    'power to printer was disconnected reconnected and is working fine',
    'power cycled equipment and restocked spooler with plastic bubbles',
    'laptop checked ive logged into paskiplacowepl without any problem',
    'reseated scanner cables connection into usb port to resolve issue',
    'the scanner has been replaced and the station is working well now',
]}

df_sample = pd.DataFrame(data=d)
I then ran it over every label with this code:

for key, value in labels.items():
    labels = cluster_resolution(df_sample, labels, key)
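In this loop `value` is never used, and because `cluster_resolution` extends the lists inside `labels` in place (and returns that same dict), running the cell a second time scores the data against the strings added the first time. A minimal sketch that leaves the seed labels untouched, assuming the matching function is changed to take the strings and a scorer as parameters (hypothetical changes; `substring_scorer` and `cluster_all` are hypothetical names, the scorer being a crude stand-in for `fuzz.partial_ratio` so the sketch runs without fuzzywuzzy):

```python
import copy


def substring_scorer(s1, s2):
    # Hypothetical stand-in for fuzz.partial_ratio: 100 if the shorter
    # string occurs verbatim inside the longer one, otherwise 0.
    short, long_ = sorted((s1, s2), key=len)
    return 100 if short in long_ else 0


def cluster_resolution(resolutions, cluster, cluster_no, scorer, threshold=90):
    # The question's matching loop, with the strings and scorer passed in
    for res_string in resolutions:
        a = []
        for val in cluster[cluster_no]:
            if scorer(res_string, val) >= threshold:
                a.append(res_string)
        cluster[cluster_no].extend(a)
    return cluster


def cluster_all(resolutions, seed_labels, scorer, threshold=90):
    # Deep-copy the seeds so they are never mutated: calling this again
    # (e.g. re-running a notebook cell) starts from the same seeds.
    result = copy.deepcopy(seed_labels)
    for key in result:  # keys only; the values were never needed
        cluster_resolution(resolutions, result, key, scorer, threshold)
    return result


seeds = {0: ['replaced scanner'], 2: ['password reset']}
resolutions = ['replaced scanner boy', 'user password reset done']
first = cluster_all(resolutions, seeds, substring_scorer)
second = cluster_all(resolutions, seeds, substring_scorer)
print(first == second)  # True
print(seeds)            # {0: ['replaced scanner'], 2: ['password reset']}
```

With fuzzywuzzy installed, the call would look like `cluster_all(df_sample['resolution'].unique(), labels, fuzz.partial_ratio)`.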
When I look at the result, I notice duplicate strings that should appear only once:

{0: ['replaced scanner',
  'scanner has been replaced',
  'replaced the scanner',
  'scanner was replaced',
  'replaced scanner and tested',
  'i replaced the scanner',
  'deployed replacement scanner',
  'replaced scanner with a new one',
  'replaced scanner with',
  'replaced damaged scanner',
  'replaced scanner and synced to station',
  'replaced scanner with asset',
  'scanner replaced',
  'replaced missing scanner',
  'replaced scanner at station',
  'replaced broken scanner',
  'replaced defective scanner',
  'replaced scanner for this station we are glad that we were able to assist you today ill go ahead and mark this ticket as resolved if this issue requires further attention from please let me know have a great day',
  'replaced defective scanner with new',
  'replaced scanner',
  'replaced scanner',
  'replaced scanner',
  'replaced scanner',
  'replaced scanner',
  'replaced scanner',
  'replaced scanner',
  'replaced scanner',
  'replaced the scanner for the user with a properly working one from the cage replaced the wire on the damaged one and stored it for later use',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'the scanner has been replaced and the station is working well now',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced the scanner for the user with a properly working one from the cage replaced the wire on the damaged one and stored it for later use',
  'replaced the scanner for the user with a properly working one from the cage replaced the wire on the damaged one and stored it for later use',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'the scanner has been replaced and the station is working well now',
  'the scanner has been replaced and the station is working well now',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced scanner boy',
  'replaced the scanner for the user with a properly working one from the cage replaced the wire on the damaged one and stored it for later use',
  'replaced the scanner for the user with a properly working one from the cage replaced the wire on the damaged one and stored it for later use',
  'replaced the scanner for the user with a properly working one from the cage replaced the wire on the damaged one and stored it for later use',
  'replaced the scanner for the user with a properly working one from the cage replaced the wire on the damaged one and stored it for later use',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'replaced the scanner with station is functional',
  'the scanner has been replaced and the station is working well now',
  'the scanner has been replaced and the station is working well now',
  'the scanner has been replaced and the station is working well now',
  'the scanner has been replaced and the station is working well now'],

How can I make sure that each match is added only once?
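A stripped-down run shows where the repeats come from. The sketch keeps the question's loop but takes the strings and the scorer as parameters; `substring_scorer` is a hypothetical stand-in for `fuzz.partial_ratio` (100 when the shorter string occurs inside the longer one, else 0). A resolution that matches two strings of a label is appended twice in one pass, and a second pass over the already-extended list also matches the copies added before, so the count grows on every re-run:

```python
def substring_scorer(s1, s2):
    # Hypothetical stand-in for fuzz.partial_ratio: 100 if the shorter
    # string occurs verbatim inside the longer one, otherwise 0.
    short, long_ = sorted((s1, s2), key=len)
    return 100 if short in long_ else 0


def cluster_resolution(resolutions, cluster, cluster_no, scorer, threshold=90):
    # The question's loop, with the strings and scorer passed in
    for res_string in resolutions:
        a = []
        for val in cluster[cluster_no]:
            if scorer(res_string, val) >= threshold:
                a.append(res_string)  # one append per matching val
        cluster[cluster_no].extend(a)
    return cluster


labels = {0: ['replaced scanner', 'replaced scanner at station']}
resolutions = ['replaced scanner at station 4']

cluster_resolution(resolutions, labels, 0, substring_scorer)
print(labels[0].count('replaced scanner at station 4'))  # 2: matched both seeds

# Running the loop again also matches the two copies added above:
# 2 seeds + 2 copies = 4 more appends.
cluster_resolution(resolutions, labels, 0, substring_scorer)
print(labels[0].count('replaced scanner at station 4'))  # 6
```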

The simplest fix is to use a set instead of a list for the variable a, or to add an extra condition before appending to a.
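Either change makes a single pass add a given string at most once; a labels dict that already holds duplicates, like the output in the question, can be cleaned up afterwards. `dict.fromkeys` drops repeats while keeping first-occurrence order, which converting to a set would not guarantee. A small sketch (`dedupe_labels` is a hypothetical helper name):

```python
def dedupe_labels(labels):
    # dict.fromkeys keeps only the first occurrence of each string, and dicts
    # preserve insertion order (Python 3.7+), so the seed order is unchanged.
    return {key: list(dict.fromkeys(values)) for key, values in labels.items()}


polluted = {0: ['replaced scanner', 'replaced scanner boy',
                'replaced scanner boy', 'replaced scanner']}
print(dedupe_labels(polluted))
# {0: ['replaced scanner', 'replaced scanner boy']}
```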

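Using a set:

from fuzzywuzzy import fuzz

def cluster_resolution(df, cluster, cluster_no):
    for res_string in df['resolution'].unique():
        a = set()  # a set holds each string at most once
        for val in cluster[cluster_no]:
            if fuzz.partial_ratio(res_string, val) >= 90:
                a.add(res_string)  # set.add instead of list.append
        cluster[cluster_no].extend(a)
    return cluster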
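Using a list:

from fuzzywuzzy import fuzz

def cluster_resolution(df, cluster, cluster_no):
    for res_string in df['resolution'].unique():
        a = []
        for val in cluster[cluster_no]:
            # extra condition: only append if res_string is not already in a
            if fuzz.partial_ratio(res_string, val) >= 90 and res_string not in a:
                a.append(res_string)
        cluster[cluster_no].extend(a)
    return cluster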
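In both versions `a` only ever contains `res_string`, so it can be replaced by a single yes/no test: `any()` stops at the first string that scores high enough instead of scoring the rest, and a membership check skips strings the cluster already holds, so re-running the loop adds nothing new. A sketch with the strings and scorer passed in as parameters (hypothetical changes to the signature; `substring_scorer` is a crude stand-in for `fuzz.partial_ratio`):

```python
def substring_scorer(s1, s2):
    # Hypothetical stand-in for fuzz.partial_ratio: 100 if the shorter
    # string occurs verbatim inside the longer one, otherwise 0.
    short, long_ = sorted((s1, s2), key=len)
    return 100 if short in long_ else 0


def cluster_resolution(resolutions, cluster, cluster_no, scorer, threshold=90):
    members = cluster[cluster_no]
    for res_string in resolutions:
        if res_string in members:
            continue  # already in this cluster (e.g. from an earlier run)
        # any() stops at the first member that scores >= threshold
        if any(scorer(res_string, val) >= threshold for val in members):
            members.append(res_string)
    return cluster


labels = {0: ['replaced scanner', 'replaced scanner at station']}
resolutions = ['replaced scanner at station 4', 'password reset']
cluster_resolution(resolutions, labels, 0, substring_scorer)
cluster_resolution(resolutions, labels, 0, substring_scorer)  # adds nothing
print(labels[0])
# ['replaced scanner', 'replaced scanner at station', 'replaced scanner at station 4']
```

With fuzzywuzzy this would be called as `cluster_resolution(df_sample['resolution'].unique(), labels, key, fuzz.partial_ratio)`.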

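Quick question: if the function I wrote takes about 600 ms to finish on 38 rows of data, does that mean it could take about a week once my dataset grows to slightly over a million rows? If so, is there any way to speed it up?

You can speed it up at least somewhat by using RapidFuzz instead of FuzzyWuzzy (I am the author).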
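Taking the numbers in the comment at face value, 600 ms / 38 rows ≈ 16 ms per row, so a constant per-row cost would put 1,000,000 rows at about 1,000,000 × 0.0158 s ≈ 15,800 s, roughly 4.4 hours. The cost per row is not constant in the question's function, though: every match is appended to a list that all later rows are scored against, so the lists, and the time per row, keep growing. The sketch below assumes a different design from the question: each resolution is assigned to the first label whose seed strings match it, the seed lists never grow (so matches no longer chain through earlier matches), and `any()` stops scoring at the first hit. It uses RapidFuzz's `partial_ratio` when installed and otherwise a slow difflib approximation; scores may differ slightly from fuzzywuzzy's. `assign_clusters` is a hypothetical name:

```python
from difflib import SequenceMatcher

try:
    # Assumes RapidFuzz is installed (pip install rapidfuzz); its
    # partial_ratio uses the same 0-100 scale as fuzzywuzzy's.
    from rapidfuzz.fuzz import partial_ratio
except ImportError:
    def partial_ratio(s1, s2):
        # Slow pure-Python fallback, only an approximation of partial_ratio:
        # best SequenceMatcher ratio of the shorter string against every
        # window of the same length in the longer string, scaled to 0-100.
        short, long_ = sorted((s1, s2), key=len)
        if not short:
            return 0.0
        width = len(short)
        return 100 * max(
            SequenceMatcher(None, short, long_[i:i + width]).ratio()
            for i in range(len(long_) - width + 1)
        )


def assign_clusters(resolutions, seed_labels, threshold=90):
    """Map each unique resolution to the first label with a matching seed.

    Only the fixed seed strings are compared (nothing is appended to them),
    so the work per row does not grow as more rows are processed.
    Resolutions that match no label map to None.
    """
    assignments = {}
    for res_string in dict.fromkeys(resolutions):  # unique, original order
        assignments[res_string] = next(
            (
                key
                for key, seeds in seed_labels.items()
                if any(partial_ratio(res_string, s) >= threshold for s in seeds)
            ),
            None,  # no label matched
        )
    return assignments
```

Called as `assign_clusters(df['resolution'], labels)`, the returned dict can be mapped back onto the DataFrame with `df['resolution'].map(...)`; unmatched rows become NaN.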