Python: how to cluster a DataFrame according to a given dictionary?

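I have a pandas DataFrame that I am trying to cluster according to the dict below. Example: for the "Info 1" cluster, the dictionary lists 7 values in total, while the DataFrame contains only 4 of them. Clustering on that basis, I expect the output shown below.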

Input:

PII               Counts 
CREDIT_CARD        158
DATE_TIME          544
DOMAIN_NAME        609
EMAIL_ADDRESS      90
IP_ADDRESS         405
LOCATION           346
PERSON             202
BANK_NUMBER        202
PASSPORT           130
NHS                6
NRP                20

dict = {'Info 1': ['PERSON', 'LOCATION', 'PHONE_NUMBER', 'EMAIL_ADDRESS', 'PASSPORT', 'SSN',
                   'DRIVER_LICENSE'],
        'Info 2': ['NHS'],
        'Info 3': ['IP_ADDRESS', 'DOMAIN_NAME'],
        'Info 4': ['CRYPTO', 'DATE_TIME', 'NRP'],
        'Info 5': ['CREDIT_CARD', 'BANK_NUMBER', 'ITIN', 'CODE']}

Output:

    Names            Count             Info
0   Info 5           [158, 202]        ['CREDIT_CARD','BANK_NUMBER']
1   Info 2                  [6]        ['NHS']
2   Info 3           [405, 609]        ['IP_ADDRESS','DOMAIN_NAME']
3   Info 4            [20, 544]        ['NRP','DATE_TIME']
4   Info 1  [202, 346, 90, 130]        ['PERSON','LOCATION','EMAIL_ADDRESS','PASSPORT']

First, do not use dict as a variable name, because it shadows Python's built-in dict type.
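To illustrate why reusing the name is risky, here is a minimal sketch. The function name `shadowing_demo` is hypothetical and exists only for this demonstration; the shadowing is kept local to the function so it does not affect the rest of a session.

```python
def shadowing_demo():
    # Binding the name `dict` hides the built-in type inside this scope.
    dict = {"Info 2": ["NHS"]}
    try:
        # This would normally build a new dictionary, but `dict` is now
        # an ordinary dictionary object, which cannot be called.
        return dict(a=1)
    except TypeError as exc:
        return str(exc)


message = shadowing_demo()
print(message)

# Outside the function the built-in is untouched.
fresh = dict(a=1)
print(fresh)
```

At module level the same assignment would hide the built-in for all later code in that module, which is why a neutral name such as d is used below.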

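Then flatten the dict of lists into a lookup with keys and values swapped, so that each PII value points to its cluster name. Map the PII column through that lookup with Series.map, pass the result to groupby, and aggregate with list: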

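# Cluster name -> list of PII values, stored under a name that does not shadow dict.
d = {'Info 1': ['PERSON', 'LOCATION', 'PHONE_NUMBER', 'EMAIL_ADDRESS', 'PASSPORT', 'SSN',
                'DRIVER_LICENSE'],
     'Info 2': ['NHS'],
     'Info 3': ['IP_ADDRESS', 'DOMAIN_NAME'],
     'Info 4': ['CRYPTO', 'DATE_TIME', 'NRP'],
     'Info 5': ['CREDIT_CARD', 'BANK_NUMBER', 'ITIN', 'CODE']}

# Swap keys and values: each PII value now points to its cluster name.
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# df is the DataFrame from the question (columns PII and Counts).
# Group by the mapped cluster name, keeping groups in order of first appearance.
df1 = (df.groupby(df['PII'].map(d1).rename('Names'), sort=False)
         .agg(list)
         .reset_index())
print(df1)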
    Names                                          PII               Counts
0  Info 5                   [CREDIT_CARD, BANK_NUMBER]           [158, 202]
1  Info 4                             [DATE_TIME, NRP]            [544, 20]
2  Info 3                    [DOMAIN_NAME, IP_ADDRESS]           [609, 405]
3  Info 1  [EMAIL_ADDRESS, LOCATION, PERSON, PASSPORT]  [90, 346, 202, 130]
4  Info 2                                        [NHS]                  [6]
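
The answer's output keeps the original column names (PII, Counts), while the question asks for Count and Info. As a hedged variation, the sketch below rebuilds the input from the question so that it runs on its own, and uses pandas named aggregation to choose the output column names. The variable name `out` and the choice of named aggregation are assumptions made for this example, not part of the original answer.

```python
import pandas as pd

# Input data copied from the question.
df = pd.DataFrame({
    "PII": ["CREDIT_CARD", "DATE_TIME", "DOMAIN_NAME", "EMAIL_ADDRESS",
            "IP_ADDRESS", "LOCATION", "PERSON", "BANK_NUMBER",
            "PASSPORT", "NHS", "NRP"],
    "Counts": [158, 544, 609, 90, 405, 346, 202, 202, 130, 6, 20],
})

# Cluster name -> PII values, as given in the question.
d = {
    "Info 1": ["PERSON", "LOCATION", "PHONE_NUMBER", "EMAIL_ADDRESS",
               "PASSPORT", "SSN", "DRIVER_LICENSE"],
    "Info 2": ["NHS"],
    "Info 3": ["IP_ADDRESS", "DOMAIN_NAME"],
    "Info 4": ["CRYPTO", "DATE_TIME", "NRP"],
    "Info 5": ["CREDIT_CARD", "BANK_NUMBER", "ITIN", "CODE"],
}

# Invert the dict of lists: PII value -> cluster name.
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# Named aggregation: output column = (input column, aggregation function).
out = (df.groupby(df["PII"].map(d1).rename("Names"), sort=False)
         .agg(Count=("Counts", list), Info=("PII", list))
         .reset_index())
print(out)
```

Dictionary entries that never occur in the DataFrame (for example SSN) simply contribute nothing. A PII value that is missing from the dictionary would map to NaN, and groupby drops NaN keys by default, so such rows would be left out of the result.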