Python: finding the frequency of subsets across multiple sets


I have a list of skills:

skills = ['Listening', 'Written_Expression','Clerical',
         'Night_Vision', 'Accounting']
I also have a separate list of sets, each containing the skills associated with a particular job:

job_skills = [{'Listening', 'Written_Expression', 'Clerical', 'Night_Vision'},
              {'Chemistry', 'Written_Expression', 'Clerical', 'Listening'},
              .
              .
              ]
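I want to count how often each combination of two distinct skills is a subset of one of the job_skills sets, and return a list of lists/sets holding each combination and its frequency, like this: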

skill_pairs = [{'Listening', 'Written_Expression', 2},
              {'Listening', 'Clerical', 2},
              .
              .
              {'Night_Vision', 'Accounting', 0}]
At the moment I am doing the following:

skill_combos = []
for idx, i in enumerate(skills):
    for jdx, j in enumerate(skills[idx+1:]):
        temp = []
        for job in range(len(job_skills)):
            # True (counted as 1) when both skills appear in this job
            temp.append(set([i, j]).issubset(job_skills[job]))
        skill_combos.append([i, j, sum(temp)])
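This gets the job done, but it is slow because I have roughly 500,000 skill combinations. Is there a faster way? Ideally one that does not use three nested loops.

Thanks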

You only need to count the combinations that actually occur; all the others are zero. For example:

from collections import Counter
from itertools import combinations

job_skills = [{'Listening', 'Written_Expression', 'Clerical', 'Night_Vision'},
              {'Chemistry', 'Written_Expression', 'Clerical', 'Listening'}]

# Sort each set first so a given pair always yields the same tuple;
# set iteration order is arbitrary, so ('A', 'B') and ('B', 'A') could
# otherwise be counted as two different keys.
counts = Counter(combo
                 for skill_set in job_skills
                 for combo in combinations(sorted(skill_set), 2))

for key, value in counts.items():
    print(key, value)
Output

('Clerical', 'Listening') 2
('Clerical', 'Night_Vision') 1
('Clerical', 'Written_Expression') 2
('Listening', 'Night_Vision') 1
('Listening', 'Written_Expression') 2
('Night_Vision', 'Written_Expression') 1
('Chemistry', 'Clerical') 1
('Chemistry', 'Listening') 1
('Chemistry', 'Written_Expression') 1
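See itertools.combinations and collections.Counter. If you want a lookup for a pair that never occurs to return 0, counts already behaves that way: a Counter returns 0 for a missing key instead of raising KeyError.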
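To get back the [skill, skill, frequency] rows that the question asks for, one option is to count pairs per job and then read the frequency of every candidate pair from the Counter, relying on it returning 0 for unseen keys. The following is only a sketch under stated assumptions: the name wanted is hypothetical, and restricting each job to the skills in the list is an extra step that is not part of the answer above.

```python
from collections import Counter
from itertools import combinations

skills = ['Listening', 'Written_Expression', 'Clerical',
          'Night_Vision', 'Accounting']
job_skills = [{'Listening', 'Written_Expression', 'Clerical', 'Night_Vision'},
              {'Chemistry', 'Written_Expression', 'Clerical', 'Listening'}]

# Hypothetical helper set: only skills from the list are of interest,
# so anything else (here 'Chemistry') is dropped before pairing.
wanted = set(skills)

# Count each pair once per job; sorting gives every pair a single key.
counts = Counter(
    pair
    for job in job_skills
    for pair in combinations(sorted(job & wanted), 2)
)

# Build one row per candidate pair. Pairs that never occur read as 0
# because Counter returns 0 for a missing key.
skill_combos = [[a, b, counts[(a, b)]]
                for a, b in combinations(sorted(skills), 2)]

for row in skill_combos:
    print(row)
```

The candidate pairs are generated from sorted(skills) so that their tuples match the sorted keys stored in counts.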


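I am not sure whether it is faster, but you can use combinations together with a Counter. My solution computes the candidate pairs only once and then checks each pair against every job with issubset.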

from itertools import combinations
from collections import Counter

skills = ['Listening', 'Written_Expression','Clerical',
         'Night_Vision', 'Accounting']
job_skills = [{'Listening','Written_Expression','Clerical','Night_Vision'}, {'Chemistry','Written_Expression','Clerical','Listening'}]

pairs = {frozenset(x) for x in combinations(skills, 2)}
c = Counter(pair for pair in pairs for job in job_skills if pair.issubset(job))

for pair in pairs: # Adding the pairs that had no matches
    if pair not in c:
        c[pair] = 0

for key, count in c.items():
    print(key, count)
Output (the order may vary between runs, because sets are unordered):

frozenset({'Written_Expression', 'Clerical'}) 2
frozenset({'Listening', 'Clerical'}) 2
frozenset({'Written_Expression', 'Listening'}) 2
frozenset({'Written_Expression', 'Night_Vision'}) 1
frozenset({'Listening', 'Night_Vision'}) 1
frozenset({'Clerical', 'Night_Vision'}) 1
frozenset({'Written_Expression', 'Accounting'}) 0
frozenset({'Clerical', 'Accounting'}) 0
frozenset({'Listening', 'Accounting'}) 0
frozenset({'Night_Vision', 'Accounting'}) 0
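The two approaches on this page should give the same frequencies for every candidate pair. They differ in where the work goes: enumerating pairs inside each job costs roughly k(k-1)/2 steps for a job with k skills, while testing every candidate pair against every job costs (number of pairs) x (number of jobs) subset checks. A small sketch to check that they agree on the sample data; the names by_job, by_pair and mismatches are made up for this illustration:

```python
from collections import Counter
from itertools import combinations

skills = ['Listening', 'Written_Expression', 'Clerical',
          'Night_Vision', 'Accounting']
job_skills = [{'Listening', 'Written_Expression', 'Clerical', 'Night_Vision'},
              {'Chemistry', 'Written_Expression', 'Clerical', 'Listening'}]

# Approach 1: enumerate the pairs that occur inside each job.
# frozenset keys make the pair independent of element order.
by_job = Counter(
    frozenset(pair)
    for job in job_skills
    for pair in combinations(job, 2)
)

# Approach 2: test every candidate pair against every job.
pairs = {frozenset(p) for p in combinations(skills, 2)}
by_pair = Counter(
    pair for pair in pairs for job in job_skills if pair.issubset(job)
)

# Collect any candidate pair on which the two counts disagree.
mismatches = [pair for pair in pairs if by_job[pair] != by_pair[pair]]
print(mismatches)
```

An empty list means the two counts agree on every candidate pair. Which approach is faster on real data depends on how many skills each job has compared with how many candidate pairs there are, so it is worth timing both.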

This might be a good candidate for codereview.stackexchange.com