Python: when using dictionaries, how would you find/track duplicate GUIDs given this list?
I work with WiX XML files a lot, and almost every object in WiX requires a GUID. To help avoid copy-and-paste errors, I set out to sort and display all duplicated GUIDs from a list of file names and GUIDs (created with find and egrep), in a format like this:
  3 E289D834-4421-4DCE-B0A8-94C09978058A
       2 ./A2.Spam.TrojanBunnies/Files/File1.wxs
       1 ./A2.Spam.TrojanBunnies/Files/File2.wxs
  2 083863F1-70DE-11D0-BD40-00A0C911CE86
       2 ./A2.Spam.TrojanBunnies/Files/Files.wxs
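The total number of occurrences of a GUID is shown next to the GUID, followed by the number of times that GUID occurs in each file.
I came up with the following script (which generates the output above). I'm still new to Python and trying to get my head around dictionaries and their practical use. Is using nested dictionaries the right approach? I chose dictionaries because I thought they were the simplest way to add and track unique entries. Still, syntax like parent_dict['child_dict_key']['value_key'] feels a bit odd. Could I be using items() here, or some other method or technique?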
#!/usr/bin/env python
guids = {}
f_and_g = open( 'files-and-guids.txt', 'r')
for fg in f_and_g.readlines():
    fname, guid = map( str.strip, fg.split(':') )
    if guid not in guids:
        guids[guid] = { 'count': 1, 'files': {} }
    else:
        guids[guid]['count'] += 1
    ## Count how many times a GUID was used in a given file
    if fname not in guids[guid]['files']:
        guids[guid]['files'][fname] = 1
    else:
        guids[guid]['files'][fname] += 1
## Sort by total count for a given GUID
for guid in sorted( guids, key=lambda x:guids[x]['count'], reverse=True):
    ## Skip printing if count is below threshold
    if guids[guid]['count'] < 2:
        continue
    guid_dict = guids[guid]
    print '{:>3} {}'.format( guid_dict['count'], guid )
    ## Sort by filename counts
    for fname in sorted( guid_dict['files'],
                         key=lambda x: guid_dict['files'][x], reverse=True ):
        fname_cnt = guid_dict['files'][fname]
        print '{:>8} {}'.format( fname_cnt, fname)
I would do it like this, although I haven't actually tested this code:
#!/usr/bin/env python
import collections
import operator

guids = collections.defaultdict(collections.Counter)
f_and_g = open('files-and-guids.txt', 'r')
for fg in f_and_g:
    fname, guid = map(str.strip, fg.split(':'))
    guids[guid][fname] += 1

## Sort by total count for a given GUID
## (the loop variable is named guid so that it does not rebind guids,
## which a Python 2 list comprehension would otherwise do)
guids_counts_totals = [(guid, counts, sum(counts.itervalues()))
                       for guid, counts
                       in guids.iteritems()]
guids_counts_totals_sorted = sorted(guids_counts_totals,
                                    key=operator.itemgetter(2),
                                    reverse=True)
for guid, counts, total in guids_counts_totals_sorted:
    ## Skip printing if count is below threshold
    if total < 2:
        continue
    print '{:>3} {}'.format(total, guid)
    ## Sorting by filename counts
    fnames_counts_sorted = sorted(counts.iteritems(),
                                  key=operator.itemgetter(1), reverse=True)
    for fname, count in fnames_counts_sorted:
        print '{:>8} {}'.format(count, fname)
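Some of the changes here:
- Using collections.defaultdict and collections.Counter, rather than repeatedly checking whether a key exists and setting it to 1 if it does not
- Not duplicating data by storing a count both per GUID and per file name. You can simply sum the per-file counts for a GUID
- Sorting and iterating over dict.itervalues(), rather than just using the keys and then looking up their values
- Using operator.itemgetter() instead of lambda expressions
- Spacing according to …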
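The first two changes listed above can be sketched in a small self-contained example. It is written for Python 3 (so values() and items() rather than itervalues() and iteritems()), and the function name count_guids, the 'filename: GUID' line format, and the sample data are assumptions made for illustration; they are not part of the original script.

```python
from collections import Counter, defaultdict


def count_guids(lines):
    """Map each GUID to a Counter of {filename: occurrences}.

    Assumes every line looks like 'filename: GUID' (hypothetical format,
    inferred from the split(':') call in the scripts above).
    """
    guids = defaultdict(Counter)
    for line in lines:
        fname, guid = map(str.strip, line.split(':'))
        # No "if key not in dict" checks are needed: a missing GUID starts
        # as an empty Counter, and a missing file name starts at 0.
        guids[guid][fname] += 1
    return guids


# Made-up sample data, not taken from a real WiX project.
sample = [
    "./a.wxs: AAAA",
    "./a.wxs: AAAA",
    "./b.wxs: AAAA",
    "./b.wxs: BBBB",
]
guids = count_guids(sample)

# The per-GUID total is derived from the per-file counts
# instead of being stored a second time.
totals = {guid: sum(counts.values()) for guid, counts in guids.items()}
print(totals)
```

Deriving the total on demand keeps a single source of truth, at the cost of one extra sum() per GUID when reporting.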
Don't overthink it if the script is clear and works for you. Here is another variant:
#!/usr/bin/env python
import fileinput
from collections import defaultdict, Counter

# count guids
perfile = defaultdict(Counter)
total = Counter()
for line in fileinput.input():
    fname, guid = map(str.strip, line.split(':'))
    perfile[guid][fname] += 1
    total[guid] += 1

# print most common guid first
for guid, count in total.most_common():
    if count < 2: continue # skip printing if count is below threshold
    print '{:>3} {}'.format(count, guid)
    # sorting by filename counts
    for fname, fname_cnt in perfile[guid].most_common():
        print '{:>8} {}'.format(fname_cnt, fname)
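To make it easier to try this variant without an input file, here is a minimal Python 3 sketch of the same idea wrapped in a function. The name report, the threshold parameter, and the sample lines are hypothetical, and the 'filename: GUID' input format is an assumption inferred from the split(':') call.

```python
from collections import Counter, defaultdict


def report(lines, threshold=2):
    """Return report lines for GUIDs seen at least `threshold` times.

    `lines` is assumed to hold 'filename: GUID' strings (illustrative format).
    """
    perfile = defaultdict(Counter)
    total = Counter()
    for line in lines:
        fname, guid = map(str.strip, line.split(':'))
        perfile[guid][fname] += 1
        total[guid] += 1

    out = []
    # most_common() yields (key, count) pairs from highest count to lowest,
    # so no explicit sorted(...) call is required.
    for guid, count in total.most_common():
        if count < threshold:
            continue
        out.append('{:>3} {}'.format(count, guid))
        for fname, fname_cnt in perfile[guid].most_common():
            out.append('{:>8} {}'.format(fname_cnt, fname))
    return out


# Made-up sample data for illustration only.
sample = ["./x.wxs: G1", "./y.wxs: G1", "./x.wxs: G1", "./y.wxs: G2"]
print("\n".join(report(sample)))
```

Keeping a separate total Counter trades a little duplicated bookkeeping for the convenience of calling most_common() directly on the totals.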
Based on some of the answers I had another go and, to make my life harder, avoided any additional libraries:
def MyCounter(l):
    d = dict()
    for i in l:
        if i not in d:
            d[i] = 1
        else:
            d[i] += 1
    return d

def main():
    guids = dict()
    f_and_g = open('files-and-guids.txt', 'r')
    for fg in f_and_g.readlines():
        fname, guid = map(str.strip, fg.split(':'))
        if guid not in guids:
            guids[guid] = [fname]
        else:
            guids[guid] += [fname]
    ## Sort by total count for a given GUID
    for guid in sorted(guids, key=lambda guid: len(guids[guid]), reverse=True):
        ## Skip printing if count is below threshold
        if len(guids[guid]) < 2: continue
        guid_list = guids[guid]
        print '{:>3} {}'.format( len(guid_list), guid )
        ## Sort by filename counts
        counts = MyCounter(guid_list)
        for fname, fname_cnt in sorted(counts.iteritems(), key=lambda x:x[1],
                                       reverse=True):
            print '{:>8} {}'.format(fname_cnt, fname)

## Run the report when the file is executed as a script
if __name__ == '__main__':
    main()
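If the goal is to stay free of extra imports, the hand-written counter can still be shortened with dict.get, and items() gives the key/value pairs the question asked about. This is only a sketch in Python 3; count_items and the sample file names are illustrative and not taken from the answer above.

```python
def count_items(items):
    """Count occurrences of each item using only built-in dict methods."""
    counts = {}
    for item in items:
        # get() returns 0 for unseen items, so no membership test is needed.
        counts[item] = counts.get(item, 0) + 1
    return counts


# Made-up file names for illustration.
files = ["File1.wxs", "File2.wxs", "File1.wxs"]
counts = count_items(files)

# items() yields (key, value) pairs, which avoids a second lookup
# inside the sort key and inside the loop body.
for fname, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print('{:>8} {}'.format(n, fname))
```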