Python: when working with dictionaries, how would you find/track duplicate GUIDs given this list?

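I work with WiX XML files a lot, and almost every object in WiX needs a GUID. To guard against copy-paste errors, I set out to sort and display all duplicated GUIDs, given a list of files and GUIDs (created with `find` and `egrep`), in a format like this: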

  3 E289D834-4421-4DCE-B0A8-94C09978058A
       2 ./A2.Spam.TrojanBunnies/Files/File1.wxs
       1 ./A2.Spam.TrojanBunnies/Files/File2.wxs
  2 083863F1-70DE-11D0-BD40-00A0C911CE86
       2 ./A2.Spam.TrojanBunnies/Files/Files.wxs

The total number of occurrences of a GUID is shown next to the GUID, followed by the number of times that GUID occurs in each file.

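I came up with the script below (it produces the output above). I'm still new to Python and am trying to understand dictionaries and their practical uses. Is a nested dictionary the right approach here? I chose dictionaries because I thought they were the easiest way to add and track unique entries. Even so, syntax like `parent_dict['child_dict_key']['value_key']` feels a bit odd. Are there methods such as `items()`, or other techniques, that I could use instead?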

#!/usr/bin/env python

guids = {}
f_and_g = open( 'files-and-guids.txt', 'r')

for fg in f_and_g.readlines():
    fname, guid = map( str.strip, fg.split(':') )

    if guid not in guids:
        guids[guid] = { 'count': 1, 'files': {} }
    else:
        guids[guid]['count'] += 1

    ## Count how many times a GUID was used in a given file
    if fname not in guids[guid]['files']:
        guids[guid]['files'][fname]  = 1
    else:
        guids[guid]['files'][fname] += 1

## Sort by total count for a given GUID
for guid in sorted( guids, key=lambda x:guids[x]['count'], reverse=True):
    ## Skip printing if count is below threshold
    if guids[guid]['count'] < 2:
        continue
    guid_dict = guids[guid]
    print '{:>3} {}'.format( guid_dict['count'], guid )
    ## Sort by filename counts
    for fname in sorted( guid_dict['files'],
                         key=lambda x: guid_dict['files'][x], reverse=True ):
        fname_cnt = guid_dict['files'][fname]
        print '{:>8} {}'.format( fname_cnt, fname)
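
As a rough illustration of the `items()` idea raised above, the reporting half of the script could iterate over key/value pairs instead of repeating `guids[guid][...]` lookups. This is only a sketch, written for Python 3 (unlike the Python 2 script above); the function name `report_lines`, the `threshold` parameter, and the shortened file names in the sample data are assumptions made for the example.

```python
# Hypothetical data in the same shape the script above builds:
# {guid: {'count': total, 'files': {filename: count}}}
guids = {
    'E289D834-4421-4DCE-B0A8-94C09978058A': {
        'count': 3,
        'files': {'File1.wxs': 2, 'File2.wxs': 1},
    },
    '083863F1-70DE-11D0-BD40-00A0C911CE86': {
        'count': 1,
        'files': {'Files.wxs': 1},
    },
}


def report_lines(guids, threshold=2):
    """Return the duplicate-GUID report as a list of formatted lines."""
    lines = []
    # items() yields (key, value) pairs, so the inner dict arrives as `info`
    # and no second lookup by key is needed.
    by_total = sorted(guids.items(), key=lambda kv: kv[1]['count'], reverse=True)
    for guid, info in by_total:
        if info['count'] < threshold:
            continue  # skip GUIDs that are not duplicated
        lines.append('{:>3} {}'.format(info['count'], guid))
        by_file = sorted(info['files'].items(), key=lambda kv: kv[1], reverse=True)
        for fname, fname_cnt in by_file:
            lines.append('{:>8} {}'.format(fname_cnt, fname))
    return lines


for line in report_lines(guids):
    print(line)
```

Unpacking each pair into `guid, info` keeps the nested structure but removes most of the chained indexing that felt awkward.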

I would do it like this, although I haven't actually tested this code:

#!/usr/bin/env python

import collections
import operator

guids = collections.defaultdict(collections.Counter)
f_and_g = open('files-and-guids.txt', 'r')

for fg in f_and_g:
    fname, guid = map(str.strip, fg.split(':'))

    guids[guid][fname] += 1

## Sort by total count for a given GUID

# Use `guid` as the loop variable so it does not shadow the `guids` dict.
guids_counts_totals = [(guid, counts, sum(counts.itervalues()))
                       for guid, counts
                       in guids.iteritems()]

guids_counts_totals_sorted = sorted(guids_counts_totals,
                                    key=operator.itemgetter(2),
                                    reverse=True)

for guid, counts, total in guids_counts_totals_sorted:
    ## Skip printing if count is below threshold
    if total < 2:
        continue

    print '{:>3} {}'.format(total, guid)

    ## Sorting by filename counts
    fnames_counts_sorted = sorted(counts.iteritems(),
                                  key=operator.itemgetter(1), reverse=True)
    for fname, count in fnames_counts_sorted:
        print '{:>8} {}'.format(count, fname)

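A few things changed here:

- Using `collections.defaultdict` and `collections.Counter` instead of repeatedly checking whether a key exists and setting it to 1 if it doesn't.
- Not duplicating data by storing a count both per GUID and per file name. The total for a GUID is just the sum of its per-file counts.
- Sorting and iterating over `dict.iteritems()` and `dict.itervalues()`, rather than using only the keys and then looking up their values.
- Using `operator.itemgetter()` instead of `lambda` expressions.
- Spacing according to ...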
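
The first two points above can be sketched in a small, self-contained form. This version is written for Python 3, so it uses `items()` and `values()` where the answer's Python 2 code uses `iteritems()` and `itervalues()`. The sample lines and the helper name `count_guids` are hypothetical, and the `filename: GUID` line layout is assumed from the way the scripts split on `':'`.

```python
import collections

# Hypothetical input lines in the "filename: GUID" layout assumed above.
sample_lines = [
    './Files/File1.wxs: E289D834-4421-4DCE-B0A8-94C09978058A',
    './Files/File1.wxs: E289D834-4421-4DCE-B0A8-94C09978058A',
    './Files/File2.wxs: E289D834-4421-4DCE-B0A8-94C09978058A',
    './Files/Files.wxs: 083863F1-70DE-11D0-BD40-00A0C911CE86',
]


def count_guids(lines):
    """Map each GUID to a Counter of the files it appears in."""
    # A missing GUID gets a fresh Counter, and a missing file name
    # inside that Counter starts at 0, so no existence checks are needed.
    guids = collections.defaultdict(collections.Counter)
    for line in lines:
        fname, guid = map(str.strip, line.split(':'))
        guids[guid][fname] += 1
    return guids


guids = count_guids(sample_lines)

# The per-GUID total is derived from the per-file counts, not stored twice.
totals = {guid: sum(counts.values()) for guid, counts in guids.items()}
print(totals)
```

Because the totals are computed from the per-file counts, the two numbers cannot drift apart the way separately maintained counters could.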

Here is yet another variation:

#!/usr/bin/env python
import fileinput
from collections import defaultdict, Counter

# count guids
perfile = defaultdict(Counter)
total = Counter()
for line in fileinput.input():
    fname, guid = map(str.strip, line.split(':'))
    perfile[guid][fname] += 1
    total[guid] += 1

# print most common guid first
for guid, count in total.most_common():
    if count < 2: continue # skip printing if count is below threshold
    print '{:>3} {}'.format(count, guid)
    # sorting by filename counts
    for fname, fname_cnt in perfile[guid].most_common():
        print '{:>8} {}'.format(fname_cnt, fname)

If your script is clear and works for you, don't overthink it.
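
For readers unfamiliar with `Counter.most_common()`, which the variation above relies on for ordering, here is a tiny Python 3 sketch. The single-letter keys stand in for GUIDs and are purely illustrative.

```python
from collections import Counter

# Placeholder keys standing in for GUIDs (illustrative only).
total = Counter()
for guid in ['A', 'B', 'A', 'C', 'A', 'B']:
    total[guid] += 1

# most_common() returns (key, count) pairs sorted by count, highest first,
# which is why no explicit sorted(...) call is needed.
ranked = total.most_common()

# Keep only entries that occur at least twice, as in the threshold check.
duplicates = [(guid, count) for guid, count in ranked if count >= 2]
print(duplicates)
```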

Based on some of the answers I tried again, and, just to make my life harder, I avoided any other libraries:

def MyCounter(l):
    d = dict()
    for i in l:
        if i not in d:
            d[i] = 1
        else:
            d[i] += 1
    return d

def main():
    guids = dict()
    f_and_g = open('files-and-guids.txt', 'r')
    for fg in f_and_g.readlines():
        fname, guid = map(str.strip, fg.split(':'))
        if guid not in guids:
            guids[guid] = [fname]
        else:
            guids[guid] += [fname]

    ## Sort by total count for a given GUID
    for guid in sorted(guids, key=lambda guid: len(guids[guid]), reverse=True):
        ## Skip printing if count is below threshold
        if len(guids[guid]) < 2: continue
        guid_list = guids[guid]
        print '{:>3} {}'.format( len(guid_list), guid )
        ## Sort by filename counts
        counts = MyCounter(guid_list)
        for fname, fname_cnt in sorted(counts.iteritems(), key=lambda x:x[1],
                                   reverse=True):
            print '{:>8} {}'.format(fname_cnt, fname)
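
Staying with plain dictionaries, the two if/else blocks above could probably be shortened with the built-in `dict.get()` and `dict.setdefault()` methods. The following is a minimal Python 3 sketch under that assumption; the names `my_counter` and `group_files_by_guid` and the sample pairs are made up for the example.

```python
def my_counter(items):
    """Count occurrences using only a plain dict."""
    counts = {}
    for item in items:
        # get() returns 0 for unseen keys, replacing the if/else branch.
        counts[item] = counts.get(item, 0) + 1
    return counts


def group_files_by_guid(pairs):
    """Map each GUID to the list of files it appears in, repeats included."""
    guids = {}
    for fname, guid in pairs:
        # setdefault() inserts an empty list the first time a GUID is seen.
        guids.setdefault(guid, []).append(fname)
    return guids


# Hypothetical (filename, GUID) pairs.
pairs = [
    ('File1.wxs', 'GUID-A'),
    ('File1.wxs', 'GUID-A'),
    ('File2.wxs', 'GUID-A'),
    ('Files.wxs', 'GUID-B'),
]
grouped = group_files_by_guid(pairs)
print(my_counter(grouped['GUID-A']))
```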
