Python: Comparing the list of values associated with a key in a dictionary against the values associated with all other keys


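Below is the code I wrote to compare the list of values associated with each key against the values of every other key in the dictionary. With about 10,000 records in the CSV file, however, it takes a very long time. Can anyone help optimize the code so that it runs in the shortest possible time? Don't worry about the external function call; it works fine.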

import csv
import sys
file = sys.argv[1]
with open(file, 'rU') as inf:
    csvreader=csv.DictReader(inf,delimiter=',')
    result={}
    temp = []
#Creating Dict
    for r in csvreader:
        name=[]
        name.append(r['FIRST_NAME'])
        name.append(r['LAST_NAME'])
        name.append(r['ID'])
        result.setdefault(r['GROUP_KEY'],[]).append(name) 

#Processing the Dict

for key1 in result.keys():
    temp.append(key1)
    for key2 in result.keys():
        if key1 != key2 and key2 not in temp:
            for v1 in result[key1]:
                for v2 in result[key2]:
                    score=name_match_score(v1,'',v2,'')[0] ####calling external function
                    if score > 0.90:
                        print v1[2],v2[2],score

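Something like this should help. The goal is to reduce the number of raw computations performed in name_match_score by skipping redundant calculations and caching the ones already carried out.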

First, make your dictionary a defaultdict that stores lists of tuples. Tuples are immutable, so they can be used as keys in the sets and dicts below.

from collections import defaultdict
import csv
import sys

file = sys.argv[1]
with open(file, 'rU') as inf:
    csvreader=csv.DictReader(inf, delimiter=',')
    result = defaultdict(list)
    for r in csvreader:
        name = (r['FIRST_NAME'], r['LAST_NAME'], r['ID'])
        result[r['GROUP_KEY']].append(name)
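
To illustrate why tuples matter here, the following is a small sketch in Python 3 syntax. The sample record is made up for illustration. A tuple of strings can be added to a set or used as a dict key, while the equivalent list raises a TypeError.

```python
# Hypothetical sample record, for illustration only.
name_as_tuple = ("John", "Smith", "17")
name_as_list = ["John", "Smith", "17"]

seen = set()
seen.add(name_as_tuple)          # fine: a tuple of strings is hashable

try:
    seen.add(name_as_list)       # lists are mutable, hence unhashable
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A pair of tuples is itself a tuple, so it also works as a dict key.
cache = {(name_as_tuple, name_as_tuple): 1.0}
```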

Next, sort the keys to ensure that each pair of keys is evaluated only once.

keys = sorted(result)
for i, key1 in enumerate(keys):
    for key2 in keys[i+1:]:
Also put v1 and v2 in sorted order so that each pair forms a unique key. This helps with caching.

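        for v1 in result[key1]:
            for v2 in result[key2]:
                # Order the pair under new names; rebinding v1 here would
                # leak into the next iteration of the inner loop.
                a, b = min(v1, v2), max(v1, v2)
                score=name_match_score(a, b)[0] ####calling external function
                if score > 0.90:
                    print a[2],b[2],score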
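
The same "each pair of keys only once" iteration can also be sketched with itertools.combinations from the standard library. This is Python 3 syntax; the groups dict and the cross_group_pairs helper are hypothetical names that stand in for result and the nested loops above.

```python
from itertools import combinations, product

def cross_group_pairs(groups):
    """Yield each cross-group pair of records exactly once, in sorted order."""
    # combinations() over the sorted keys visits every unordered key pair once.
    for key1, key2 in combinations(sorted(groups), 2):
        for v1, v2 in product(groups[key1], groups[key2]):
            # Order the pair so (a, b) and (b, a) map to the same cache key.
            yield (v1, v2) if v1 <= v2 else (v2, v1)

# Hypothetical data standing in for `result`.
groups = {
    "g1": [("Ann", "Lee", "1")],
    "g2": [("Bob", "Ray", "2"), ("Al", "Poe", "3")],
    "g3": [("Cy", "Orr", "4")],
}
pairs = list(cross_group_pairs(groups))
```

For n group keys this visits n(n-1)/2 key pairs, and records within the same group are never compared with each other, as in the original loops.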

Then use memoization to cache the results:

import collections
import functools

class memoized(object):
    '''Decorator. Caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned
    (not reevaluated).
    '''
    def __init__(self, func):
        self.func = func
        self.cache = {}
    def __call__(self, *args):
        if not isinstance(args, collections.Hashable):
            # uncacheable. a list, for instance.
            # better to not cache than blow up.
            return self.func(*args)
        if args in self.cache:
            return self.cache[args]
        else:
            value = self.func(*args)
            self.cache[args] = value
            return value
    def __repr__(self):
        '''Return the function's docstring.'''
        return self.func.__doc__
    def __get__(self, obj, objtype):
        '''Support instance methods.'''
        return functools.partial(self.__call__, obj)

And change name_match_score to use the decorator:

@memoized
def name_match_score(v1, v2):
    # Whatever this does
    return (0.75, )

This will minimize the number of raw computations you perform in name_match_score.
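
On Python 3, functools.lru_cache from the standard library offers the same kind of memoization without a hand-written class. The following is a minimal sketch, assuming the arguments are hashable tuples. The scoring body is a placeholder and not the real name_match_score, and the call counter exists only to show the cache at work.

```python
from functools import lru_cache

call_count = 0  # counts real (uncached) evaluations, for demonstration only

@lru_cache(maxsize=None)
def name_match_score(v1, v2):
    """Placeholder scorer: 1.0 when first and last names match, else 0.0."""
    global call_count
    call_count += 1
    return (1.0 if v1[:2] == v2[:2] else 0.0,)

# Hypothetical records: (first name, last name, id).
a = ("Ann", "Lee", "1")
b = ("Ann", "Lee", "2")

first = name_match_score(a, b)[0]
second = name_match_score(a, b)[0]   # same arguments, served from the cache
info = name_match_score.cache_info()
```

Note that a cache only pays off when the same pair of arguments is scored more than once; if every pair is distinct, each call is still a cache miss.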

File "test.py", line 30: if vkey not in vseen: TypeError: unhashable type: 'list'
It doesn't help... it takes even more time, because it checks the set over and over.