Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/304.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/1/list/4.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
python中高效的列表映射_Python_List_Dictionary_Mapping_Itertools - Fatal编程技术网

python中高效的列表映射

python中高效的列表映射,python,list,dictionary,mapping,itertools,Python,List,Dictionary,Mapping,Itertools,我有以下意见: input = [(dog, dog, cat, mouse), (cat, ruby, python, mouse)] 并尝试获得以下输出: outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]] outputmapping = {0:dog, 1:cat, 2:mouse, 3:ruby, 4:python, 5:mouse} 任何关于如何处理的提示都要考虑可伸缩性(var输入可能会变得非常大) 这里有一个可能的解决方案,尽管它不是最好的。如

我有以下意见:

input = [(dog, dog, cat, mouse), (cat, ruby, python, mouse)]
并尝试获得以下输出:

outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]

outputmapping = {0:dog, 1:cat, 2:mouse, 3:ruby, 4:python, 5:mouse}

任何关于如何处理的提示都要考虑可伸缩性(var输入可能会变得非常大)

这里有一个可能的解决方案,尽管它不是最好的。如果您知道列表中的每个条目将有多少个元素在手,那么通过预先分配这些元素,可以稍微提高效率

labels=[];
label2index={};
outputlist=[];
for group in input:
    current=[];
    for label in group:
       if label not in label2index:
           label2index[label]=len(labels);
           labels.append(label);
       current.append(label2index[label]);
    outputlist.append(current);

outputmapping={};
for idx, val in enumerate(labels):
    outputmapping[idx]=val;

您可能需要以下内容:

import collections
import itertools

def build_catalog(L):
    counter = itertools.count().next
    names = collections.defaultdict(counter)
    result = []
    for t in L:
        new_t = [ names[item] for item in t ]
        result.append(new_t)
    catalog = dict((name, idx) for idx, name in names.iteritems())
    return result, catalog
使用它:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> outputlist, outputmapping = build_catalog(input)
>>> outputlist
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

此类将自动将对象映射到递增的整数值:

class AutoMapping(object):
    def __init__(self):
        self.map = {}
        self.objects = []

    def __getitem__(self, val):
        if val not in self.map:
            self.map[val] = len(self.objects)
            self.objects.append(val)
        return self.map[val]
示例用法,用于您的输入:

>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> map = AutoMapping()
>>> [[map[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> map.objects
['dog', 'cat', 'mouse', 'ruby', 'python']
>>> dict(enumerate(map.objects))
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

在我的项目中,我经常遇到同样的问题,所以我在不久前结束了一个课程,它正是这样做的:

class UniqueIdGenerator(object):
    """A dictionary-like class that can be used to assign unique integer IDs to
    names.

    Usage:

    >>> gen = UniqueIdGenerator()
    >>> gen["A"]
    0
    >>> gen["B"]
    1
    >>> gen["C"]
    2
    >>> gen["A"]      # Retrieving already existing ID
    0
    >>> len(gen)      # Number of already used IDs
    3
    """

    def __init__(self, id_generator=None):
        """Creates a new unique ID generator. `id_generator` specifies how do we
        assign new IDs to elements that do not have an ID yet. If it is `None`,
        elements will be assigned integer identifiers starting from 0. If it is
        an integer, elements will be assigned identifiers starting from the given
        integer. If it is an iterator or generator, its `next` method will be
        called every time a new ID is needed."""
        if id_generator is None:
            id_generator = 0
        if isinstance(id_generator, int):
            import itertools
            self._generator = itertools.count(id_generator)
        else:
            self._generator = id_generator
        self._ids = {}

    def __getitem__(self, item):
        """Retrieves the ID corresponding to `item`. Generates a new ID for `item`
        if it is the first time we request an ID for it."""
        try:
            return self._ids[item]
        except KeyError:
            self._ids[item] = self._generator.next()
            return self._ids[item]

    def __len__(self):
        """Retrieves the number of added elements in this UniqueIDGenerator"""
        return len(self._ids)

    def reverse_dict(self):
        """Returns the reversed mapping, i.e., the one that maps generated IDs to their
        corresponding items"""
        return dict((v, k) for k, v in self._ids.iteritems())

    def values(self):
        """Returns the list of items added so far. Items are ordered according to
        the standard sorting order of their keys, so the values will be exactly
        in the same order they were added if the ID generator generates IDs in
        ascending order. This hold, for instance, to numeric ID generators that
        assign integers starting from a given number."""
        return sorted(self._ids.keys(), key = self._ids.__getitem__)
用法示例:

>>> input = [(dog, dog, cat, mouse), (cat, ruby, python, mouse)]
>>> gen = UniqueIdGenerator()
>>> outputlist = [[gen[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> print outputlist
>>> outputmapping = gen.reverse_dict()
>>> print outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}

@菲利克斯-不,不是真的。OPs映射是将输出列表转换回输入的逻辑组织。什么类型的对象是
dog
cat
,等等。?它们可以散列吗?
5:mouse
从何而来?哦,我不得不道歉,我以为
outputmapping
已经给出了。。。删除我的评论和答案。自动映射和构建目录(L)既高效又紧凑。在自动映射中,嵌套的for循环发生在理解列表中。这给了一个小的效率优势。对吗?但是非常感谢你的回答。问题解决了。