Python: joining lists with duplicate values

Tags: python, list, python-3.x

First, here is the code I have so far, with a short explanation:

ll1 = [
'A',
'B',
'C',
'D'
]

l2 = [
['A', 10],
['B', 20],
['D', 5],
['A', 15],
['B', 30],
['C', 10],
['D', 15]
]

dc = dict(l2)  # a repeated key keeps only its last value
l3 = [[k, dc.get(k, 0)] for k in ll1]
The result is:

['A', 15]
['B', 30]
['C', 10]
['D', 15]
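The first list, ll1, consists of a fixed set of keys, and the second list, l2, holds values for the keys given in the first list. The l2 here is only an example, because I will receive the values later (they will be given as a list), but they use the same keys as ll1. Every key needs to be shown; a key can repeat, but some keys may have empty values (e.g. item C).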

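However, when the list is turned into a dict, the earlier value for each repeated key is discarded, because a dictionary keeps only unique keys (the last value wins).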

How can I get a result like the one below?

['A', 10]
['B', 20]
['C', 0]
['D', 5]
['A', 15]
['B', 30]
['C', 10]
['D', 15]
Another example:

database_keys = [
'First Name',
'Last Name',
'Email',
'City'
]
database_input = [
['First Name', 'John'],
['Last Name', 'Doe'],
['Email', 'johndoe@test.com'],
['First Name', 'Jane'],
['Email', 'jane@test.com']
]

Expected output:
['First Name', 'John']
['Last Name', 'Doe']
['Email', 'johndoe@test.com']
['City', None]
['First Name', 'Jane']
['Last Name', None]
['Email', 'jane@test.com']
['City', None]

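I would use a generator to fill in the missing values: keep a cycle of the keys, and whenever the next expected key is not the key in the data, yield an empty value for it: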

import itertools
def fill_the_blanks(data, keys):
    keys = itertools.cycle(keys)
    for name, value in data:
        k = next(keys)
        while name!=k:
            yield [k,None]
            k = next(keys)
        yield [name,value]


>>> from pprint import pprint
>>> pprint( list(fill_the_blanks(l2, ll1)) )
[['A', 10],
 ['B', 20],
 ['C', None],
 ['D', 5],
 ['A', 15],
 ['B', 30],
 ['C', 10],
 ['D', 15]]
>>> pprint( list(fill_the_blanks(database_input,database_keys)) )
[['First Name', 'John'],
 ['Last Name', 'Doe'],
 ['Email', 'johndoe@test.com'],
 ['City', None],
 ['First Name', 'Jane'],
 ['Last Name', None],
 ['Email', 'jane@test.com']]
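Note that the output above stops after ['Email', 'jane@test.com'], while the question's expected output ends with ['City', None]: the generator only fills gaps before a key it actually sees. Also, a name that is not in the key list makes the original while loop spin forever. A minimal sketch of one way to handle both, assuming every entry follows the order of a non-empty key list (the name fill_the_blanks_padded and the ValueError check are additions for illustration, not part of the answer):

```python
import itertools


def fill_the_blanks_padded(data, keys):
    """Like fill_the_blanks, but also pads the last entry.

    Assumes `keys` is a non-empty list and entries follow its order.
    """
    known = set(keys)
    cycle = itertools.cycle(keys)
    emitted = 0  # rows yielded so far; one row per step of the cycle
    for name, value in data:
        if name not in known:
            # Without this check the while loop below would never end.
            raise ValueError(f"unknown key: {name!r}")
        k = next(cycle)
        while k != name:
            yield [k, None]
            emitted += 1
            k = next(cycle)
        yield [name, value]
        emitted += 1
    # Finish the final entry so it has one row for every key.
    while emitted % len(keys) != 0:
        yield [next(cycle), None]
        emitted += 1


database_keys = ['First Name', 'Last Name', 'Email', 'City']
database_input = [
    ['First Name', 'John'],
    ['Last Name', 'Doe'],
    ['Email', 'johndoe@test.com'],
    ['First Name', 'Jane'],
    ['Email', 'jane@test.com'],
]

for row in fill_the_blanks_padded(database_input, database_keys):
    print(row)
```

Counting emitted rows works because every call to next(cycle) produces exactly one yielded row, so emitted % len(keys) tells how far into the current entry the cycle is.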
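Alternatively, if you know the first key, 'First Name', will always mark the start of an entry, why not build a dict for each entry and fill it in until you reach the next first key: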

def gen_dicts(data, keys):
    first_key = keys[0]
    entry = None #placeholder for first time
    for name, value in data:
        if name == first_key:
            if entry is not None: #skip first time
                yield entry
            entry = dict.fromkeys(keys)
        entry[name] = value
    if entry is not None: #last one (skipped if data was empty)
        yield entry

>>> from pprint import pprint
>>> pprint( list(gen_dicts(l2, ll1)) )
[{'A': 10, 'B': 20, 'C': None, 'D': 5}, {'A': 15, 'B': 30, 'C': 10, 'D': 15}]
>>> pprint( list(gen_dicts(database_input, database_keys)) )
[{'City': None,
  'Email': 'johndoe@test.com',
  'First Name': 'John',
  'Last Name': 'Doe'},
 {'City': None,
  'Email': 'jane@test.com',
  'First Name': 'Jane',
  'Last Name': None}]
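gen_dicts returns one dict per entry rather than the flat list of [key, value] pairs the question asks for. A minimal sketch of converting back, assuming the data starts with the first key; the fill parameter and the dicts_to_rows helper are hypothetical additions, used here to get 0 instead of None for the first example:

```python
def gen_dicts(data, keys, fill=None):
    # Same idea as the answer above; `fill` is a hypothetical extra
    # parameter used as the value for keys missing from an entry.
    first_key = keys[0]
    entry = None
    for name, value in data:
        if name == first_key:
            if entry is not None:
                yield entry
            entry = dict.fromkeys(keys, fill)
        entry[name] = value
    if entry is not None:
        yield entry


def dicts_to_rows(entries, keys):
    # Hypothetical helper: flatten each entry back into [key, value]
    # pairs, always in the order given by `keys`.
    return [[k, entry[k]] for entry in entries for k in keys]


ll1 = ['A', 'B', 'C', 'D']
l2 = [['A', 10], ['B', 20], ['D', 5], ['A', 15], ['B', 30], ['C', 10], ['D', 15]]

for row in dicts_to_rows(gen_dicts(l2, ll1, fill=0), ll1):
    print(row)
```

This prints the eight rows of the question's first expected output, with ['C', 0] for the missing value. Keeping the dicts around is handy if you later need to look values up by key; flattening is only needed for display.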

Here's a quick-and-dirty approach:

l1 = [
'A',
'B',
'C',
'D',
]

l2 = [
['A', 10],
['B', 20],
['D', 5],

['A', 15],
['B', 30],
['C', 10],
['D', 15],

['A', 8],
]

# Assuming elements in l2 are ordered, split them into groups
# with one element per key in l1.
l_aux = l1[:]
l3 = [[]]
for x in l2:
    if x[0] in l_aux:
        l3[-1].append(x)
        l_aux.remove(x[0])
        continue
    for y in l_aux:
        l3[-1].append([y, 'WHATEVER'])
    l3.append([x])
    l_aux = l1[:]
    l_aux.remove(x[0])
for y in l_aux:
    l3[-1].append([y, 'WHATEVER'])
# Now, you have the elements you want grouped.
# Last step: sort each group and flatten the list:
l3 = [y for x in l3 for y in sorted(x)]
print('\n'.join(str(x) for x in l3))
# ['A', 10]
# ['B', 20]
# ['C', 'WHATEVER']
# ['D', 5]
# ['A', 15]
# ['B', 30]
# ['C', 10]
# ['D', 15]
# ['A', 8]
# ['B', 'WHATEVER']
# ['C', 'WHATEVER']
# ['D', 'WHATEVER']
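This grouping only starts a new group when a key repeats, so keys arriving out of order (C before B) are merged into one entry. A minimal sketch of an order-aware split, assuming the order of l1 defines the order within each entry and every key in the data appears in l1; split_by_order is a hypothetical name:

```python
def split_by_order(pairs, keys, fill=None):
    """Split [key, value] pairs into one dict per entry.

    A new entry starts whenever a key does not come strictly after the
    previous key in `keys`, so both repeated and out-of-order keys split.
    """
    position = {k: i for i, k in enumerate(keys)}
    entries = []
    last = len(keys)  # larger than any index, so the first pair opens an entry
    for name, value in pairs:
        i = position[name]  # raises KeyError for keys not listed in `keys`
        if i <= last:
            entries.append(dict.fromkeys(keys, fill))
        entries[-1][name] = value
        last = i
    return entries


l1 = ['A', 'B', 'C', 'D']
l2 = [['A', 10], ['B', 20], ['D', 5],
      ['A', 15], ['B', 30], ['C', 10], ['D', 15],
      ['A', 8]]

entries = split_by_order(l2, l1, fill='WHATEVER')
rows = [[k, e[k]] for e in entries for k in l1]
print('\n'.join(str(r) for r in rows))
```

For this l2 the rows match the dirty method's output, but an input such as [['A', 10], ['C', 5], ['B', 15], ['D', 30]] is now split into two entries instead of one.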

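The problem here is how dictionaries store values. A dictionary takes your key, calls its __hash__ method, and stores the value under that hash (using == to tell apart different keys that happen to share a hash). For strings, two strings with the same value always produce the same __hash__ output: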

>>> a = "foo"
>>> b = "foo"
>>> a == b
True
>>> a.__hash__()
-905768032644956145
>>> b.__hash__()
-905768032644956145
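As you can see, both give the same value when hashed. (String hashes are randomized per interpreter run, so the exact number you see will differ, but equal strings always hash the same within one run.) So when the dictionary is asked to store the same key twice, it overwrites the previous value instead of creating a new key.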
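A short check of this behaviour using the question's own pair format. Since Python 3.7 dicts keep insertion order, so an overwritten key stays in its original position:

```python
# Two equal strings built differently still hash the same.
a = "foo"
b = "".join(["f", "o", "o"])
print(a == b, hash(a) == hash(b))  # True True

# Building a dict from pairs: a repeated key overwrites the earlier value.
pairs = [['A', 10], ['B', 20], ['A', 15]]
d = dict(pairs)
print(d)  # {'A': 15, 'B': 20}
```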

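Looking at the first and second examples, you could use a list of dictionaries instead (assuming every entry starts with 'A' or 'First Name', respectively). So you could do this: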

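dc = []
for s in database_input:
    if s[0] != "First Name":
        # Add the field to the most recent person (assumes the data starts with "First Name").
        dc[-1][s[0]] = s[1]
    else:
        # "First Name" starts a new person.
        dc.append({s[0]: s[1]})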
Then, to retrieve the "First Name" of the first person you entered from dc, you can use:
dc[0]["First Name"]
An extension of this is to store them as objects. Say we have a class called Person:

class Person(object):
    def __init__(self, personal_information):
        super(Person, self).__init__()
        self.first_name = personal_information["First Name"]
        if "Last Name" in personal_information:
            self.last_name = personal_information["Last Name"]
        if "Email" in personal_information:
            self.email = personal_information["Email"]
        if "City" in personal_information:
            self.city = personal_information["City"]
    def __repr__(self):
        # Just to make things look clean
        return "Person("+self.first_name+")"
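With the class above, an attribute such as city only exists if the field was present, so reading it for Jane raises AttributeError. If you would rather have every attribute exist (set to None when missing), one possible variant uses a dataclass (Python 3.7+). This is a sketch, not the answer's code; the from_dict classmethod name is an illustrative choice:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first_name: str
    last_name: Optional[str] = None
    email: Optional[str] = None
    city: Optional[str] = None

    @classmethod
    def from_dict(cls, info):
        # Map the question's column names onto the field names;
        # missing optional fields fall back to None.
        return cls(
            first_name=info["First Name"],
            last_name=info.get("Last Name"),
            email=info.get("Email"),
            city=info.get("City"),
        )


dc = [
    {"First Name": "John", "Last Name": "Doe", "Email": "johndoe@test.com"},
    {"First Name": "Jane", "Email": "jane@test.com"},
]
people = [Person.from_dict(info) for info in dc]
print(people[1].last_name)  # None
```

The dataclass also generates __repr__ and __eq__, so the hand-written __repr__ is no longer needed.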
Simply pass in the dictionaries already stored in dc to store all the data:

people = []

for s in dc:
    people.append(Person(s))
When you want to access the first person's first name:

>>> people
[Person(John), Person(Jane)]
>>> people[0].first_name
'John'

Which data structure you use is up to you.

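The expected output looks very much like the l2 you already have!?
Yes, I forgot to mention that the values are not predetermined; they only share the same keys. I'll edit the question. See the edit, I added another example @schwobaseggl
Are you at least guaranteed that every set has the first value? For example, could there just be two Email entries next to each other because the second one is for another person?
@TadhgMcDonald-Jensen Yes, First Name is required and so is Email, so there will never be two emails next to each other. But if there were, wouldn't it just behave like this, i.e. only the last value is shown?
This also doesn't take order into account; it only splits entries when it sees a key it has already used, so an input like l2=[['A',10],['C',5],['B',15],['D',30]] most likely represents two separate entries, but your code treats it as one. As a result it can get out of sync: an input like l2=[['A',10],['C',5],['B',15],['C',30],['A',1],['B',2],['C',3],['D',4]] produces interesting results that are unlikely to be correct.
Every entry will contain the first key, so there won't be any out-of-sync confusion for their use case.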