Python: joining lists with duplicate values
First, here is the code I have so far, followed by a short explanation:
l1 = [
    'A',
    'B',
    'C',
    'D'
]
l2 = [
    ['A', 10],
    ['B', 20],
    ['D', 5],
    ['A', 15],
    ['B', 30],
    ['C', 10],
    ['D', 15]
]
dc = dict(l2)
l3 = [[k, dc.get(k, 0)] for k in l1]
The result is:
['A', 15]
['B', 30]
['C', 10]
['D', 15]
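The first list, l1, holds a fixed set of keys, and the second list, l2, holds a value for each key given in the first list. The l2 shown here is only an example: I will receive the real values later (as a list), but their keys will be the same as in l1. Every key must appear in the output; a key can repeat, and some keys may have no value (item C, for example).
However, when the list is converted to a dict, the earlier value of each repeated key is discarded, leaving only the unique keys.
How can I get a result like the one below?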
['A', 10]
['B', 20]
['C', 0]
['D', 5]
['A', 15]
['B', 30]
['C', 10]
['D', 15]
Another example:
database_keys = [
    'First Name',
    'Last Name',
    'Email',
    'City'
]
database_input = [
    ['First Name', 'John'],
    ['Last Name', 'Doe'],
    ['Email', 'johndoe@test.com'],
    ['First Name', 'Jane'],
    ['Email', 'jane@test.com']
]
Output:
['First Name', 'John']
['Last Name', 'Doe']
['Email', 'johndoe@test.com']
['City', None]
['First Name', 'Jane']
['Last Name', None]
['Email', 'jane@test.com']
['City', None]
I would use a generator to fill in the missing values: cycle through the keys, and whenever the next expected key is not the one in the data, yield an empty value for it:
import itertools

def fill_the_blanks(data, keys):
    # Cycle endlessly through the expected keys, in order.
    keys = itertools.cycle(keys)
    for name, value in data:
        k = next(keys)
        # Emit a placeholder for every expected key the data skipped.
        while name != k:
            yield [k, None]
            k = next(keys)
        yield [name, value]

>>> from pprint import pprint
>>> pprint(list(fill_the_blanks(l2, l1)))
[['A', 10],
 ['B', 20],
 ['C', None],
 ['D', 5],
 ['A', 15],
 ['B', 30],
 ['C', 10],
 ['D', 15]]
>>> pprint(list(fill_the_blanks(database_input, database_keys)))
[['First Name', 'John'],
 ['Last Name', 'Doe'],
 ['Email', 'johndoe@test.com'],
 ['City', None],
 ['First Name', 'Jane'],
 ['Last Name', None],
 ['Email', 'jane@test.com']]
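Note that the second output above stops at Jane's Email: the generator only fills gaps that come before a data item, so keys missing from the end of the last entry are never emitted. A minimal sketch of a variant that also pads the tail, assuming every key in the data appears in the key list; the name fill_the_blanks_padded and the fill parameter are made up for this illustration:

```python
def fill_the_blanks_padded(data, keys, fill=None):
    """Hypothetical variant of fill_the_blanks that also pads the last entry."""
    keys = list(keys)
    pos = 0  # index in `keys` of the next expected key
    for name, value in data:
        if name not in keys:
            # Assumption: unknown keys are an error (they would loop forever).
            raise KeyError(name)
        # Emit placeholders until the expected key matches the data key.
        while keys[pos] != name:
            yield [keys[pos], fill]
            pos = (pos + 1) % len(keys)
        yield [name, value]
        pos = (pos + 1) % len(keys)
    # Pad whatever is still missing from the final entry.
    while pos != 0:
        yield [keys[pos], fill]
        pos = (pos + 1) % len(keys)


database_keys = ['First Name', 'Last Name', 'Email', 'City']
database_input = [
    ['First Name', 'John'],
    ['Last Name', 'Doe'],
    ['Email', 'johndoe@test.com'],
    ['First Name', 'Jane'],
    ['Email', 'jane@test.com'],
]
padded = list(fill_the_blanks_padded(database_input, database_keys))
for row in padded:
    print(row)
```

With the sample data this should yield eight rows, ending with ['City', None] for Jane, which is the shape asked for in the question. Passing fill=0 gives the ['C', 0] placeholder from the first example.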
Alternatively, if you know that the first key ('First Name') always marks the start of an entry, why not use that directly: start a new entry there and keep filling it in until you reach the next first key:
def gen_dicts(data, keys):
    first_key = keys[0]
    entry = None  # placeholder for first time
    for name, value in data:
        if name == first_key:
            if entry is not None:  # skip first time
                yield entry
            # Start a fresh entry with every key preset to None.
            entry = dict.fromkeys(keys)
        entry[name] = value
    yield entry  # last one

>>> from pprint import pprint
>>> pprint(list(gen_dicts(l2, l1)))
[{'A': 10, 'B': 20, 'C': None, 'D': 5}, {'A': 15, 'B': 30, 'C': 10, 'D': 15}]
>>> pprint(list(gen_dicts(database_input, database_keys)))
[{'City': None,
  'Email': 'johndoe@test.com',
  'First Name': 'John',
  'Last Name': 'Doe'},
 {'City': None,
  'Email': 'jane@test.com',
  'First Name': 'Jane',
  'Last Name': None}]
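If the flat list of [key, value] pairs from the question is still needed, the dictionaries can be flattened back in key order. A small sketch, where records_to_pairs is a hypothetical helper name and the records literal stands in for the output of gen_dicts:

```python
def records_to_pairs(records, keys):
    """Flatten a list of dicts into [key, value] pairs, following `keys` order."""
    return [[k, rec.get(k)] for rec in records for k in keys]


keys = ['A', 'B', 'C', 'D']
# Stand-in for list(gen_dicts(l2, l1)) from the answer above.
records = [
    {'A': 10, 'B': 20, 'C': None, 'D': 5},
    {'A': 15, 'B': 30, 'C': 10, 'D': 15},
]
pairs = records_to_pairs(records, keys)
for pair in pairs:
    print(pair)
```

Because each dictionary already carries every key, the flattened list always has a full block of pairs per entry, including the last one.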
Here is a quick and dirty approach:
l1 = [
    'A',
    'B',
    'C',
    'D',
]
l2 = [
    ['A', 10],
    ['B', 20],
    ['D', 5],
    ['A', 15],
    ['B', 30],
    ['C', 10],
    ['D', 15],
    ['A', 8],
]
# Assuming the elements in l2 are ordered, try to make groups
# of the same length as l1.
l_aux = l1[:]
l3 = [[]]
for x in l2:
    if x[0] in l_aux:
        l3[-1].append(x)
        l_aux.remove(x[0])
        continue
    # Key already used in this group: pad the group, then start a new one.
    for y in l_aux:
        l3[-1].append([y, 'WHATEVER'])
    l3.append([x])
    l_aux = l1[:]
    l_aux.remove(x[0])
# Pad the last group.
for y in l_aux:
    l3[-1].append([y, 'WHATEVER'])
# Now the elements you want are grouped.
# Last step: sort and flatten the list:
l3 = [y for x in l3 for y in sorted(x)]
print('\n'.join(str(x) for x in l3))
# ['A', 10]
# ['B', 20]
# ['C', 'WHATEVER']
# ['D', 5]
# ['A', 15]
# ['B', 30]
# ['C', 10]
# ['D', 15]
# ['A', 8]
# ['B', 'WHATEVER']
# ['C', 'WHATEVER']
# ['D', 'WHATEVER']
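One caveat with the final step: sorted(x) orders each group alphabetically by key, which matches l1 only because 'A' to 'D' happen to be in alphabetical order. With keys such as 'First Name', 'Last Name', 'Email', 'City' the order would change. A sketch of sorting each group by its position in the key list instead; the names order and group are illustrative, and group stands in for one padded group produced by the loop above:

```python
keys = ['First Name', 'Last Name', 'Email', 'City']
# One padded group, in the order the loop above would have built it.
group = [
    ['First Name', 'Jane'],
    ['Email', 'jane@test.com'],
    ['Last Name', None],
    ['City', None],
]

# Map each key to its position in the key list, then sort by that position.
order = {k: i for i, k in enumerate(keys)}
by_key_order = sorted(group, key=lambda pair: order[pair[0]])

# Plain sorted() compares the key strings alphabetically instead.
alphabetical = sorted(group)

print([pair[0] for pair in by_key_order])
print([pair[0] for pair in alphabetical])
```

The first print follows the key list; the second starts with 'City', which is probably not what is wanted.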
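The problem here is how dictionaries store values. A dictionary takes your key, calls its __hash__ method, and stores the value under the result. For strings, two strings with the same value produce the same output when hashed (the exact number varies between interpreter runs):
>>> a = "foo"
>>> b = "foo"
>>> a == b
True
>>> a.__hash__()
-905768032644956145
>>> b.__hash__()
-905768032644956145
As you can see, both hash to the same value. So when the dictionary is asked to store two identical keys, it overwrites the previous value instead of creating a new key.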
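To make the overwriting concrete, here is a small sketch using the l2 list from the question. The defaultdict grouping at the end is not part of the answer; it is just one way, assumed here for illustration, to keep every value per key:

```python
from collections import defaultdict

l2 = [['A', 10], ['B', 20], ['D', 5], ['A', 15], ['B', 30], ['C', 10], ['D', 15]]

# dict() keeps one slot per key; a later pair replaces the earlier value.
dc = dict(l2)
print(dc)

# Illustrative alternative: collect all values seen for each key.
grouped = defaultdict(list)
for key, value in l2:
    grouped[key].append(value)
print(dict(grouped))
```

The first dictionary ends up with four keys holding only the last value of each, while the grouped version keeps both values for 'A', 'B' and 'D'. Note that neither records which entry a value came from, which is why the answers here split the data into entries first.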
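Looking at the first and second examples, you could use a list of dictionaries instead (assuming every entry starts with 'A' or 'First Name'). So you can do this:
dc = []
for s in database_input:
    if s[0] != "First Name":
        # Any other key belongs to the most recent entry.
        dc[-1][s[0]] = s[1]
    else:
        # "First Name" starts a new entry.
        dc.append({s[0]: s[1]})
Then, to retrieve the first name of the first person entered, use:
dc[0]["First Name"]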
One extension of this is to store the entries as instances of a class. Suppose we have a class called Person:
class Person(object):
    def __init__(self, personal_information):
        super(Person, self).__init__()
        self.first_name = personal_information["First Name"]
        # Optional fields are only set when present in the input.
        if "Last Name" in personal_information:
            self.last_name = personal_information["Last Name"]
        if "Email" in personal_information:
            self.email = personal_information["Email"]
        if "City" in personal_information:
            self.city = personal_information["City"]

    def __repr__(self):
        # Just to make things look clean
        return "Person(" + self.first_name + ")"
Simply pass in the dictionaries already stored in dc to store all the data:
people = []
for s in dc:
    people.append(Person(s))
When you want to access the first person's first name:
>>> people
[Person(John), Person(Jane)]
>>> people[0].first_name
'John'
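One thing to watch with the class above: attributes such as last_name are only set when the key is present, so people[1].last_name would raise AttributeError for Jane. A sketch of an alternative using dataclasses with None defaults; PersonRecord and from_mapping are hypothetical names, not part of the original answer:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonRecord:
    first_name: str
    last_name: Optional[str] = None
    email: Optional[str] = None
    city: Optional[str] = None

    @classmethod
    def from_mapping(cls, info):
        """Build a record from a dict keyed by the database field names."""
        return cls(
            first_name=info["First Name"],   # required, raises KeyError if absent
            last_name=info.get("Last Name"), # optional fields fall back to None
            email=info.get("Email"),
            city=info.get("City"),
        )


# Stand-in for the list of dicts built from database_input.
dc = [
    {"First Name": "John", "Last Name": "Doe", "Email": "johndoe@test.com"},
    {"First Name": "Jane", "Email": "jane@test.com"},
]
records = [PersonRecord.from_mapping(d) for d in dc]
print(records[1].last_name)
```

Every attribute exists on every instance, so missing data shows up as None rather than as an exception.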
The choice of data structure is up to you.
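Comments:
The expected output looks very much like the l2 you already have!?
Yes, I forgot to mention that the values are not predetermined; they only share the same keys. I will edit the question.
@Schwobaseggle Take a look, I have edited it and added another example.
Are you at least guaranteed that every set has the first value? For example, could two Email entries sit next to each other because the second one belongs to a different person?
@Tadhgmandald Jensen Yes, First Name is required, and Email is required too, so two emails will never be adjacent. But if they were, wouldn't it behave like this? As in, only the last value would be shown?
This also does not take order into account; it only splits entries when it sees a key that has already been used. So the input l2 = [['A', 10], ['C', 5], ['B', 15], ['D', 30]] most likely represents two separate entries, but your code treats it as one. It can therefore get out of sync, and an input like l2 = [['A', 10], ['C', 5], ['B', 15], ['C', 30], ['A', 1], ['B', 2], ['C', 3], ['D', 4]] produces interesting results that are unlikely to be correct. That said, every entry will contain the first key, so for their use case it should not get out of sync.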