Combine three lists in Python with ordering
Tags: python, list
How can I combine the following three lists efficiently and cleanly?
sex = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman']
actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
Desired result:
[('M', 'Morgan Freeman'),
('M', 'Leonardo DiCaprio'),
('F', 'Natalie Portman'),
('F', 'Anne Hathaway'),
('M', 'Robert De Niro'),
('F', 'Talia Shire'),
('M', 'Brad Pitt'),
('F', 'Diane Keaton'),
('F', 'Keira Knightley'),
('F', 'Uma Thurman')]
My solution:
sex = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman']
actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
result = []
for s in sex:
    if s == 'F':
        result.append((s, actresses.pop(0)))
    elif s == 'M':
        result.append((s, actors.pop(0)))
print(f'result = {result}')
For a long list, say 1 million items, what is the best approach?

You are popping from the start of a list, which costs O(n) per call. What you can do instead is keep an index for each of the actor lists and increment it inside the loop:
sex = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman']
actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
result = []
actors_i = 0
actresses_i = 0
for s in sex:
    if s == 'F':
        result.append((s, actresses[actresses_i]))
        actresses_i += 1
    elif s == 'M':
        result.append((s, actors[actors_i]))
        actors_i += 1
print(f'result = {result}')
Beyond this point I don't think there is anything left to improve other than readability, because you have to check every item in the sex list and every operation inside the loop now costs O(1). So the overall complexity is O(n).
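To make the cost of popping from the front more concrete, here is a small sketch of a simplified cost model. It assumes that every pop(0) shifts each remaining item one slot to the left; the function name front_pop_moves is hypothetical and only used for illustration. Real timings depend on the interpreter, but the quadratic growth is the point.

```python
def front_pop_moves(n):
    """Count element moves needed to empty an n-item list with repeated pop(0).

    Simplified model (an assumption): each pop(0) moves every item that
    remains after the removed one by exactly one slot.
    """
    return sum(remaining - 1 for remaining in range(n, 0, -1))


if __name__ == '__main__':
    for n in (10, 1_000, 100_000):
        # The closed form n * (n - 1) / 2 grows quadratically with n.
        print(n, front_pop_moves(n), n * (n - 1) // 2)
```

With index-based access, by contrast, each lookup touches a single element, so the whole loop does work proportional to the length of the sex list.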
Given that all actors carry the label 'M' and all actresses the label 'F', you could use pandas to group the information, which should be faster than looping over a large list. Here is an example:
import pandas as pd
actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman', ]
actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
df_actresses = pd.DataFrame(actresses, columns=['name'])
df_actors = pd.DataFrame(actors, columns=['name'])
df_actresses['sex'] = 'F'
df_actors['sex'] = 'M'
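# Stack the two frames; rows stay grouped (all actresses, then all actors)
# rather than interleaved in the order given by the sex list
df = pd.concat([df_actresses, df_actors], axis=0)
# if you really need it to be a list (each row is [name, sex])
result = df.values.tolist()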
You can put iterators over the lists in a dictionary and use a list comprehension:
In [8]: sexes = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
...: actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman', ]
...: actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
...:
...: mf = {'M':iter(actors), 'F':iter(actresses)}
...: [(sex, next(mf[sex])) for sex in sexes]
Out[8]:
[('M', 'Morgan Freeman'),
('M', 'Leonardo DiCaprio'),
('F', 'Natalie Portman'),
('F', 'Anne Hathaway'),
('M', 'Robert De Niro'),
('F', 'Talia Shire'),
('M', 'Brad Pitt'),
('F', 'Diane Keaton'),
('F', 'Keira Knightley'),
('F', 'Uma Thurman')]
If your lists are long and you intend to consume one (sex, person) pair at a time, you can use a generator expression instead of a list:
pairs = ((sex, next(mf[sex])) for sex in sexes)
for sex, person in pairs:
    ...
Or perhaps even simpler:
for sex in sexes:
    person = next(mf[sex])
    ...
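The dictionary-of-iterators pattern assumes that every label is 'M' or 'F' and that each name list is long enough. As a hedged sketch of how those assumptions could be checked, the generator function below (the name combine and its error messages are hypothetical, not part of the answer above) reports the offending position instead of failing with a bare KeyError or StopIteration.

```python
def combine(sexes, actors, actresses):
    """Yield (sex, name) pairs, taking names in order from the matching list."""
    sources = {'M': iter(actors), 'F': iter(actresses)}
    for position, sex in enumerate(sexes):
        try:
            name = next(sources[sex])
        except KeyError:
            # The label is neither 'M' nor 'F'.
            raise ValueError(f'unknown label {sex!r} at position {position}') from None
        except StopIteration:
            # The matching name list has fewer entries than the labels require.
            raise ValueError(f'no names left for {sex!r} at position {position}') from None
        yield sex, name


if __name__ == '__main__':
    sexes = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
    actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire',
                 'Diane Keaton', 'Keira Knightley', 'Uma Thurman']
    actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
    for sex, person in combine(sexes, actors, actresses):
        print(sex, person)
```

Because combine is a generator, pairs are still produced one at a time, so nothing beyond the input lists has to be held in memory.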
If the lists are stored on disk, you can use the same pattern as above, but with generator expressions in place of the lists:
mf = {'M': (line.strip() for line in open('male_performers.txt')),
      'F': (line.strip() for line in open('female_performers.txt'))}
sexes = (line.strip() for line in open('sexes.txt'))
for sex in sexes:
    performer = next(mf[sex])
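A possible variant of the on-disk pattern opens the files in a with statement so they are closed once iteration finishes. This is only a sketch: the function name combine_from_files and its parameters are hypothetical, and it assumes one entry per line and that the name files contain enough lines for the labels.

```python
def combine_from_files(sexes_path, male_path, female_path):
    """Lazily yield (sex, performer) pairs read line by line from three files."""
    with open(sexes_path) as sexes_file, \
            open(male_path) as male_file, \
            open(female_path) as female_file:
        sources = {
            'M': (line.strip() for line in male_file),
            'F': (line.strip() for line in female_file),
        }
        for line in sexes_file:
            sex = line.strip()
            # If a name file runs out early, next() raises StopIteration,
            # which Python reports as RuntimeError inside a generator.
            yield sex, next(sources[sex])


# Usage, assuming the three files from the answer above exist:
# for sex, performer in combine_from_files('sexes.txt',
#                                          'male_performers.txt',
#                                          'female_performers.txt'):
#     ...
```

Only one line per file is held in memory at a time, which is the reason for preferring generators here when the data does not fit comfortably in RAM.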
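Thank you for your answers. Yes, using pop(0) in this case is a very bad idea. I tried to compare all the solutions on 1 million pseudo-items. In my opinion the results are all very good, except for pop(0). Results: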
combine_with_pop Items = 1000000. Average time: 45.49504270553589 secs
combine_without_pop Items = 1000000. Average time: 0.33301634788513185 secs
combine_dict Items = 1000000. Average time: 0.21431212425231932 secs
combine_generator Items = 1000000. Average time: 0.2770370960235596 secs
combine_frames Items = 1000000. Average time: 0.06862187385559082 secs
Test code:
import pandas as pd
import string
import random
import time
import inspect
from statistics import mean

result_size = 1000000
g_number_of_repetitions = 5


def init():
    # Generate sexes
    population = ('M', 'F')
    male_weight = 0.48
    # Note: the first weight is 0.4, not male_weight; random.choices accepts
    # relative weights, so this still runs, with a slightly different split
    weights = (0.4, 1 - male_weight)
    actresses = []
    actors = []
    sexes = random.choices(population, weights, k=result_size)
    male_amount = sexes.count('M')
    female_amount = result_size - male_amount
    # Generate pseudo 'actresses' and 'actors'
    act_len = 20
    for a in range(female_amount):
        actresses.append(''.join(random.choices(string.ascii_lowercase, k=act_len)))
    for a in range(male_amount):
        actors.append(''.join(random.choices(string.ascii_lowercase, k=act_len)))
    return sexes, actresses, actors


def combine_with_pop(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        # Work on copies, because pop(0) empties the lists
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        t0 = time.time()
        for s in sexes:
            if s == 'F':
                result.append((s, actresses.pop(0)))
            elif s == 'M':
                result.append((s, actors.pop(0)))
        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)
    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


def combine_without_pop(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        actors_i = 0
        actresses_i = 0
        t0 = time.time()
        for s in sexes:
            if s == 'F':
                result.append((s, actresses[actresses_i]))
                actresses_i += 1
            elif s == 'M':
                result.append((s, actors[actors_i]))
                actors_i += 1
        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)
    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


def combine_dict(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        t0 = time.time()
        mf = {'M': iter(actors), 'F': iter(actresses)}
        result = [(sex, next(mf[sex])) for sex in sexes]
        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)
    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


def combine_generator(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        t0 = time.time()
        mf = {'M': iter(actors), 'F': iter(actresses)}
        for sex in sexes:
            person = next(mf[sex])
            result.append((sex, person))
        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)
    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


def combine_frames(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        # Note: the DataFrames are built before the timer starts and the
        # conversion back to a list is commented out, so the measured time
        # covers only the column assignment and the concat
        df_actresses = pd.DataFrame(actresses, columns=['name'])
        df_actors = pd.DataFrame(actors, columns=['name'])
        t0 = time.time()
        df_actresses['sex'] = 'F'
        df_actors['sex'] = 'M'
        df = pd.concat([df_actresses, df_actors], axis=0)
        # if you really need it to be a list
        # result = df.values.tolist()
        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)
    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


g_sexes, g_actresses, g_actors = init()
combine_with_pop(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
combine_without_pop(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
combine_dict(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
combine_generator(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
combine_frames(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
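As an alternative to hand-rolled time.time() loops, the standard library's timeit module can drive the repetitions. The sketch below is an assumption about how one of the measurements could be restructured, not the code that produced the numbers above; make_data and the reduced size of 10,000 items are hypothetical choices to keep it quick to run.

```python
import random
import string
import timeit


def make_data(size, seed=0):
    """Build a random sex list plus matching pseudo-name lists."""
    rng = random.Random(seed)
    sexes = rng.choices(('M', 'F'), k=size)

    def pseudo_name():
        return ''.join(rng.choices(string.ascii_lowercase, k=20))

    actors = [pseudo_name() for _ in range(sexes.count('M'))]
    actresses = [pseudo_name() for _ in range(sexes.count('F'))]
    return sexes, actors, actresses


def combine_dict(sexes, actors, actresses):
    """Dictionary-of-iterators approach from the answer above."""
    mf = {'M': iter(actors), 'F': iter(actresses)}
    return [(sex, next(mf[sex])) for sex in sexes]


sexes, actors, actresses = make_data(10_000)
calls_per_round = 10
# timeit.repeat returns one total time per round; each round makes
# calls_per_round calls to the function under test.
timings = timeit.repeat(lambda: combine_dict(sexes, actors, actresses),
                        repeat=5, number=calls_per_round)
print(f'combine_dict: best of 5 rounds = {min(timings) / calls_per_round:.6f} s per call')
```

Reporting the minimum over several rounds is a common convention with timeit, since slower rounds usually reflect interference from other processes rather than the code itself.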
Comments:
Have you considered using a pandas DataFrame? You could simply store the actresses in one DataFrame and the actors in another, add a gender column to each (df_actors['gender'] = 'M' and df_actresses['gender'] = 'F'), and then concatenate the DataFrames.
I don't think there is a better way. You have an O(N) algorithm, and by the nature of the problem you have to go through the sex array one item at a time.
The best way? I suppose the best way is the one that works.
Use collections.deque instead of list.
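The last comment suggests collections.deque. A minimal sketch of that idea follows; the function name combine_with_deque is hypothetical. It keeps the shape of the original pop-based loop but uses popleft(), which removes from the front of a deque in constant time.

```python
from collections import deque


def combine_with_deque(sexes, actors, actresses):
    """Pair each label with the next unused name from the matching queue."""
    # deque(...) copies the input, so the caller's lists are left untouched.
    queues = {'M': deque(actors), 'F': deque(actresses)}
    return [(sex, queues[sex].popleft()) for sex in sexes]


if __name__ == '__main__':
    sexes = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
    actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire',
                 'Diane Keaton', 'Keira Knightley', 'Uma Thurman']
    actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
    print(combine_with_deque(sexes, actors, actresses))
```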