Combining three lists in Python with ordering

How can I combine 3 lists efficiently and elegantly, as shown below?

 sex = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
 actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman']
 actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
Result:

[('M', 'Morgan Freeman'),
 ('M', 'Leonardo DiCaprio'),
 ('F', 'Natalie Portman'),
 ('F', 'Anne Hathaway'),
 ('M', 'Robert De Niro'),
 ('F', 'Talia Shire'),
 ('M', 'Brad Pitt'),
 ('F', 'Diane Keaton'),
 ('F', 'Keira Knightley'),
 ('F', 'Uma Thurman')]
My solution:

sex = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman', ]
actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
result = []

for s in sex:
    if s == 'F':
        result.append((s, actresses.pop(0)))
    elif s == 'M':
        result.append((s, actors.pop(0)))

print(f'result = {result}')

For a long list, say 1 million items, what is the best approach?

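You are popping from the beginning of a list, which is O(n). What you can do instead is keep an index for each of the actors and actresses lists and increment it inside the loop:

sex = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman', ]
actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
result = []
actors_i = 0
actresses_i = 0

for s in sex:
    if s == 'F':
        result.append((s, actresses[actresses_i]))
        actresses_i += 1
    elif s == 'M':
        result.append((s, actors[actors_i]))
        actors_i += 1

print(f'result = {result}')

Beyond this point I don't think there is much left to improve other than readability: you have to look at every item in the sex list, and every operation inside the loop costs O(1), so the overall complexity is O(n).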

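Given that every actor is labeled 'M' and every actress 'F', you can use pandas to group the information, which should be faster than looping over a large list. Note that the concatenated frame lists all actresses first and then all actors, so it does not reproduce the interleaved order given by the sex list.

Here is an example: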

import pandas as pd

actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman', ]
actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']

df_actresses = pd.DataFrame(actresses, columns=['name'])
df_actors = pd.DataFrame(actors, columns=['name'])

df_actresses['sex'] = 'F'
df_actors['sex'] = 'M'

df = pd.concat([df_actresses, df_actors], axis=0)

# if you really need it to be a list
result = df.values.tolist()

You can put references to the lists in a dictionary and use a list comprehension:

In [8]: sexes = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F'] 
   ...: actresses = ['Natalie Portman', 'Anne Hathaway', 'Talia Shire', 'Diane Keaton', 'Keira Knightley', 'Uma Thurman', ] 
   ...: actors = ['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt']
   ...: 
   ...: mf = {'M':iter(actors), 'F':iter(actresses)} 
   ...: [(sex, next(mf[sex])) for sex in sexes]                                                                                                 
Out[8]: 
[('M', 'Morgan Freeman'),
 ('M', 'Leonardo DiCaprio'),
 ('F', 'Natalie Portman'),
 ('F', 'Anne Hathaway'),
 ('M', 'Robert De Niro'),
 ('F', 'Talia Shire'),
 ('M', 'Brad Pitt'),
 ('F', 'Diane Keaton'),
 ('F', 'Keira Knightley'),
 ('F', 'Uma Thurman')]

If your lists are long and you intend to consume one (sex, person) pair at a time, you can use a generator expression instead of the list comprehension:

pairs = ((sex, next(mf[sex])) for sex in sexes)
for sex, person in pairs:
    ...
Or possibly even simpler:

for sex in sexes:
    person =  next(mf[sex])
    ...
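
One thing the iterator-based versions above leave implicit is what happens when the label counts in sexes do not match the lengths of the two name lists. next() raises StopIteration when an iterator runs dry, and inside a generator expression that surfaces as a RuntimeError (PEP 479), which can be confusing to debug. A minimal sketch of an up-front check, assuming a hypothetical helper named combine (not part of the answer above) and assuming sexes is a sequence that can be traversed twice:

```python
from collections import Counter


def combine(sexes, actors, actresses):
    """Pair each sex label with the next unused name from the matching list.

    `combine` is a hypothetical helper used for illustration only.
    """
    counts = Counter(sexes)
    # Fail early with a clear message instead of a StopIteration mid-loop.
    if counts['M'] != len(actors) or counts['F'] != len(actresses):
        raise ValueError('label counts do not match the list lengths')
    if set(counts) - {'M', 'F'}:
        raise ValueError('unexpected label in sexes')
    sources = {'M': iter(actors), 'F': iter(actresses)}
    return [(sex, next(sources[sex])) for sex in sexes]


pairs = combine(['M', 'F', 'F'], ['Brad Pitt'], ['Uma Thurman', 'Diane Keaton'])
print(pairs)
# [('M', 'Brad Pitt'), ('F', 'Uma Thurman'), ('F', 'Diane Keaton')]
```

The extra Counter pass is still O(n), so it does not change the overall complexity.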
If the lists are stored on disk, you can use the same pattern introduced above, but with generator expressions in place of the lists:

mf = {'M': (line.strip() for line in open('male_performers.txt')),
      'F': (line.strip() for line in open('female_performers.txt'))}
sexes = (line.strip() for line in open('sexes.txt'))

for sex in sexes:
    performer = next(mf[sex])
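
The file-based snippet never closes the files it opens. As a rough sketch of one way to handle that, the generator below uses contextlib.ExitStack so that all three files are closed once iteration finishes. The function name pair_from_files and its parameters are hypothetical, and the files are assumed to hold one entry per line:

```python
from contextlib import ExitStack


def pair_from_files(sexes_path, male_path, female_path):
    """Yield (sex, performer) pairs lazily, closing all files at the end.

    Hypothetical helper; assumes one label or name per line.
    """
    with ExitStack() as stack:
        sexes = stack.enter_context(open(sexes_path, encoding='utf-8'))
        sources = {
            'M': stack.enter_context(open(male_path, encoding='utf-8')),
            'F': stack.enter_context(open(female_path, encoding='utf-8')),
        }
        for line in sexes:
            sex = line.strip()
            if not sex:
                continue  # skip blank lines, e.g. a trailing newline
            # A default of None keeps StopIteration from leaking out of the generator.
            name = next(sources[sex], None)
            if name is None:
                raise ValueError(f'ran out of performers for {sex!r}')
            yield sex, name.strip()
```

It would be consumed the same way as before, for example `for sex, performer in pair_from_files('sexes.txt', 'male_performers.txt', 'female_performers.txt'): ...`, and only one line from each file is held in memory at a time.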

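Thanks for the answers. Yes, using pop(0) was a very bad idea in this case. I tried to compare all the solutions on 1 million pseudo items. In my view the results are all very good, except for the one using pop(0).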

Results:

combine_with_pop     Items = 1000000. Average time: 45.49504270553589 secs
combine_without_pop  Items = 1000000. Average time:  0.33301634788513185 secs
combine_dict         Items = 1000000. Average time:  0.21431212425231932 secs
combine_generator    Items = 1000000. Average time:  0.2770370960235596 secs
combine_frames       Items = 1000000. Average time:  0.06862187385559082 secs
Test:

import pandas as pd
import string
import random
import time
import inspect
from statistics import mean

result_size = 1000000
g_number_of_repetitions = 5


def init():
    # Generate sexes
    population = ('M', 'F')
    male_weight = 0.48
    weights = (male_weight, 1 - male_weight)
    actresses = []
    actors = []
    sexes = random.choices(population, weights, k=result_size)
    male_amount = sexes.count('M')
    female_amount = result_size - male_amount

    # Generate pseudo 'actresses' and 'actors'
    act_len = 20
    for a in range(female_amount):
        actresses.append(''.join(random.choices(string.ascii_lowercase, k=act_len)))
    for a in range(male_amount):
        actors.append(''.join(random.choices(string.ascii_lowercase, k=act_len)))
    return sexes, actresses, actors


def combine_with_pop(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        t0 = time.time()
        for s in sexes:
            if s == 'F':
                result.append((s, actresses.pop(0)))
            elif s == 'M':
                result.append((s, actors.pop(0)))
        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)
    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


def combine_without_pop(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        actors_i = 0
        actresses_i = 0
        t0 = time.time()
        for s in sexes:
            if s == 'F':
                result.append((s, actresses[actresses_i]))
                actresses_i += 1
            elif s == 'M':
                result.append((s, actors[actors_i]))
                actors_i += 1
        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)

    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


def combine_dict(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        t0 = time.time()

        mf = {'M': iter(actors), 'F': iter(actresses)}
        result = [(sex, next(mf[sex])) for sex in sexes]
        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)

    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


def combine_generator(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        t0 = time.time()

        mf = {'M': iter(actors), 'F': iter(actresses)}
        for sex in sexes:
            person = next(mf[sex])
            result.append((sex, person))

        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)

    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


def combine_frames(number_of_repetitions, sexes, random_actresses, random_actors):
    time_measurements = []
    for i in range(number_of_repetitions):
        actors = random_actors[:]
        actresses = random_actresses[:]
        result = []
        df_actresses = pd.DataFrame(actresses, columns=['name'])
        df_actors = pd.DataFrame(actors, columns=['name'])

        t0 = time.time()

        df_actresses['sex'] = 'F'
        df_actors['sex'] = 'M'

        df = pd.concat([df_actresses, df_actors], axis=0)

        # if you really need it to be a list
        # result = df.values.tolist()

        time_one_round = time.time() - t0
        time_measurements.append(time_one_round)

    print(
        f'{inspect.currentframe().f_code.co_name.ljust(20)} '
        f'Items = {result_size}. Average time: {str(mean(time_measurements))} secs')


g_sexes, g_actresses, g_actors = init()
combine_with_pop(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
combine_without_pop(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
combine_dict(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
combine_generator(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
combine_frames(g_number_of_repetitions, g_sexes, g_actresses, g_actors)
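
The timings are only comparable if every variant returns the same pairs, so it may be worth checking that on a small input first. In the script above, combine_frames starts its timer after the DataFrames are built, skips the conversion back to a list, and does not interleave the names by sex, so its figure is not directly comparable to the others. The sketch below is an assumption-laden illustration, not the original test: the helper names with_index, with_iterators and timed are hypothetical, and time.perf_counter is used in place of time.time because it is intended for measuring short durations.

```python
import random
import time


def with_index(sexes, actresses, actors):
    """Index-based variant: one counter per source list."""
    result, a_i, f_i = [], 0, 0
    for s in sexes:
        if s == 'F':
            result.append((s, actresses[f_i]))
            f_i += 1
        else:
            result.append((s, actors[a_i]))
            a_i += 1
    return result


def with_iterators(sexes, actresses, actors):
    """Dictionary-of-iterators variant."""
    mf = {'M': iter(actors), 'F': iter(actresses)}
    return [(s, next(mf[s])) for s in sexes]


def timed(func, *args, repeats=3):
    """Return (best wall-clock seconds, last result) for func(*args)."""
    best = float('inf')
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - t0)
    return best, result


random.seed(0)  # fixed seed so the generated labels are reproducible
sexes = random.choices('MF', k=10_000)
actors = [f'actor{i}' for i in range(sexes.count('M'))]
actresses = [f'actress{i}' for i in range(sexes.count('F'))]

t_index, r_index = timed(with_index, sexes, actresses, actors)
t_iter, r_iter = timed(with_iterators, sexes, actresses, actors)
assert r_index == r_iter  # both variants must produce identical pairs
print(f'index: {t_index:.4f}s  iterators: {t_iter:.4f}s')
```

Taking the best of several runs rather than the mean is a design choice here; it tends to be less sensitive to background noise on the machine.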

Have you considered using pandas DataFrames? You could simply store the actresses in one DataFrame and the actors in another, add a gender column to each (df_actors['gender'] = 'M' and df_actresses['gender'] = 'F'), and then concatenate the DataFrames.

I don't think there is a better way. You have an O(n) algorithm, and by the nature of the problem you have to go through the sex array one item at a time.

Best way: I suppose the best way is the efficient one.

Use collections.deque instead of list.
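
To illustrate the last comment: collections.deque supports O(1) removal from the left via popleft(), so the original pop-based loop keeps its shape without the O(n) cost of list.pop(0). A minimal sketch using the data from the question (the queues dictionary is just an illustrative name):

```python
from collections import deque

sex = ['M', 'M', 'F', 'F', 'M', 'F', 'M', 'F', 'F', 'F']
actresses = deque(['Natalie Portman', 'Anne Hathaway', 'Talia Shire',
                   'Diane Keaton', 'Keira Knightley', 'Uma Thurman'])
actors = deque(['Morgan Freeman', 'Leonardo DiCaprio', 'Robert De Niro', 'Brad Pitt'])

# Map each label to its queue; popleft() removes from the front in O(1).
queues = {'M': actors, 'F': actresses}
result = [(s, queues[s].popleft()) for s in sex]

print(result[:3])
# [('M', 'Morgan Freeman'), ('M', 'Leonardo DiCaprio'), ('F', 'Natalie Portman')]
```

Like the original solution, this consumes the two deques, so copy them first if the names are needed afterwards.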