Python: find all combinations of a large list based on multiple criteria


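I'm trying to calculate the best possible team for a fantasy cycling game. I have a csv file containing 176 cyclists, their teams, the number of points they have scored and the price of putting them in my team. I'm trying to find the highest-scoring team of 16 cyclists.

The rules that apply to any team composition are:

  • The total cost of the team cannot exceed 100
  • At most 4 cyclists from the same team may be in the fantasy team

Below is a short excerpt of my csv file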

THOMAS Geraint,Team INEOS,142,13
SAGAN Peter,BORA - hansgrohe,522,11.5
GROENEWEGEN Dylan,Team Jumbo-Visma,205,11
FUGLSANG Jakob,Astana Pro Team,46,10
BERNAL Egan,Team INEOS,110,10
BARDET Romain,AG2R La Mondiale,21,9.5
QUINTANA Nairo,Movistar Team,58,9.5
YATES Adam,Mitchelton-Scott,40,9.5
VIVIANI Elia,Deceuninck - Quick Step,273,9.5
YATES Simon,Mitchelton-Scott,13,9
EWAN Caleb,Lotto Soudal,13,9
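
The simplest way to solve this is to generate a list of all possible team combinations and then apply the rules to exclude the teams that do not comply. After that, it is straightforward to calculate the total score of each team and pick the highest-scoring one. In theory, the list of usable teams can be generated with the code below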

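import itertools
from collections import Counter

import pandas as pd

database_csv = pd.read_csv('renner_db_example.csv')
renners = database_csv.to_dict(orient='records')

budget = 100
max_same_team = 4
team_total = 16

# Lazily yields every possible team of 16 riders
combos = itertools.combinations(renners, team_total)
usable_combos = []

for combo in combos:
    total_cost = sum(persoon["Waarde"] for persoon in combo)
    # Count riders per team; groupby would only work on input sorted by team
    riders_per_team = Counter(persoon["Ploeg"] for persoon in combo)
    # The cost may equal the budget, it just cannot exceed it
    if total_cost <= budget and max(riders_per_team.values()) <= max_same_team:
        usable_combos.append(combo)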
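
Before running the loop above, it may help to estimate how many candidate teams it would have to visit. The following is a minimal sketch using only the standard library. The throughput of one million teams per second is an assumed figure chosen for illustration, not a measurement.

```python
import math

n_riders = 176   # riders in the CSV, as stated in the question
team_size = 16   # riders per fantasy team

# Number of ways to choose 16 riders out of 176
n_combos = math.comb(n_riders, team_size)
print(n_combos)

# Assumed throughput, purely illustrative: one million teams checked per second
assumed_rate = 1_000_000
seconds_per_year = 60 * 60 * 24 * 365
years = n_combos / assumed_rate / seconds_per_year
print(f"about {years:.1e} years at {assumed_rate:,} teams per second")
```

The count is on the order of 10^22, so generating every team and filtering afterwards is not practical, whatever the filtering code looks like.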

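There are many algorithms that can find the optimal (or a very good, depending on the algorithm) solution:

  • Mixed-integer programming
  • Metaheuristics
  • Constraint programming

The optimal solution can be found with the ortools library and its default solver.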


How does this code solve the problem? As @KyleParsons said, it looks like a knapsack problem and can be modeled with integer programming.
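
To make the knapsack analogy concrete, here is a toy sketch: each rider is an item with a value (points) and a weight (price), and a team is valid when it fits the budget and respects the per-team cap. The rider data and the helper names `is_valid` and `best_team` are made up for illustration. The exhaustive search only works because the instance is tiny; a real instance needs an integer programming solver.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (name, team, points, price)
RIDERS = [
    ("A", "T1", 50, 6.0),
    ("B", "T1", 40, 5.0),
    ("C", "T1", 30, 2.0),
    ("D", "T2", 35, 4.0),
    ("E", "T2", 10, 1.0),
    ("F", "T3", 20, 3.0),
]


def is_valid(team, budget, max_same_team):
    """Check the budget constraint and the per-team cap for one candidate."""
    cost = sum(price for _, _, _, price in team)
    per_team = Counter(team_name for _, team_name, _, _ in team)
    return cost <= budget and max(per_team.values()) <= max_same_team


def best_team(riders, team_size, budget, max_same_team):
    """Exhaustively search all teams of the given size (toy sizes only)."""
    best, best_points = None, -1
    for team in combinations(riders, team_size):
        if not is_valid(team, budget, max_same_team):
            continue
        points = sum(p for _, _, p, _ in team)
        if points > best_points:
            best, best_points = team, points
    return best, best_points


team, points = best_team(RIDERS, team_size=3, budget=12.0, max_same_team=2)
print([name for name, *_ in team], points)
```

Note that the two highest scorers, A and B, cannot both be combined with D within the budget, which is the kind of trade-off a solver resolves on the full data set.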


Let's define the variables
Xi (0 …

I'm adding another answer for your question:

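The CSV I posted was actually modified; my original CSV also contains, for each rider, a list with their score for each stage. This list looks like [0,40,13,0,2,55,1,17,0,14]. I'm trying to find the team that performs best overall. So I have a team of 16 cyclists, of which the scores of 10 cyclists count towards each day's score. The scores of each day are then summed to get a total score. The aim is to make this final total score as high as possible.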

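If you think I should edit my first post instead, please let me know. I think a separate answer is clearer, because my first post is rather dense and answers the initial question.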
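
The scoring rule described in the quoted comment can be sketched for a fixed team as follows. This is only an evaluation helper under the stated rule (each day, the best `counted_per_day` riders of the team count), not the optimization itself. The function name `team_score` and the toy numbers are assumptions made for illustration.

```python
import heapq


def team_score(points_per_day, counted_per_day=10):
    """Total score of a fixed team.

    points_per_day holds one list of daily points per rider in the team.
    Each day, only the best `counted_per_day` riders contribute.
    """
    n_days = len(points_per_day[0])
    total = 0
    for day in range(n_days):
        day_points = [rider[day] for rider in points_per_day]
        total += sum(heapq.nlargest(counted_per_day, day_points))
    return total


# Hypothetical toy team: 3 riders, 2 days, best 2 riders count each day
toy_team = [
    [10, 0],
    [5, 7],
    [1, 9],
]
print(team_score(toy_team, counted_per_day=2))
```

A helper like this is also enough to re-score a team found by the simpler model under the per-day rule.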

Let's introduce a new variable:

Zik = 1 if cyclist i is selected and is one of the top 10 in your team on day k
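
You need constraints that link the variables Zik and Xi: Zik cannot be 1 if the cyclist is not selected, i.e. if Xi = 0. The objective then multiplies Xi by Zik, which is not linear, so a further variable Lik takes the place of the product. We now need to add these constraints so that Lik equals Xi * Zik: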

For all i,k : Xi + Zik - 1 <= Lik
For all i,k : Lik <= 1/2 * (Xi + Zik)
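
As a quick sanity check of the two linearization constraints, the sketch below enumerates all binary values of Xi and Zik and lists which binary values of Lik satisfy both inequalities. The helper name `feasible_l_values` is made up for this illustration.

```python
from itertools import product


def feasible_l_values(x, z):
    """Binary values of Lik allowed by the two linearization constraints."""
    return [l for l in (0, 1) if x + z - 1 <= l and l <= 0.5 * (x + z)]


for x, z in product((0, 1), repeat=2):
    allowed = feasible_l_values(x, z)
    print(f"Xi={x} Zik={z} -> Lik in {allowed} (Xi*Zik = {x * z})")
```

In every case exactly one value remains, and it equals the product Xi * Zik, which is why the linear objective over Lik is equivalent to the non-linear one.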

Cyclists selected for each day

Day 1 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús

Day 2 :
SAGAN Peter
ALAPHILIPPE Julian
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
NIZZOLO Giacomo
MEURISSE Xandro

Day 3 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
MATTHEWS Michael
TRENTIN Matteo
VAN AVERMAET Greg
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús

Day 4 :
SAGAN Peter
VIVIANI Elia
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús

Day 5 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
CICCONE Giulio
HERRADA Jesús

Day 6 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike

Day 7 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús
MEURISSE Xandro

Day 8 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús
NIZZOLO Giacomo
MEURISSE Xandro

Day 9 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
TEUNISSEN Mike
HERRADA Jesús

Day 10 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
PINOT Thibaut
COLBRELLI Sonny
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús
NIZZOLO Giacomo

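Let's compare the results of answer 1 and answer 2, using
print(solver.Objective().Value())

The first model gives 3738.0 and the second model gives 3129.087388325567. The value is lower because only 10 cyclists are counted per stage instead of 16.

Now, if we keep the first solution and score it with the new method, we get 3122.9477585307413.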

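We can consider the first model good enough: we did not need to introduce new variables or constraints, the model stays simple, and we get a solution that is almost as good as the one from the complex model. Sometimes a result does not need to be 100% exact, and a model can be solved more easily and quickly with some approximations.

Comments:

This may be a basic league puzzle.

It looks like a variant of the knapsack problem.

I added a link to the full CSV file.

I believe that in order to first generate all combinations and then prune the invalid teams, you would first have to generate 20062118235172477959495 combinations (n choose k, with n = 176 and k = 16). This value is the number of possible ways to choose 16 elements out of 176.

Looks like a discrete optimization problem. Maybe you'll find some inspiration in that area.

So the goal is to maximize the total points while keeping the total cost < 100?

Very interesting, thank you so much for your comprehensive answer! I think the explanation you added is very clear. However, I find the documentation provided a bit sparse. My ultimate goal is to find the best team of 16, where for each stage the best 10 riders are picked. The CSV I posted was actually modified; my original CSV also contains, for each rider, a list with their points for each stage. That list looks like this: [0,40,13,0,2,55,1,17,0,14]. Can I do this by simply modifying the objective, or do I have to modify the code massively?

That is exactly the hard part of this problem. I'm trying to find the team that performs best overall. So I have a team of 16 cyclists, of which the scores of 10 cyclists count towards each day's score. The scores of each day are then summed to get a total score. The aim is to make the final total score as high as possible. I currently think this could be done by setting a coefficient in the objective for each stage, but I don't quite see how to implement the fact that only the ten best riders' scores count towards the final score for each stage.

@Thakkennes I posted a new answer for the new question.

Thank you very much! This looks great. However, when I try to run it with a CSV that has the correct scores per day, I get an AssertionError on day 4, where it picked only 9 riders. I'm not quite sure what causes this, as I haven't had time to fully dig into your code yet. Do you know what could cause it? I've pasted the CSV I'm using.

I found what went wrong. The cyclists-per-day constraint was set to a maximum of 10 cyclists, but that is wrong
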
Constraints and objective of the first model:

# Link cyclist <-> team
For all j, Yj >= sum(Xi, for all i where Xi is part of team j)
# Max 4 cyclists per team
For all j, Yj <= 4
# Min 16 cyclists
sum(Xi, 1<=i<=n_cyclists) >= 16
# Max 16 cyclists
sum(Xi, 1<=i<=n_cyclists) <= 16
# Max cost
sum(ci * Xi, 1<=i<=n_cyclists) <= 100
# where ci = cost of cyclist i
# Objective
max sum(pi * Xi, 1<=i<=n_cyclists)
# where pi = nb_points of cyclist i

Additional variables, constraints and objectives of the second model:

Zik = 1 if cyclist i is selected and is one of the top 10 in your team on day k
# Link Zik and Xi
For all i, sum(Zik, 1<=k<=n_days) <= n_days * Xi
# At most 10 cyclists per day
For all k, sum(Zik, 1<=i<=n_cyclists) <= 10
# Objective (not linear because of the product Xi * Zik)
Maximize sum(pik * Xi * Zik, 1<=i<=n_cyclists, 1 <= k <= n_days)
# where pik = nb_points of cyclist i at day k
# Linearized objective
Maximize sum(pik * Lik, 1<=i<=n_cyclists, 1 <= k <= n_days)
# where pik = nb_points of cyclist i at day k
# Linearization constraints, so that Lik = Xi * Zik
For all i,k : Xi + Zik - 1 <= Lik
For all i,k : Lik <= 1/2 * (Xi + Zik)

Code for the second model:

import ast
from ortools.linear_solver import pywraplp
import pandas as pd


solver = pywraplp.Solver('cyclist', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
cyclist_df = pd.read_csv('cyclists_day.csv')
cyclist_df['Punten_day'] = cyclist_df['Punten_day'].apply(ast.literal_eval)

# Variables
variables_name = {}
variables_team = {}
variables_name_per_day = {}
variables_linear = {}

for _, row in cyclist_df.iterrows():
    variables_name[row['Naam']] = solver.IntVar(0, 1, 'x_{}'.format(row['Naam']))
    if row['Ploeg'] not in variables_team:
        variables_team[row['Ploeg']] = solver.IntVar(0, solver.infinity(), 'y_{}'.format(row['Ploeg']))

    for k in range(10):
        variables_name_per_day[(row['Naam'], k)] = solver.IntVar(0, 1, 'z_{}_{}'.format(row['Naam'], k))
        variables_linear[(row['Naam'], k)] = solver.IntVar(0, 1, 'l_{}_{}'.format(row['Naam'], k))

# Link cyclist <-> team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, solver.infinity())
    constraint.SetCoefficient(var, 1)
    for cyclist in cyclist_df[cyclist_df.Ploeg == team]['Naam']:
        constraint.SetCoefficient(variables_name[cyclist], -1)

# Max 4 cyclist per team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, 4)
    constraint.SetCoefficient(var, 1)

# Max cyclists
constraint_max_cyclists = solver.Constraint(16, 16)
for cyclist in variables_name.values():
    constraint_max_cyclists.SetCoefficient(cyclist, 1)

# Max cost
constraint_max_cost = solver.Constraint(0, 100)
for _, row in cyclist_df.iterrows():
    constraint_max_cost.SetCoefficient(variables_name[row['Naam']], row['Waarde'])

# Link Zik and Xi
for name, cyclist in variables_name.items():
    constraint_link_cyclist_day = solver.Constraint(-solver.infinity(), 0)
    constraint_link_cyclist_day.SetCoefficient(cyclist, - 10)
    for k in range(10):
        constraint_link_cyclist_day.SetCoefficient(variables_name_per_day[name, k], 1)

# Min/Max 10 cyclists per day
for k in range(10):
    constraint_cyclist_per_day = solver.Constraint(10, 10)
    for name in cyclist_df.Naam:
        constraint_cyclist_per_day.SetCoefficient(variables_name_per_day[name, k], 1)

# Linearization constraints 
for name, cyclist in variables_name.items():
    for k in range(10):
        constraint_linearization1 = solver.Constraint(-solver.infinity(), 1)
        constraint_linearization2 = solver.Constraint(-solver.infinity(), 0)

        constraint_linearization1.SetCoefficient(cyclist, 1)
        constraint_linearization1.SetCoefficient(variables_name_per_day[name, k], 1)
        constraint_linearization1.SetCoefficient(variables_linear[name, k], -1)

        constraint_linearization2.SetCoefficient(cyclist, -1/2)
        constraint_linearization2.SetCoefficient(variables_name_per_day[name, k], -1/2)
        constraint_linearization2.SetCoefficient(variables_linear[name, k], 1)

# Objective 
objective = solver.Objective()
objective.SetMaximization()

for _, row in cyclist_df.iterrows():
    for k in range(10):
        objective.SetCoefficient(variables_linear[row['Naam'], k], row['Punten_day'][k])

solver.Solve()

chosen_cyclists = [key for key, variable in variables_name.items() if variable.solution_value() > 0.5]

print('\n'.join(chosen_cyclists))

for k in range(10):
    print('\nDay {} :'.format(k + 1))
    chosen_cyclists_day = [name for (name, day), variable in variables_name_per_day.items() 
                       if (day == k and variable.solution_value() > 0.5)]
    assert len(chosen_cyclists_day) == 10
    assert all(chosen_cyclists_day[i] in chosen_cyclists for i in range(10))
    print('\n'.join(chosen_cyclists_day))

The 16 selected cyclists:

SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
BENOOT Tiesj
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús
MEURISSE Xandro
GRELLIER Fabien