Python: how to pick one value from each sublist so that no value is repeated
I have a list in Python, something like:
l = [[ 1,  2,  3],
     [18, 20, 22],
     [ 3, 14, 16],
     [ 1,  3,  5],
     [18,  2, 16]]
How would you pick one value from each sublist so that no value is repeated and the sum of the resulting list is as small as possible?
result = [1, 18, 3, 5, 2]
Edit: my previous solution only worked in most cases; this one should work in all cases:
from itertools import product
l = [[1, 2, 3], [18, 20, 22], [3, 14, 16], [1, 3, 5], [18, 2, 16]]
result = None
for item in product(*l):
    if len(item) > len(set(item)):
        # Try the next combination if there are duplicates
        continue
    if result is None or sum(result) > sum(item):
        result = item
print(result)
Output:
(1, 18, 3, 5, 2)
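Here's a compact brute-force solution. Because it is brute force, it has to perform columns ** rows tests, which is not good. I suspect there is an algorithm that is usually more efficient, but in the worst case it may still be necessary to check all the possibilities.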
from itertools import product
lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]
nrows = len(lst)
m = min((t for t in product(*lst) if len(set(t)) == nrows), key=sum)
print(m)
Output:
(1, 18, 3, 5, 2)
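To make the columns ** rows cost concrete, here is a small sketch that counts how many tuples the brute-force search has to test for the sample data, and how many of them survive the duplicate filter. The names `candidates` and `valid` are only illustrative.

```python
from itertools import product

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]

rows, cols = len(lst), len(lst[0])

# Every way of taking one value per row, duplicates included
candidates = list(product(*lst))

# Keep only the tuples whose values are all distinct
valid = [t for t in candidates if len(set(t)) == rows]

print(len(candidates), cols ** rows)  # both are 243 for 5 rows of 3 columns
print(len(valid))                     # number of duplicate-free selections
print(min(valid, key=sum))            # (1, 18, 3, 5, 2)
```

Each extra row multiplies the number of candidates by the column count, which is why this approach only suits small inputs.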
Here's a faster version that uses a recursive generator instead of itertools.product:
def select(data, seq):
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        yield seq

def solve(data):
    return min(select(data, []), key=sum)
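As a quick illustration of why the recursive generator does less work, the sketch below runs it on a tiny two-row input (`small` is a made-up example) and then on the list from the question. A partial selection that already contains a value is never extended with that value again, so duplicate-containing tuples are never built in full.

```python
def select(data, seq):
    if data:
        # Extend every valid selection of the earlier rows with each
        # value of the last row that has not been used yet
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        yield seq

def solve(data):
    return min(select(data, []), key=sum)

small = [[1, 2], [1, 3]]
print(list(select(small, [])))  # [[1, 3], [2, 1], [2, 3]]

lst = [[1, 2, 3], [18, 20, 22], [3, 14, 16], [1, 3, 5], [18, 2, 16]]
print(solve(lst))               # [1, 18, 3, 5, 2]
```

Note that the pair [1, 1] never appears in the first result: it is rejected as soon as the second row is considered.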
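Here's a modified version of the recursive generator that sorts the partial selections as it goes, but of course it is slower and consumes more RAM. If the input data is sorted, it usually finds the minimum fairly quickly, but I couldn't find a simple way to make it stop once the minimum has been found.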
def select(data, selected):
    if data:
        for selected in sorted(select(data[:-1], selected), key=sum):
            for u in data[-1]:
                if u not in selected:
                    yield selected + [u]
    else:
        yield selected
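One possible way to cut the search short, which is not part of the answer above, is a depth-first search with pruning (branch and bound): a partial selection is dropped as soon as its running total can no longer beat the best complete selection found so far. The sketch below is Python 3 only (it uses `nonlocal`) and assumes all values are non-negative, so that a partial sum never decreases. `solve_pruned` and `search` are hypothetical names.

```python
def solve_pruned(data):
    """Return a minimum-sum selection of distinct values, or None if none exists."""
    # Assumption: all values are non-negative, so adding a value
    # can never lower the running total.
    rows = [sorted(row) for row in data]
    best_sum = float('inf')
    best = None

    def search(i, chosen, total):
        nonlocal best_sum, best
        if i == len(rows):
            # Only reached when total < best_sum, so this is an improvement
            best_sum, best = total, list(chosen)
            return
        for u in rows[i]:
            if total + u >= best_sum:
                break  # the row is sorted, so later values cannot do better
            if u not in chosen:
                chosen.append(u)
                search(i + 1, chosen, total + u)
                chosen.pop()

    search(0, [], 0)
    return best

lst = [[1, 2, 3], [18, 20, 22], [3, 14, 16], [1, 3, 5], [18, 2, 16]]
print(solve_pruned(lst))  # [1, 18, 3, 5, 2]
```

In the worst case this still visits every selection, in line with the suspicion voiced above, but on sorted rows it tends to find a good total early and then prune most branches.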
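Here's some timing code that compares the speed of Maurice's solution and mine. It runs on both Python 2 and Python 3. On a 2GHz 32-bit machine running an old Debian-based Linux, I get similar timing results on Python 2.6 and Python 3.6.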
from __future__ import print_function, division
from timeit import Timer
from itertools import product
from random import seed, sample, randrange

n = randrange(0, 1 << 32)
print('seed', n)
seed(n)

def show(data):
    indent = ' ' * 4
    s = '\n'.join(['{0}{1},'.format(indent, row) for row in data])
    print('[\n{0}\n]\n'.format(s))

def make_data(rows, cols):
    maxn = rows * cols
    nums = range(1, maxn)
    return [sample(nums, cols) for _ in range(rows)]

def sort_data(data):
    newdata = [sorted(row) for row in data]
    newdata.sort(reverse=True, key=sum)
    return newdata

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def solve_Maurice(data):
    result = None
    for item in product(*data):
        if len(item) > len(set(item)):
            # Try the next combination if there are duplicates
            continue
        if result is None or sum(result) > sum(item):
            result = item
    return result

def solve_prodgen(data):
    rows = len(data)
    return min((t for t in product(*data) if len(set(t)) == rows), key=sum)

def select(data, seq):
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        yield seq

def solve_recgen(data):
    return min(select(data, []), key=sum)

funcs = (
    solve_Maurice,
    solve_prodgen,
    solve_recgen,
)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def verify():
    for func in funcs:
        fname = func.__name__
        seq = func(data)
        print('{0:14} {1}'.format(fname, seq))
    print()

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import data, ' + fname
        cmd = fname + '(data)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:14} {1}'.format(fname, result))

rows, cols = 6, 4
print('Number of selections:', cols ** rows)

data = make_data(rows, cols)
data = sort_data(data)
show(data)

verify()

loops, reps = 100, 3
time_test(loops, reps)
Sample output from the timing code:
seed 22290
Number of selections: 4096
[
    [6, 11, 22, 23],
    [9, 14, 17, 19],
    [5, 9, 16, 22],
    [5, 6, 9, 13],
    [1, 3, 6, 22],
    [4, 5, 6, 13],
]

solve_Maurice  (11, 9, 5, 6, 1, 4)
solve_prodgen  (11, 9, 5, 6, 1, 4)
solve_recgen   [11, 9, 5, 6, 1, 4]

solve_recgen   [0.5476037560001714, 0.549133045002236, 0.5647858490046929]
solve_prodgen  [1.2500368960027117, 1.296529343999282, 1.3022710209988873]
solve_Maurice  [1.485518219997175, 1.489505891004228, 1.784105566002836]
Comments:
Assuming all values are positive: pick the minimum from each sublist. If any element is duplicated, pick from the two corresponding sublists the smallest number that differs from the duplicated element.
Look up min(). Have you at least tried something?
I think you are using Python 2. Python 3 raises SyntaxError: invalid token for integer literals with leading zeros.
I added the zeros for readability and didn't check. Using spaces now instead.
FWIW, I've added a faster version (and some timing code) to my answer.
You could do this as a nested list comprehension, without the extra `for i in l` hanging outside.
@AkshatMahajan A list comprehension would probably make the code more confusing, especially in this case, IMO.
[[1,2,3],[1,18,19]] doesn't work with your solution either. One approach would be to apply your method to every possible ordering of the lists, but that has terrible complexity as well.
@PM2Ring Alburkerk is right. I added another solution that basically tries all combinations. It's not as simple, but it should work now.
Yes, your new code is correct. I've been trying to find a smarter algorithm that knows when to stop searching, but so far I've had no luck. FWIW, I've added a new version and some timing code to my answer.
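The [[1,2,3],[1,18,19]] case raised in the comments shows why a row-by-row "take the smallest unused value" strategy can miss the minimum. The sketch below compares such a strategy with the brute-force search. `greedy` is a hypothetical illustration, not necessarily the earlier solution the comment refers to.

```python
from itertools import product

def greedy(data):
    """Hypothetical in-order strategy: take each row's smallest unused value."""
    chosen = []
    for row in data:
        # Raises ValueError if every value in the row is already used
        chosen.append(min(u for u in row if u not in chosen))
    return chosen

def brute_force(data):
    rows = len(data)
    return min((t for t in product(*data) if len(set(t)) == rows), key=sum)

data = [[1, 2, 3], [1, 18, 19]]
print(greedy(data))       # [1, 18], total 19
print(brute_force(data))  # (2, 1), total 3
```

Taking 1 from the first row forces the second row up to 18, whereas giving up 1 in the first row costs only one extra unit.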