Python: How to pick a set of values from a list of lists without duplicates


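In Python I have a list like this:

l = [[ 1,  2,  3],
     [18, 20, 22],
     [ 3, 14, 16],
     [ 1,  3,  5],
     [18,  2, 16]]

How would you pick one value from each sublist so that no value is repeated and the sum of the resulting list is minimized?

result = [1, 18, 3, 5, 2]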

Edit: my previous solution only worked in most cases; this one should work in all cases:

from itertools import product
l = [[1, 2, 3], [18, 20, 22], [3, 14, 16], [1, 3, 5], [18, 2, 16]]

result = None
for item in product(*l):
    if len(item) > len(set(item)):
        # Try the next combination if there are duplicates
        continue
    if result is None or sum(result) > sum(item):
        result = item
print(result)
Output

(1, 18, 3, 5, 2)

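Here's a compact brute-force solution. Because it is brute force, it has to perform cols ** rows tests, which isn't good. I suspect there's an algorithm that is usually more efficient, but in the worst case it may still need to check every possibility.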

from itertools import product

lst = [
    [ 1,  2,  3],
    [18, 20, 22],
    [ 3, 14, 16],
    [ 1,  3,  5],
    [18,  2, 16],
]

nrows = len(lst) 
m = min((t for t in product(*lst) if len(set(t)) == nrows), key=sum)
print(m)
Output

(1, 18, 3, 5, 2)
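
To make the cols ** rows figure concrete, here is a small sketch that counts how many tuples the brute-force approach above has to test for the example list. The names total and valid are introduced here for illustration only and are not part of the original answer.

```python
from itertools import product

lst = [[1, 2, 3], [18, 20, 22], [3, 14, 16], [1, 3, 5], [18, 2, 16]]
rows, cols = len(lst), len(lst[0])

# Every tuple that product() generates gets tested, valid or not.
total = sum(1 for _ in product(*lst))

# Only tuples with no repeated value are candidates for the minimum.
valid = sum(1 for t in product(*lst) if len(set(t)) == rows)

print(total, cols ** rows)  # 243 243
print(valid)                # fewer than total; the rest contain a duplicate
```

The count grows exponentially with the number of rows, which is why the 6 x 4 data in the timing test already needs 4096 tests.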

Here's a faster version that uses a recursive generator instead of itertools.product.

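def select(data, seq):
    # Recursively build selections: extend each valid selection from the
    # earlier rows with every value of the last row that is not already used.
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        # No rows left: yield the starting selection unchanged.
        yield seq

def solve(data):
    # The minimum-sum selection among all duplicate-free selections.
    return min(select(data, []), key=sum)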
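
Here's a modified version of the recursive generator that sorts the partial selections by sum as it goes, but of course it is slower and uses more RAM. If the input data is sorted it will often find the minimum quite quickly, but I couldn't find a simple way to make it stop once it has found the minimum.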

def select(data, selected):
    if data:
        for selected in sorted(select(data[:-1], selected), key=sum):
            for u in data[-1]:
                if u not in selected:
                    yield selected + [u]
    else:
        yield selected
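
One possible way to cut the search short is branch and bound: remember the best complete selection found so far, and abandon a partial selection as soon as its sum plus a lower bound for the remaining rows can no longer beat it. The sketch below is not part of the original answer. The names solve_bnb, suffix_min and search are introduced here for illustration, and the lower bound used (the sum of the remaining rows' minima, ignoring the no-duplicates rule) is just one simple choice.

```python
def solve_bnb(data):
    """Return a minimum-sum list with one distinct value per row, or None."""
    rows = [sorted(row) for row in data]
    if any(not row for row in rows):
        return None  # an empty row makes any selection impossible

    # suffix_min[i] is the sum of the minima of rows i, i+1, ...: a lower
    # bound on what those rows can add, ignoring the distinctness rule.
    suffix_min = [0] * (len(rows) + 1)
    for i in range(len(rows) - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + rows[i][0]

    # A dict instead of nonlocal, so the sketch also runs on Python 2.
    best = {'sum': None, 'pick': None}

    def search(i, picked, total):
        if i == len(rows):
            if best['sum'] is None or total < best['sum']:
                best['sum'], best['pick'] = total, list(picked)
            return
        for u in rows[i]:
            # Rows are sorted, so if the bound fails for u it also fails
            # for every later value in this row.
            if (best['sum'] is not None
                    and total + u + suffix_min[i + 1] >= best['sum']):
                break
            if u not in picked:
                picked.append(u)
                search(i + 1, picked, total + u)
                picked.pop()

    search(0, [], 0)
    return best['pick']

lst = [[1, 2, 3], [18, 20, 22], [3, 14, 16], [1, 3, 5], [18, 2, 16]]
print(solve_bnb(lst))  # [1, 18, 3, 5, 2]
```

In the worst case this still visits every duplicate-free selection, so it does not change the exponential bound; it only skips branches that provably cannot improve on the best selection found so far.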

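Here's some timing code that compares the speed of Maurice's solution and mine. It runs on both Python 2 and Python 3. On a 2 GHz 32-bit machine running an old version of Debian Linux, I get similar timing results on Python 2.6 and Python 3.6.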

from __future__ import print_function, division
from timeit import Timer
from itertools import product
from random import seed, sample, randrange

n = randrange(0, 1 << 32)
print('seed', n)
seed(n)

def show(data):
    indent = ' ' * 4
    s = '\n'.join(['{0}{1},'.format(indent, row) for row in data])
    print('[\n{0}\n]\n'.format(s))

def make_data(rows, cols):
    maxn = rows * cols
    nums = range(1, maxn)
    return [sample(nums, cols) for _ in range(rows)]

def sort_data(data):
    newdata = [sorted(row) for row in data]
    newdata.sort(reverse=True, key=sum)
    return newdata

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def solve_Maurice(data):
    result = None
    for item in product(*data):
        if len(item) > len(set(item)):
            # Try the next combination if there are duplicates
            continue
        if result is None or sum(result) > sum(item):
            result = item
    return result

def solve_prodgen(data):
    rows = len(data) 
    return min((t for t in product(*data) if len(set(t)) == rows), key=sum)

def select(data, seq):
    if data:
        for seq in select(data[:-1], seq):
            for u in data[-1]:
                if u not in seq:
                    yield seq + [u]
    else:
        yield seq

def solve_recgen(data):
    return min(select(data, []), key=sum)

funcs = (
    solve_Maurice,
    solve_prodgen,
    solve_recgen,
)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def verify():
    for func in funcs:
        fname = func.__name__
        seq = func(data)
        print('{0:14} {1}'.format(fname, seq))
    print()

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import data, ' + fname
        cmd = fname + '(data)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:14} {1}'.format(fname, result))

rows, cols = 6, 4
print('Number of selections:', cols ** rows)

data = make_data(rows, cols)
data = sort_data(data)
show(data)

verify()

loops, reps = 100, 3
time_test(loops, reps)

Output from one run of the timing code

seed 22290
Number of selections: 4096
[
    [6, 11, 22, 23],
    [9, 14, 17, 19],
    [5, 9, 16, 22],
    [5, 6, 9, 13],
    [1, 3, 6, 22],
    [4, 5, 6, 13],
]

solve_Maurice  (11, 9, 5, 6, 1, 4)
solve_prodgen  (11, 9, 5, 6, 1, 4)
solve_recgen   [11, 9, 5, 6, 1, 4]

solve_recgen   [0.5476037560001714, 0.549133045002236, 0.5647858490046929]
solve_prodgen  [1.2500368960027117, 1.296529343999282, 1.3022710209988873]
solve_Maurice  [1.485518219997175, 1.489505891004228, 1.784105566002836]

Assuming all values are positive: pick the minimum of each sublist. If any element is duplicated, pick, from the two corresponding sublists, the smallest number that differs from the duplicated element.

Look up min(). Have you at least tried something?

I guess you're using Python 2. Python 3 raises SyntaxError: invalid token for integer literals with leading zeros.

I added the zero for readability and didn't check it. Using spaces now instead.

FWIW, I've added a faster version (and some timing code) to my answer.

You could do this as a nested list comprehension; there's no need for the extra for i in l hanging outside.

@AkshatMahajan A list comprehension would probably make the code more confusing, especially in this case, IMO.

[[1, 2, 3], [1, 18, 19]] doesn't work with your solution either. One approach would be to apply your method to every possible ordering of the list, but that has bad complexity as well.

@PM2Ring Alburkerk is right. I've added another solution that basically tries all combinations. It's not as simple, but it should work now.

Yes, your new code is correct. I've been trying to find a smarter algorithm that knows when to stop searching, but I've had no luck so far. FWIW, I've added a new version and some timing code to my answer.
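
To illustrate the [[1, 2, 3], [1, 18, 19]] comment: the earlier solution it refers to is not shown on this page, so the greedy function below is only a hypothetical stand-in that takes, row by row, the smallest value not yet used. It is compared with the brute-force approach from the answers; both function names are introduced here for illustration.

```python
from itertools import product

def greedy(data):
    # Hypothetical row-by-row strategy: smallest unused value in each row.
    picked = []
    for row in data:
        picked.append(min(u for u in row if u not in picked))
    return picked

def brute_force(data):
    # Same idea as the product-based solutions above.
    return min((t for t in product(*data) if len(set(t)) == len(data)), key=sum)

data = [[1, 2, 3], [1, 18, 19]]
print(greedy(data))       # [1, 18], sum 19
print(brute_force(data))  # (2, 1), sum 3
```

Taking 1 from the first row forces the second row up to 18, whereas giving up 1 in the first row costs only one extra unit. A row-by-row choice cannot see that, which is why the answers fall back on examining all combinations.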