Create a Python generator that yields the ordered products of integers from two large lists

python python-3.x

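I have two very large lists of numbers, l1 and l2. I want to multiply every element of l1 by every element of l2 without explicitly building a new list of products, so I'd like a generator. That part is easy; I can do something like: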

def products(l1, l2):  # illustrative wrapper name; `yield` is only valid inside a function
    for a in l1:
        for b in l2:
            yield a * b

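However, I also need the products to come out in sorted order. I wonder whether there is some clever trick for ordering the yield statements so that this can also be done with a generator. In Python 3, if possible. Thanks.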

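There doesn't seem to be any way to sort these outputs without creating a list, because you cannot sort values that you have not stored. Here is how you can do it: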

myList = []

for i in range(len(l1)):
    for j in range(len(l2)):
        output = l1[i] * l2[j]
        myList.append(output)
myList.sort()
print(myList)

Hope that helps.

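I'll call the lists xs and ys, and assume they're sorted. As you noted in a comment, the smallest product is then necessarily xs[0] * ys[0] - but only if you also assume that all the numbers are non-negative, so I'll assume that too.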
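To illustrate why the non-negativity assumption matters, here is a small check. The sample lists are made up for this sketch; any sorted lists containing negative numbers would show the same effect.

```python
# Illustrative values only: both lists are sorted but contain negatives.
xs = [-3, 1, 2]
ys = [-1, 4, 5]

products = sorted(x * y for x in xs for y in ys)
first_pair = xs[0] * ys[0]   # (-3) * (-1)

print(first_pair)    # 3
print(products[0])   # -15, which comes from xs[0] * ys[-1]
```

With negatives present, the first-by-first product is no longer the minimum, so the heap-based approach cannot start from it.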

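After the first product it gets messier - otherwise you would already have solved it ;-) The next two candidates to consider are xs[0] * ys[1] and xs[1] * ys[0]. Easy enough, but what has to be considered after that depends on which of the two won. If xs[0] * ys[1] won, you only need to replace it with xs[0] * ys[2]; but if xs[1] * ys[0] won, then both xs[1] * ys[1] and xs[2] * ys[0] come into play. And so on.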

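The following keeps track of the growing set of candidates in a heap. The heap never holds more than len(xs) items, so the code first arranges for xs to be the shorter list: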

def upprod(xs, ys):
    # xs and ys must be sorted, and non-negative
    from heapq import heappush, heappop
    # make xs the shorter
    if len(ys) < len(xs):
        xs, ys = ys, xs
    if not xs:
        return
    lenxs = len(xs)
    lenys = len(ys)
    # the heap holds 4-tuples:
    #     (product, xs index, ys index, xs[xs index])
    h = [(xs[0] * ys[0], 0, 0, xs[0])]
    while h:
        prod, xi, yi, x = heappop(h)
        yield prod
        # same x with next y
        yi += 1
        if yi < lenys:
            heappush(h, (x * ys[yi], xi, yi, x))
        # if this is the first time we used x, start
        # the next x going
        if yi == 1:
            xi += 1
            if xi < lenxs:
                x = xs[xi]
                heappush(h, (x * ys[0], xi, 0, x))
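As a quick sanity check of the heap idea, the sketch below restates it in condensed form and compares the output with a brute-force sort. The name `sorted_products` and the sample lists are assumptions made for this illustration; they are not part of the answer's code.

```python
from heapq import heappush, heappop
from itertools import islice

def sorted_products(xs, ys):
    """Condensed restatement; xs and ys must be sorted and non-negative."""
    if len(ys) < len(xs):
        xs, ys = ys, xs                  # the heap is bounded by the shorter list
    if not xs:
        return
    heap = [(xs[0] * ys[0], 0, 0)]       # (product, xs index, ys index)
    while heap:
        prod, xi, yi = heappop(heap)
        yield prod
        if yi + 1 < len(ys):             # same x, next y
            heappush(heap, (xs[xi] * ys[yi + 1], xi, yi + 1))
        if yi == 0 and xi + 1 < len(xs): # first use of this x: start the next row
            heappush(heap, (xs[xi + 1] * ys[0], xi + 1, 0))

xs = [1, 2, 5]
ys = [1, 3, 4, 6]

# Only the first few products are computed; nothing is materialised up front.
print(list(islice(sorted_products(xs, ys), 5)))   # [1, 2, 3, 4, 5]

# The full output matches the brute-force result.
print(list(sorted_products(xs, ys)) == sorted(x * y for x in xs for y in ys))  # True
```

Because the generator is lazy, `islice` can take the smallest few products without paying for the full len(xs) * len(ys) table.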

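EDIT - theoretically better in some sense ;-)

The above doesn't fully exploit the partial order that we can deduce from the indices alone: if i1 <= i2 and j1 <= j2, then xs[i1] * ys[j1] <= xs[i2] * ys[j2]. A second version of upprod (the variant that keeps a pending set) therefore holds a product back from the heap until both of its immediate predecessors, where they exist, have been processed.

Compared with the first code, it keeps the heap somewhat smaller in many cases. But heap operations take time logarithmic in the number of heap entries, and the heap can still grow to len(xs) entries, so this isn't much of a win. It is probably lost to the overhead of the two new function calls (and inlining those calls would be too ugly).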
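The partial order can be confirmed by brute force on small inputs. This is only a sketch: the helper name `order_holds` and the sample data are hypothetical, chosen for illustration.

```python
from itertools import product

def order_holds(xs, ys):
    """True if i1 <= i2 and j1 <= j2 always implies
    xs[i1] * ys[j1] <= xs[i2] * ys[j2] for the given lists."""
    idx = list(product(range(len(xs)), range(len(ys))))
    return all(
        xs[i1] * ys[j1] <= xs[i2] * ys[j2]
        for (i1, j1) in idx
        for (i2, j2) in idx
        if i1 <= i2 and j1 <= j2
    )

xs = [0, 2, 3, 7]   # sorted, non-negative sample data
ys = [1, 4, 4, 9]
print(order_holds(xs, ys))   # True
```

This is the property that lets an interior cell wait until the cell above it and the cell to its left have both been yielded.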

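My solution is to create a list of generators, one for each row of the product matrix, and then use heapq.merge to sort the outputs of those generators. On a 32-bit machine each generator is 44 bytes, so the whole list of generators consumes only a modest amount of RAM.

heapq.merge (when no sort key function is supplied) works by creating a 3-tuple for each iterable passed to it. The tuple contains the next value from the iterable, the iterable's index number, and a reference to the iterable's __next__ method. It puts these tuples on a heap to perform a merge sort of the iterables' values. You can see the details in its Python source code.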
So my approach is not quite as frugal as Tim Peters's solution, but it's not too shabby either, IMHO.
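A minimal sketch of the row-merging idea, assuming sorted, non-negative inputs. The helper name `row` and the sample lists are made up for this illustration.

```python
from heapq import merge

def row(x, ys):
    # One lazy row of the product matrix. It is already sorted
    # because ys is sorted and x is non-negative.
    for y in ys:
        yield x * y

xs = [1, 2, 5]
ys = [1, 3, 4, 6]

merged = merge(*(row(x, ys) for x in xs))
print(next(merged))   # 1
print(next(merged))   # 2
print(list(merged))   # [3, 4, 5, 6, 6, 8, 12, 15, 20, 30]
```

Only one pending value per row is held at any time, which is why the memory cost grows with the number of rows rather than with the number of products.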
Here is the output on my old 2 GHz 32-bit single-core machine, running Python 3.6.0 on an old Debian-derived Linux distribution. YMMV.
Verifying. maxlen = 10
0 : 8 * 9 = 72
1 : 9 * 0 = 0
2 : 1 * 7 = 7
3 : 8 * 10 = 80
4 : 10 * 5 = 50
5 : 10 * 0 = 0
6 : 5 * 2 = 10
7 : 5 * 10 = 50
8 : 3 * 0 = 0
9 : 0 * 6 = 0

Verifying. maxlen = 100
0 : 64 * 0 = 0
1 : 77 * 96 = 7392
2 : 24 * 13 = 312
3 : 53 * 39 = 2067
4 : 74 * 39 = 2886
5 : 92 * 97 = 8924
6 : 31 * 48 = 1488
7 : 39 * 17 = 663
8 : 42 * 25 = 1050
9 : 94 * 25 = 2350
10 : 82 * 83 = 6806
11 : 2 * 97 = 194
12 : 90 * 30 = 2700
13 : 93 * 24 = 2232
14 : 91 * 37 = 3367
15 : 24 * 86 = 2064
16 : 70 * 15 = 1050
17 : 2 * 4 = 8
18 : 72 * 58 = 4176
19 : 25 * 84 = 2100

Timings
0 : 8192 loops. 7 * 8 = 56
sorted_prod_brute  : 0.659312, 0.665853, 0.710947
upprod1            : 1.695471, 1.705061, 1.739299
sorted_prod_merge  : 1.9901161, 1.991129, 2.001242
sorted_prod_diag   : 3.013945, 3.018927, 3.053115
upprod2            : 3.582396, 3.586332, 3.622949

1 : 2048 loops. 18 * 16 = 288
sorted_prod_brute  : 0.826128, 0.840111, 0.863559
upprod1            : 2.240931, 2.241636, 2.244615
sorted_prod_merge  : 2.301838, 2.304075, 2.306918
sorted_prod_diag   : 3.030672, 3.053302, 3.135322
upprod2            : 4.860378, 4.949804, 4.953891

2 : 512 loops. 39 * 32 = 1248
sorted_prod_brute  : 0.907932, 0.918692, 0.942830
sorted_prod_merge  : 2.559567, 2.561709, 2.604387
upprod1            : 2.700482, 2.701147, 2.757695
sorted_prod_diag   : 2.961776, 2.965271, 2.995747
upprod2            : 5.563303, 5.654425, 5.656695

3 : 128 loops. 68 * 70 = 4760
sorted_prod_brute  : 0.823448, 0.827748, 0.835049
sorted_prod_merge  : 2.591373, 2.592134, 2.685534
upprod1            : 2.760466, 2.763615, 2.795082
sorted_prod_diag   : 2.789673, 2.828662, 2.848498
upprod2            : 5.483504, 5.488450, 5.517847

4 : 32 loops. 122 * 156 = 19032
sorted_prod_brute  : 0.873736, 0.880958, 0.892846
sorted_prod_merge  : 2.701089, 2.742456, 2.818822
upprod1            : 2.875358, 2.881793, 2.922569
sorted_prod_diag   : 2.953450, 2.988184, 3.012430
upprod2            : 5.780552, 5.812967, 5.826775

5 : 8 loops. 173 * 309 = 53457
sorted_prod_brute  : 0.711012, 0.711816, 0.721627
sorted_prod_merge  : 1.997386, 1.999774, 2.033489
upprod1            : 2.137337, 2.172369, 3.335119
sorted_prod_diag   : 2.324447, 2.329552, 2.331095
upprod2            : 4.278704, 4.289019, 4.324436

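Are your original lists sorted?
@DSM Yes, they can be generated in sorted order if that helps.
@M.Haurer I may be completely wrong here, but your requirement seems odd to me. You are asking for a way to create the products on the fly without storing them, yet you also want them ordered somehow. I really don't see how you could sort items that are never stored together.
@scharette The requirement ...
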
# Partial order behind the second version of upprod below:
#     i1 <= i2 and j1 <= j2  implies  xs[i1] * ys[j1] <= xs[i2] * ys[j2]

def upprod(xs, ys):
    # xs and ys must be sorted, and non-negative
    from heapq import heappush, heappop
    # make xs the shorter
    if len(ys) < len(xs):
        xs, ys = ys, xs
    if not xs:
        return
    lenxs = len(xs)
    lenys = len(ys)
    # the heap holds 3-tuples:
    #     (product, xs index, ys index)
    h = [(xs[0] * ys[0], 0, 0)]

    # interior points for which only one immediate predecessor has
    # been processed; there's no need to put them in the heap
    # until their second predecessor has been processed too
    pending = set()

    def add(xi, yi):
        if xi < lenxs and yi < lenys:
            if xi and yi: # if either is 0, only one predecessor
                p = xi, yi
                if p in pending:
                    pending.remove(p)
                else:
                    pending.add(p)
                    return
            heappush(h, (xs[xi] * ys[yi], xi, yi))

    while h:
        prod, xi, yi = heappop(h)
        yield prod
        # same x with next y; and same y with next x
        add(xi, yi + 1)
        add(xi + 1, yi)
    assert not pending

from heapq import merge

def sorted_prod_merge(xs, ys):
    ''' mergesort generators of the rows. '''
    if len(ys) < len(xs):
        xs, ys = ys, xs
    def gen(x):
        for y in ys:
            yield x * y
    yield from merge(*[gen(x) for x in xs])

from heapq import heappush, heappop, merge
from random import seed, randrange
from timeit import Timer
from collections import deque

seed(163)

# Brute force method, as a generator
def sorted_prod_brute(xs, ys):
    yield from sorted(x * y for x in xs for y in ys)

# By Tim Peters
def upprod1(xs, ys):
    # xs and ys must be sorted, and non-negative
    from heapq import heappush, heappop
    # make xs the shorter
    if len(ys) < len(xs):
        xs, ys = ys, xs
    if not xs:
        return
    lenxs = len(xs)
    lenys = len(ys)
    # the heap holds 4-tuples:
    #     (product, xs index, ys index, xs[xs index])
    h = [(xs[0] * ys[0], 0, 0, xs[0])]
    while h:
        prod, xi, yi, x = heappop(h)
        yield prod
        # same x with next y
        yi += 1
        if yi < lenys:
            heappush(h, (x * ys[yi], xi, yi, x))
        # if this is the first time we used x, start
        # the next x going
        if yi == 1:
            xi += 1
            if xi < lenxs:
                x = xs[xi]
                heappush(h, (x * ys[0], xi, 0, x))

# By Tim Peters
def upprod2(xs, ys):
    # xs and ys must be sorted, and non-negative
    from heapq import heappush, heappop
    # make xs the shorter
    if len(ys) < len(xs):
        xs, ys = ys, xs
    if not xs:
        return
    lenxs = len(xs)
    lenys = len(ys)
    # the heap holds 3-tuples:
    #     (product, xs index, ys index)
    h = [(xs[0] * ys[0], 0, 0)]

    # interior points for which only one immediate predecessor has
    # been processed; there's no need to put them in the heap
    # until their second predecessor has been processed too
    pending = set()

    def add(xi, yi):
        if xi < lenxs and yi < lenys:
            doit = True
            if xi and yi: # if either is 0, only one predecessor
                p = xi, yi
                if p in pending:
                    pending.remove(p)
                else:
                    pending.add(p)
                    doit = False
            if doit:
                heappush(h, (xs[xi] * ys[yi], xi, yi))
    while h:
        prod, xi, yi = heappop(h)
        yield prod
        # same x with next y; and same y with next x
        add(xi, yi + 1)
        add(xi + 1, yi)
    assert not pending

def sorted_prod_merge(xs, ys):
    ''' mergesort generators of the rows. '''
    if len(ys) < len(xs):
        xs, ys = ys, xs
    def gen(x):
        for y in ys:
            yield x * y
    yield from merge(*[gen(x) for x in xs])

def sorted_prod_row(xs, ys):
    ''' Heapsort, row by row.
        Fast, but not space-efficient: the maximum 
        heap size grows to almost len(ys) * len(xs)
    '''
    if len(ys) < len(xs):
        xs, ys = ys, xs
    if not xs:
        return
    x, xs = xs[0], xs[1:]
    heap = []
    #big = 0
    for y in ys:
        lo = x * y
        while heap and heap[0] <= lo:
            yield heappop(heap)
        yield lo
        for u in xs:
            heappush(heap, u * y)
        #big = max(big, len(heap))
    #print(big)
    while heap:
        yield heappop(heap)

def sorted_prod_diag(xs, ys):
    ''' Heapsort, going along the diagonals
        50% slower than sorted_prod_row, but more
        space-efficient: the maximum heap size 
        grows to around 0.5 * len(ys) * len(xs)
    '''
    if not (xs and ys):
        return
    lenxs, lenys = len(xs), len(ys)
    heap = []
    #big = 0
    for n in range(lenxs + lenys - 1):
        row = sorted(xs[n - i] * ys[i]
            for i in range(max(0, n + 1 - lenxs), min(lenys, n + 1)))
        lo = row[0]
        while heap and heap[0] <= lo:
            yield heappop(heap)
        yield lo
        for u in row[1:]:
            heappush(heap, u)
        #big = max(big, len(heap))
    #print(big)
    #assert not heap

def sorted_prod_block(xs, ys):
    ''' yield the top left corner, then merge sort
        the top row, the left column and the remaining 
        block. So we end up with max(len(xs), len(ys))
        recursively nested calls to merge(). It's ok
        for small lists, but too slow otherwise.
    '''
    if not (xs and ys):
        return
    x, *xs = xs
    y, *ys = ys
    yield x * y
    row = (y * u for u in xs)
    col = (x * v for v in ys)
    yield from merge(row, col, sorted_prod_block(xs, ys))

def sorted_prod_blockI(xs, ys):
    ''' Similar to sorted_prod_block except we use indexing
        to avoid creating sliced copies of the lists
    '''
    lenxs, lenys = len(xs), len(ys)
    def sorted_block(xi, yi):
        if xi == lenxs or yi == lenys:
            return
        x, y = xs[xi], ys[yi]
        yield x * y
        xi, yi = xi + 1, yi + 1
        row = (xs[i] * y for i in range(xi, lenxs))
        col = (ys[i] * x for i in range(yi, lenys))
        yield from merge(row, col, sorted_block(xi, yi))
    yield from sorted_block(0, 0)

functions = (
    sorted_prod_brute,
    upprod1,
    upprod2,
    sorted_prod_merge,
    #sorted_prod_row,
    sorted_prod_diag,
    #sorted_prod_block,
    #sorted_prod_blockI,
)

UB = 1000

def verify(numtests, maxlen=10):
    print('Verifying. maxlen =', maxlen)
    for k in range(numtests):
        lenxs = randrange(maxlen + 1)
        lenys = randrange(maxlen + 1)
        print(k, ':', lenxs, '*', lenys, '=', lenxs * lenys)
        xs = sorted(randrange(UB) for i in range(lenxs))
        ys = sorted(randrange(UB) for i in range(lenys))
        good = list(sorted_prod_brute(xs, ys))

        for func in functions[1:]:
            result = list(func(xs, ys))
            if result != good:
                print(func.__name__, 'failed!')
    print()

def time_test(loops=20):
    timings = []
    for func in functions:
        # Consume the generator output by feeding it to a zero-length deque
        t = Timer(lambda: deque(func(xs, ys), maxlen=0))
        result = sorted(t.repeat(3, loops))
        timings.append((result, func.__name__))
    timings.sort()
    for result, name in timings:
        print('{:18} : {:.6f}, {:.6f}, {:.6f}'.format(name, *result))
    print()

verify(10, 10)
verify(20, 100)

print('\nTimings')
loops = 8192
minlen = 5
for k in range(6):
    lenxs = randrange(minlen, 2 * minlen)
    lenys = randrange(minlen, 2 * minlen)
    print(k, ':', loops, 'loops.', lenxs, '*', lenys, '=', lenxs * lenys)
    xs = sorted(randrange(UB) for i in range(lenxs))
    ys = sorted(randrange(UB) for i in range(lenys))
    time_test(loops)
    minlen *= 2
    loops //= 4