What is the most efficient way to get the intersection of k sorted arrays in Python?

python python-3.x algorithm

Given k sorted arrays, what is the most efficient way to get the intersection of these lists?

Example

Input:

[[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]

Output:

[1,7]

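There is a way to get the union of k sorted arrays in n log k time, based on what I read in the book Elements of Programming Interviews. I was wondering whether there is a way to do something similar for the intersection.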

## merge sorted arrays in nlogk time [ regular appending and merging is nlogn time ]
import heapq
def mergeArys(srtd_arys):
    heap = []
    srtd_iters = [iter(x) for x in srtd_arys]
    
    # put the first element from each srtd array onto the heap
    for idx, it in enumerate(srtd_iters):
        elem = next(it, None)
        if elem is not None:  # a plain truth test would drop a 0
            heapq.heappush(heap, (elem, idx))
    
    res = []
 
    # collect results in nlogK time
    while heap:
        elem, ary = heapq.heappop(heap)
        it = srtd_iters[ary]
        res.append(elem)
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, ary))

    return res

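EDIT: Obviously this is an algorithm problem that I am trying to solve, so I cannot use any built-in functions such as set intersection.

You can use reduce: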

from functools import reduce

a = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]] 
reduce(lambda x, y: x & set(y), a[1:], set(a[0]))
 {1, 7}
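
Since the question rules out built-in set operations, the same fold can be sketched with a hand-written two-pointer intersection in place of the set operator. This is only an illustration: intersect_two is a hypothetical helper name, not something from the answer above, and it assumes every input list is sorted in ascending order.

```python
from functools import reduce


def intersect_two(xs, ys):
    """Intersect two sorted lists with two pointers, dropping duplicates."""
    i = j = 0
    out = []
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            i += 1          # xs is behind, advance it
        elif xs[i] > ys[j]:
            j += 1          # ys is behind, advance it
        else:
            # Common value: record it once, then move both pointers on.
            if not out or out[-1] != xs[i]:
                out.append(xs[i])
            i += 1
            j += 1
    return out


a = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
result = reduce(intersect_two, a)
print(result)  # [1, 7]
```

Each pairwise step is linear in the sizes of its two inputs, and the running result can only shrink, so the whole fold stays linear in the total input size.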

You can use the built-in set and set intersection:

d = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
result = set(d[0]).intersection(*d[1:])
{1, 7}

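Exploit the sort order

Here is an O(n) approach that needs no special data structures or auxiliary memory beyond the basic requirement of one iterator and one value per sublist: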

from itertools import cycle

def intersection(data):
    ITERATOR, VALUE = 0, 1
    n = len(data)
    result = []
    try:
        pairs = cycle([(it := iter(sublist)), next(it)] for sublist in data)
        pair = next(pairs)
        curr = pair[VALUE]  # Candidate is the largest value seen so far
        matches = 1         # Number of pairs where the candidate occurs
        while True:
            iterator, value = pair = next(pairs)
            while value < curr:
                value = next(iterator)
            pair[VALUE] = value
            if value > curr:
                curr, matches = value, 1
                continue
            matches += 1
            if matches != n:
                continue
            result.append(curr)
            while (value := next(iterator)) == curr:
                pass
            pair[VALUE] = value
            curr, matches = value, 1
    except StopIteration:
        return result

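The algorithm in words

The algorithm loops around the iterator/value pairs. If a value matches across all of the pairs, it belongs in the intersection. If a value is lower than the largest value seen so far, the current iterator is advanced. If a value is greater than any value seen so far, it becomes the new target and the match count is reset to 1. When any iterator is exhausted, the algorithm is finished.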

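Does not rely on built-ins

The use of cycle is entirely optional. It is easily simulated by incrementing an index that wraps around at the end.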

Instead of:

iterator, value = pair = next(pairs)

You could write:

pairnum += 1
if pairnum == n:
    pairnum = 0
iterator, value = pair = pairs[pairnum]

Or more compactly:

pairnum = (pairnum + 1) % n
iterator, value = pair = pairs[pairnum]
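
Putting the fragments together, a complete variant without cycle might look like the sketch below. It is an illustration under assumptions rather than the answer's exact code: the name intersection_indexed is made up here, pairs is assumed to be a plain list so that it can be indexed, and the test matches < n replaces matches != n so that a single input list also terminates.

```python
def intersection_indexed(data):
    """Deduplicated intersection of sorted lists, cycling with a plain index."""
    VALUE = 1
    n = len(data)
    result = []
    if n == 0:
        return result
    try:
        # One [iterator, current value] pair per sublist, kept in a list.
        pairs = []
        for sublist in data:
            it = iter(sublist)
            pairs.append([it, next(it)])  # StopIteration if a sublist is empty
        pairnum = 0
        curr = pairs[pairnum][VALUE]  # candidate: largest value seen so far
        matches = 1                   # number of pairs holding the candidate
        while True:
            pairnum = (pairnum + 1) % n   # wrap around instead of using cycle
            pair = pairs[pairnum]
            iterator, value = pair
            while value < curr:
                value = next(iterator)
            pair[VALUE] = value
            if value > curr:
                curr, matches = value, 1
                continue
            matches += 1
            if matches < n:
                continue
            result.append(curr)
            # Skip repeats of the value that was just emitted.
            value = next(iterator)
            while value == curr:
                value = next(iterator)
            pair[VALUE] = value
            curr, matches = value, 1
    except StopIteration:
        return result


print(intersection_indexed([[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]))  # [1, 7]
```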

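Duplicate values

If repetitions are to be retained (as in a multiset), that is an easy modification: just change the four lines after result.append(curr) so that the matching element is removed from each iterator: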

def intersection(data):
    ITERATOR, VALUE = 0, 1
    n = len(data)
    result = []
    try:
        pairs = cycle([(it := iter(sublist)), next(it)] for sublist in data)
        pair = next(pairs)
        curr = pair[VALUE]  # Candidate is the largest value seen so far
        matches = 1         # Number of pairs where the candidate occurs
        while True:
            iterator, value = pair = next(pairs)
            while value < curr:
                value = next(iterator)
            pair[VALUE] = value
            if value > curr:
                curr, matches = value, 1
                continue
            matches += 1
            if matches != n:
                continue
            result.append(curr)
            for i in range(n):
                iterator, value = pair = next(pairs)
                pair[VALUE] = next(iterator)
            curr, matches = pair[VALUE], 1
    except StopIteration:
        return result

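Yes, it is possible! I have modified your example code to do this.

My answer assumes that your question is about the algorithm. If you want the fastest-running code using sets, see the other answers.

This maintains the O(n log(k)) time complexity. All the code between if lowest != elem or ary != times_seen: and unbench_all = False is O(log(k)). There is a nested loop inside the main loop (for unbenched in range(times_seen):), but it only runs times_seen times. times_seen is initially 0, is reset to 0 every time this inner loop runs, and can be incremented only once per main-loop iteration, so the inner loop cannot perform more iterations in total than the main loop. Thus, since the code inside the inner loop is O(log(k)) and runs at most as many times as the outer loop, and the outer loop body is O(log(k)) and runs n times, the algorithm is O(n log(k)).

This algorithm relies on how tuples are compared in Python. It compares the first items of the tuples and, if they are equal, compares the second items (i.e. (x, a) < (x, b) is true if and only if a < b).

In this algorithm, unlike in the example code in the question, an item popped from the heap is not necessarily pushed again in the same iteration. Since we need to check whether all sublists contain the same number, after a number is popped from the heap its sublist is what I call "benched", meaning that it is not added back to the heap. We first need to check whether the other sublists contain the same item, so the next item of this sublist is not needed yet.

If a number is indeed in all sublists, the heap will look something like [(2,0),(2,1),(2,2),(2,3)], with all the first elements of the tuples the same, so heappop will select the one with the lowest sublist index. This means that index 0 is popped first and times_seen is incremented to 1, then index 1 is popped and times_seen is incremented to 2. If ary is not equal to times_seen, the number is not in the intersection of all sublists. This leads to the condition if lowest != elem or ary != times_seen:, which decides when a number should not be in the result. The else branch of this if statement handles the case where it still might be in the result.

The unbench_all boolean is for when all sublists need to be removed from the bench. This can happen because:

The current number is known not to be in the intersection of the sublists
The current number is known to be in the intersection of the sublists

When unbench_all is True, all the sublists that were removed from the heap are re-added. These are known to be the ones with indices in range(times_seen), since the algorithm only removes items from the heap while they hold the same number, so they must have been removed in index order, starting from index 0.
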
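
The tuple-ordering property the explanation relies on can be checked directly. The snippet below is only a small demonstration, not part of the answer's code: it pushes (value, sublist_index) tuples with equal values in scrambled order and shows that heappop hands them back by ascending sublist index.

```python
import heapq

# Tuples compare element by element: first items first, then second items.
assert (5, 0) < (5, 1)   # equal values, so the sublist index decides
assert (4, 9) < (5, 0)   # a smaller value wins regardless of the index

heap = []
for idx in (2, 0, 3, 1):             # push in scrambled index order
    heapq.heappush(heap, (2, idx))

order = [heapq.heappop(heap) for _ in range(4)]
print(order)  # [(2, 0), (2, 1), (2, 2), (2, 3)]
```
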
import heapq

def mergeArys(srtd_arys):
    heap = []
    srtd_iters = [iter(x) for x in srtd_arys]

    # put the first element from each srtd array onto the heap
    for idx, it in enumerate(srtd_iters):
        elem = next(it, None)
        if elem is not None:  # a plain truth test would drop a 0
            heapq.heappush(heap, (elem, idx))

    # an empty input array means the intersection is empty
    if len(heap) < len(srtd_arys):
        return []

    res = []

    # collect results in nlogK time
    while heap:
        elem, ary = heap[0]
        lowest = elem
        keep_elem = True
        for i in range(len(srtd_arys)):
            elem, ary = heap[0]
            if lowest != elem or ary != i:
                if ary != i:
                    # sublist i does not hold the lowest value: advance this one
                    heapq.heappop(heap)
                    it = srtd_iters[ary]
                    nxt = next(it, None)
                    if nxt is not None:
                        heapq.heappush(heap, (nxt, ary))

                keep_elem = False
                i -= 1
                break
            heapq.heappop(heap)  # bench sublist i

        if keep_elem:
            res.append(elem)

        # unbench: push the next element of every benched sublist
        for unbenched in range(i+1):
            unbenched_it = srtd_iters[unbenched]
            nxt = next(unbenched_it, None)
            if nxt is not None:
                heapq.heappush(heap, (nxt, unbenched))

        # some sublist is exhausted, so nothing more can be common
        if len(heap) < len(srtd_arys):
            heap = []

    return res
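
For comparison, here is a shorter heap-based sketch that also runs in O(n log(k)). It is a different technique from the benching approach above, not a rewrite of it: the heap holds the current value of every array, the largest current value is tracked separately, and a value is common exactly when the heap minimum equals that largest value. The names heap_intersection and _MISSING are invented for this illustration, and the output is deduplicated.

```python
import heapq

_MISSING = object()  # sentinel, so that 0 or None can be real values


def heap_intersection(arrays):
    """Deduplicated intersection of k sorted arrays in O(n log k)."""
    if not arrays:
        return []
    iters = [iter(a) for a in arrays]
    heap = []
    largest = None
    for idx, it in enumerate(iters):
        first = next(it, _MISSING)
        if first is _MISSING:        # an empty array: nothing is common
            return []
        heap.append((first, idx))
        largest = first if largest is None else max(largest, first)
    heapq.heapify(heap)

    result = []
    while True:
        smallest, idx = heap[0]
        # The minimum equals the maximum only if all k current values agree.
        if smallest == largest and (not result or result[-1] != smallest):
            result.append(smallest)
        nxt = next(iters[idx], _MISSING)
        if nxt is _MISSING:          # one array is exhausted: done
            return result
        heapq.heapreplace(heap, (nxt, idx))   # pop the minimum, push its successor
        largest = max(largest, nxt)


print(heap_intersection([[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]))  # [1, 7]
```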


def intersection(arrays):  # header and setup reconstructed (assumed); the original lines are missing
  k = len(arrays)
  indexes = [0] * k
  inter = []

  while indexes[0] < len(arrays[0]):
    candidate = arrays[0][indexes[0]]
    matched = True
    for i in range(1,k):
      # advance array i until it reaches or passes the candidate
      while indexes[i] < len(arrays[i]) and arrays[i][indexes[i]] < candidate:
        indexes[i] += 1
      if indexes[i] >= len(arrays[i]):
        return inter
      if arrays[i][indexes[i]] > candidate:
        matched = False
        break
    if matched:
      inter.append(candidate)
      indexes = [idx+1 for idx in indexes]
    else:
      indexes[0] += 1
  return inter

problem = [[1,3,5,7],[1,1,3,5,8,7],[1,4,7,9]]

# De Bruijn lookup table: maps an isolated bit (after multiply and shift) to its index.
# This limits the approach to integer values in the range 0..31.
debruijn = [0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9]
u32 = accum = (1 << 32) - 1
for vec in problem:
    maxterm = 0
    for v in vec:
        maxterm |= 1 << v
    accum &= maxterm

# https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
result = []
while accum:
    power = accum
    accum &= accum - 1 # Peter Wegner CACM 3 (1960), 322
    power &= ~accum
    result.append(debruijn[((power * 0x077CB531) & u32) >> 27])

print(result)
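
The lookup table above ties the bitmask idea to values from 0 to 31. As a sketch of how the same idea could be lifted to larger values, the variant below leans on Python's arbitrary-precision integers and int.bit_length() instead of the De Bruijn table. The function name bitmask_intersection is made up for this example, and it assumes all values are non-negative integers; memory grows with the largest value, so it only makes sense for small value ranges.

```python
def bitmask_intersection(arrays):
    """Intersection of lists of non-negative ints using one big-int mask per list."""
    if not arrays:
        return []
    accum = -1                      # all bits set (two's complement view)
    for vec in arrays:
        mask = 0
        for v in vec:
            mask |= 1 << v          # set bit v for every value present
        accum &= mask               # keep only the bits seen in every list
    result = []
    while accum:
        lowest = accum & -accum     # isolate the lowest set bit
        result.append(lowest.bit_length() - 1)
        accum ^= lowest             # clear that bit
    return result


print(bitmask_intersection([[1, 3, 5, 7], [1, 1, 3, 5, 8, 7], [1, 4, 7, 9]]))  # [1, 7]
```

Bits are extracted from lowest to highest, so the result comes out sorted even when an input list is not.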

arrays = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
counts = {}

for ar in arrays:
  last = None
  for i in ar:
    if (i != last):
      counts[i] = counts.get(i, 0) + 1
    last = i

N = len(arrays)
intersection = [i for i, n in counts.items() if n == N]
print(intersection)

def counter(my_list):
    my_list = sorted(my_list)
    first_val, *all_val = my_list
    p_index = my_list.index(first_val)
    my_counter = {}
    for item in all_val:
         c_index = my_list.index(item)
         diff = abs(c_index-p_index)
         p_index = c_index
         my_counter[first_val] = diff 
         first_val = item
    c_index = my_list.index(item)
    diff = len(my_list) - c_index
    my_counter[first_val] = diff 
    return my_counter

def my_func(data):
    if not data or not isinstance(data, list):
        return
    # get the first value
    first_val, *all_val = data
    if not isinstance(first_val, list):
        return
    # count items in first value
    p = counter(first_val) # counter({1: 2, 3: 1, 5: 1, 7: 1})
    # collect all common items and calculate the minimum occurrence in intersection
    for val in all_val:
        # collecting common items
        c = counter(val)
        # calculate the minimum occurrence in intersection
        inner_dict = {}
        for inner_val in set(c).intersection(set(p)):
            inner_dict[inner_val] = min(p[inner_val], c[inner_val])
        p = inner_dict
    # >>>p
    # {1: 2, 7: 1}
    # Sort by keys of counter
    sorted_items = sorted(p.items(), key=lambda x:x[0]) # [(1, 2), (7, 1)]
    result=[i[0] for i in sorted_items for _ in range(i[1])] # [1, 1, 7]
    return result

>>> data = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
>>> my_func(data=data)
[1, 7]
>>> data = [[1,1,3,5,7],[1,1,3,5,7],[1,1,4,7,9]]
>>> my_func(data=data)
[1, 1, 7]

from heapq import merge
from itertools import groupby, chain

ls = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]


def index_groups(lst):
    """[1, 1, 3, 5, 7] -> [(1, 0), (1, 1), (3, 0), (5, 0), (7, 0)]"""
    return chain.from_iterable(((e, i) for i, e in enumerate(group)) for k, group in groupby(lst))


iterables = (index_groups(li) for li in ls)
flat = merge(*iterables)
res = [k for (k, _), g in groupby(flat) if sum(1 for _ in g) == len(ls)]
print(res)

[1, 7]

ls = [[1, 1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 1, 4, 7, 9]]

[1, 1, 7]

import itertools

def intersection(iterables):
    target, count = None, 0
    for it in itertools.cycle(map(iter, iterables)):
        for value in it:
            if count == 0 or value > target:
                target, count = value, 1
                break
            if value == target:
                count += 1
                break
        else:  # exhausted iterator
            return
        if count >= len(iterables):
            yield target
            count = 0

import bisect
import itertools

def intersection(seqs):
    seq = min(seqs, key=len)
    if not seq:
        return
    pivot = seq[len(seq) // 2]
    lows, counts, highs = [], [], []
    for seq in seqs:
        start = bisect.bisect_left(seq, pivot)
        stop = bisect.bisect_right(seq, pivot, start)
        lows.append(seq[:start])
        counts.append(stop - start)
        highs.append(seq[stop:])
    yield from intersection(lows)
    yield from itertools.repeat(pivot, min(counts))
    yield from intersection(highs)

def find_welfare_crook(f, g, h, i, j, k):
    """f, g, and h are "ascending functions," i.e.,
i <= j implies f[i] <= f[j] or, equivalently,
f[i] < f[j] implies i < j, and the same goes for g and h.
i, j, k define where to start the search in each list.
"""
    # This is an implementation of a solution to the Welfare Crook
    # problem presented in David Gries's book, The Science of Programming.
    # The surprising and beautiful thing is that the guard predicates are
    # so few and so simple.
    while True:
        if f[i] < g[j]:
            i += 1
        elif g[j] < h[k]:
            j += 1
        elif h[k] < f[i]:
            k += 1
        else:
            break
    # The other remarkable thing is how the negation of the guard
    # predicates works out to be:  f[i] == g[j] and g[j] == h[k].
    return (i,j,k)
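
As a rough illustration of how the three guards can be turned into a full intersection of three sorted lists, the sketch below repeats the search after every match and adds explicit bounds checks, so it stops cleanly instead of running off the end of a list. The name common_of_three is hypothetical, and repeated values are kept only as often as they occur in all three lists.

```python
def common_of_three(f, g, h):
    """Collect every value common to three ascending lists."""
    i = j = k = 0
    result = []
    while i < len(f) and j < len(g) and k < len(h):
        if f[i] < g[j]:
            i += 1
        elif g[j] < h[k]:
            j += 1
        elif h[k] < f[i]:
            k += 1
        else:
            # No guard holds, so f[i] == g[j] == h[k].
            result.append(f[i])
            i += 1
            j += 1
            k += 1
    return result


print(common_of_three([1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]))  # [1, 7]
```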

def findIntersectionLofL(lofl):
    """Generalized findIntersection function which operates on a "list of lists." """
    K = len(lofl)
    indices = [0 for i in range(K)]
    result = []
    #
    try:
        while True:
            # idea is to maintain the indices via a construct like the following:
            allEqual = True
            for i in range(K):
                if lofl[i][indices[i]] < lofl[(i+1)%K][indices[(i+1)%K]] :
                    indices[i] += 1
                    allEqual = False
            # When the above iteration finishes, if all of the list
            # items indexed by the indices are equal, then another
            # item common to all of the lists must be added to the result.
            if allEqual :
                result.append(lofl[0][indices[0]])
                while lofl[0][indices[0]] == lofl[1][indices[1]]:
                    indices[0] += 1
    except IndexError as e:
        # Eventually, the foregoing iteration will advance one of the
        # indices past the end of one of the lists, and when that happens
        # an IndexError exception will be raised.  This means the algorithm
        # is finished.
        return result

def findIntersectionLofLunRolled(lofl):
    """Generalized findIntersection function which operates on a "list of lists."
Accepts a list-of-lists, lofl.  Each of the lists must be ordered.
Returns the list of each element which appears in all of the lists at least once.
"""
    K = len(lofl)
    indices = [0] * K
    result = []
    lt = [ (i, (i+1) % K) for i in range(K) ] # avoids evaluation of index exprs inside the loop
    #
    try:
        while True:
            allUnEqual = True
            while allUnEqual:
                allUnEqual = False
                for i,j in lt:
                    if lofl[i][indices[i]] < lofl[j][indices[j]]:
                        indices[i] += 1
                        allUnEqual = True
            # Now all of the lofl[i][indices[i]], for all i, are the same value.
            # Store that value in the result, and then advance all of the indices
            # past that common value:
            v = lofl[0][indices[0]]
            result.append(v)
            for i,j in lt:
                while lofl[i][indices[i]] == v:
                    indices[i] += 1
    except IndexError as e:
        # Eventually, the foregoing iteration will advance one of the
        # indices past the end of one of the lists, and when that happens
        # an IndexError exception will be raised.  This means the algorithm
        # is finished.
        return result