Python: What is the most efficient way to get the intersection of k sorted arrays?


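Given k sorted arrays, what is the most efficient way to get the intersection of these lists?

Example

Input:

[[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]

Output:

[1,7]

There is a way to get the union of k sorted arrays in n log k time, based on what I read in the book Elements of Programming Interviews. I was wondering whether there is a way to do something similar for the intersection.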

## merge sorted arrays in nlogk time [ regular appending and merging is nlogn time ]
import heapq
def mergeArys(srtd_arys):
    heap = []
    srtd_iters = [iter(x) for x in srtd_arys]
    
    # put the first element from each srtd array onto the heap
    for idx, it in enumerate(srtd_iters):
        elem = next(it, None)
        if elem is not None:  # compare with None so that a value of 0 is not dropped
            heapq.heappush(heap, (elem, idx))
    
    res = []
 
    # collect results in nlogK time
    while heap:
        elem, ary = heapq.heappop(heap)
        it = srtd_iters[ary]
        res.append(elem)
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, ary))

    return res

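Edit: Obviously, this is an algorithm problem that I am trying to solve, so I cannot use any built-in functions such as set intersection.

You can use reduce: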

from functools import reduce

a = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]] 
reduce(lambda x, y: x & set(y), a[1:], set(a[0]))
 {1, 7}

You can use the built-in set type and set intersection:

d = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
result = set(d[0]).intersection(*d[1:])
{1, 7}
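
For reference, the set-based approach can be wrapped in a small helper. This is only a minimal sketch: the name intersect_with_sets is hypothetical, and sorting the result is an added assumption, since a set does not preserve the input order.

```python
def intersect_with_sets(lists):
    """Return the distinct values common to every list, in ascending order."""
    if not lists:
        return []  # assumption: no input lists means an empty intersection
    common = set(lists[0]).intersection(*lists[1:])
    return sorted(common)


print(intersect_with_sets([[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]))  # [1, 7]
```

Note that this ignores the fact that the inputs are already sorted, which is what the remaining answers exploit.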
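
Exploiting the sort order

Here is an O(n) approach that does not need any special data structures or auxiliary memory beyond the basic requirement of one iterator and one value per sublist: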

from itertools import cycle

def intersection(data):
    ITERATOR, VALUE = 0, 1
    n = len(data)
    result = []
    try:
        pairs = cycle([(it := iter(sublist)), next(it)] for sublist in data)
        pair = next(pairs)
        curr = pair[VALUE]  # Candidate is the largest value seen so far
        matches = 1         # Number of pairs where the candidate occurs
        while True:
            iterator, value = pair = next(pairs)
            while value < curr:
                value = next(iterator)
            pair[VALUE] = value
            if value > curr:
                curr, matches = value, 1
                continue
            matches += 1
            if matches < n:  # "<" rather than "!=" so a single sublist cannot loop forever
                continue
            result.append(curr)
            while (value := next(iterator)) == curr:
                pass
            pair[VALUE] = value
            curr, matches = value, 1
    except StopIteration:
        return result
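
The algorithm in words

The algorithm cycles around the iterator/value pairs. If a value matches across all pairs, it belongs to the intersection. If a value is lower than the largest value seen so far, the current iterator is advanced. If a value is greater than any value seen so far, it becomes the new target and the match count is reset to 1. The algorithm finishes when any iterator is exhausted.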

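No reliance on built-in functions

The use of itertools.cycle() is entirely optional. It is easily emulated by incrementing an index that wraps around at the end.

Instead of: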

iterator, value = pair = next(pairs)
You could write:

pairnum += 1
if pairnum == n:
    pairnum = 0
iterator, value = pair = pairs[pairnum]    
Or, more compactly:

pairnum = (pairnum + 1) % n
iterator, value = pair = pairs[pairnum] 
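
Putting the index-based replacement together, the whole function might look roughly like this. It is a sketch rather than the answer's own code: the name intersection_indexed is hypothetical, the pairs are built with an explicit loop, and the guards for an empty input and for a single sublist are added assumptions. It needs Python 3.8+ for the := operator.

```python
def intersection_indexed(data):
    """Distinct values common to all sorted sublists, without itertools.cycle."""
    ITERATOR, VALUE = 0, 1
    n = len(data)
    result = []
    if n == 0:
        return result  # assumption: no sublists means an empty intersection
    try:
        # One [iterator, current value] pair per sublist.
        pairs = []
        for sublist in data:
            it = iter(sublist)
            pairs.append([it, next(it)])
        pairnum = 0
        curr = pairs[0][VALUE]  # candidate: the largest value seen so far
        matches = 1             # number of pairs where the candidate occurs
        while True:
            pairnum = (pairnum + 1) % n  # wrap around, emulating cycle()
            iterator, value = pair = pairs[pairnum]
            while value < curr:
                value = next(iterator)
            pair[VALUE] = value
            if value > curr:
                curr, matches = value, 1
                continue
            matches += 1
            if matches < n:
                continue
            result.append(curr)
            # Skip repeats of the value that was just emitted.
            while (value := next(iterator)) == curr:
                pass
            pair[VALUE] = value
            curr, matches = value, 1
    except StopIteration:
        # Some iterator ran out, so no further common value is possible.
        return result


print(intersection_indexed([[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]))  # [1, 7]
```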
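
Repeated values

If repeats are to be retained (as in a multiset), this is an easy modification. Just change the four lines after result.append(curr) so that the matching element is removed from each iterator: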

def intersection(data):
    ITERATOR, VALUE = 0, 1
    n = len(data)
    result = []
    try:
        pairs = cycle([(it := iter(sublist)), next(it)] for sublist in data)
        pair = next(pairs)
        curr = pair[VALUE]  # Candidate is the largest value seen so far
        matches = 1         # Number of pairs where the candidate occurs
        while True:
            iterator, value = pair = next(pairs)
            while value < curr:
                value = next(iterator)
            pair[VALUE] = value
            if value > curr:
                curr, matches = value, 1
                continue
            matches += 1
            if matches < n:  # "<" rather than "!=" so a single sublist cannot loop forever
                continue
            result.append(curr)
            for i in range(n):
                iterator, value = pair = next(pairs)
                pair[VALUE] = next(iterator)
            curr, matches = pair[VALUE], 1
    except StopIteration:
        return result

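Yes, it is possible! I have modified your example code to do this.

My answer assumes that your question is about the algorithm. If you want the fastest-running code using sets, see the other answers.

This keeps the O(n log(k)) time complexity: all of the code between if lowest != elem or ary != times_seen: and unbench_all = False is O(log(k)). There is a nested loop inside the main loop (for unbenched in range(times_seen):), but it only runs times_seen times, and times_seen starts at 0, is reset to 0 after every run of this inner loop, and can be incremented only once per main-loop iteration, so the inner loop cannot perform more iterations in total than the main loop. Thus, since the code inside the inner loop is O(log(k)) and runs at most as many times as the outer loop, and the outer loop body is O(log(k)) and runs n times, the algorithm is O(n log(k)).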

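This algorithm relies on how tuples are compared in Python. The first items of the tuples are compared, and if they are equal, the second items are compared (i.e. (x, a) < (x, b) is true if and only if a < b).

In this algorithm, unlike the example code in the question, an item that is popped from the heap is not necessarily pushed again in the same iteration. Because we need to check whether all sublists contain the same number, once a number has been popped from the heap its sublist is what I call "benched", meaning that it is not added back to the heap. We still need to check whether the other sublists contain the same item, so the next item of this sublist is not needed yet.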

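If a number really is in all sublists, the heap will look something like [(2,0),(2,1),(2,2),(2,3)], with the first elements of all tuples equal, so heappop will select the one with the lowest sublist index. This means that index 0 is popped first and times_seen is incremented to 1, then index 1 is popped and times_seen is incremented to 2, and so on. If ary is not equal to times_seen, the number is not in the intersection of all sublists. This leads to the condition if lowest != elem or ary != times_seen:, which decides when a number should not be in the result. The else branch of this if statement is for when the number might still be in the result.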
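
The ordering that this reasoning depends on can be checked directly. The snippet below is only an illustration with made-up (value, sublist_index) entries; it is not part of the answer's code.

```python
import heapq

# Tuples compare item by item: the second item only matters on a tie.
assert (2, 0) < (2, 1) < (3, 0)

# Entries are (value, sublist_index) pairs, pushed in arbitrary order.
heap = []
for entry in [(2, 3), (2, 1), (5, 0), (2, 0), (2, 2)]:
    heapq.heappush(heap, entry)

# Equal values come off the heap in increasing sublist index.
popped = [heapq.heappop(heap) for _ in range(len(heap))]
print(popped)  # [(2, 0), (2, 1), (2, 2), (2, 3), (5, 0)]
```

Because equal values are popped in index order, comparing the popped index with a running counter is enough to detect a sublist that does not hold the candidate value.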

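The unbench_all boolean is for when all sublists need to be taken off the bench. This can be because:

  • The current number is known not to be in the intersection of the sublists
  • The current number is known to be in the intersection of the sublists

When unbench_all is True, all sublists that were removed from the heap are re-added. These are known to be the ones with indices in range(times_seen), because the algorithm only removes items from the heap if they hold the same number, so they must have been removed in index order.
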
    import heapq

    def mergeArys(srtd_arys):
        heap = []
        srtd_iters = [iter(x) for x in srtd_arys]
    
        # put the first element from each srtd array onto the heap
        for idx, it in enumerate(srtd_iters):
            elem = next(it, None)
            if elem:
                heapq.heappush(heap, (elem, idx))
    
        res = []
    
        # collect results in nlogK time
        while heap:
            elem, ary = heap[0]
            lowest = elem
            keep_elem = True
            for i in range(len(srtd_arys)):
                elem, ary = heap[0]
                if lowest != elem or ary != i:
                    if ary != i:
                        heapq.heappop(heap)
                        it = srtd_iters[ary]
                        nxt = next(it, None)
                        if nxt:
                            heapq.heappush(heap, (nxt, ary))
    
                    keep_elem = False
                    i -= 1
                    break
                heapq.heappop(heap)
    
            if keep_elem:
                res.append(elem)
    
            for unbenched in range(i+1):
                unbenched_it = srtd_iters[unbenched]
                nxt = next(unbenched_it, None)
                if nxt:
                    heapq.heappush(heap, (nxt, unbenched))
    
            if len(heap) < len(srtd_arys):
                heap = []
    
        return res
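
The bench bookkeeping described in the prose (times_seen, unbench_all) can also be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the answer's original code: the function name intersect_benched and the _DONE sentinel are hypothetical, and the sentinel is used so that values such as 0 are handled.

```python
import heapq

_DONE = object()  # hypothetical sentinel marking an exhausted iterator


def intersect_benched(sorted_lists):
    """Intersect k sorted lists with a heap of (value, list_index) pairs."""
    k = len(sorted_lists)
    if k == 0:
        return []
    iters = [iter(lst) for lst in sorted_lists]
    heap = []
    for idx, it in enumerate(iters):
        value = next(it, _DONE)
        if value is _DONE:
            return []  # an empty list makes the intersection empty
        heap.append((value, idx))
    heapq.heapify(heap)

    result = []
    lowest = None
    times_seen = 0  # lists 0 .. times_seen-1 are benched on the value `lowest`
    while True:
        elem, ary = heap[0]
        if times_seen == 0:
            lowest = elem  # start a new candidate
        if elem == lowest and ary == times_seen:
            heapq.heappop(heap)  # bench this list
            times_seen += 1
            unbench_all = times_seen == k  # candidate found in every list
            if unbench_all:
                result.append(lowest)
        else:
            unbench_all = True  # candidate is missing from list `times_seen`
            if elem == lowest:
                # List `ary` also holds the failed candidate: move it on.
                heapq.heappop(heap)
                nxt = next(iters[ary], _DONE)
                if nxt is _DONE:
                    return result
                heapq.heappush(heap, (nxt, ary))
        if unbench_all:
            # Re-add every benched list with its next value.
            for unbenched in range(times_seen):
                nxt = next(iters[unbenched], _DONE)
                if nxt is _DONE:
                    return result
                heapq.heappush(heap, (nxt, unbenched))
            times_seen = 0


print(intersect_benched([[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]))  # [1, 7]
```

The function returns as soon as any iterator is exhausted, since no further value can be common to all lists after that point.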
    
    
    def intersection(arrays):
      # Header and index setup reconstructed; the original name is not preserved.
      k = len(arrays)
      indexes = [0] * k
      inter = []

      while indexes[0] < len(arrays[0]):
        aligned = True
        for i in range(1, k):
          # advance array i until it reaches the candidate arrays[0][indexes[0]]
          while indexes[i] < len(arrays[i]) and arrays[i][indexes[i]] < arrays[0][indexes[0]]:
            indexes[i] += 1
          if indexes[i] >= len(arrays[i]):
            return inter
          # array i overshot the candidate: try the next element of arrays[0]
          if arrays[i][indexes[i]] > arrays[0][indexes[0]]:
            indexes[0] += 1
            aligned = False
            break
        if aligned:
          inter.append(arrays[0][indexes[0]])
          indexes = [idx+1 for idx in indexes]
      return inter
    
    problem = [[1,3,5,7],[1,1,3,5,8,7],[1,4,7,9]]

    # Note: this bit-mask trick only works for integer values in the range 0..31.
    debruijn = [0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9]
    u32 = accum = (1 << 32) - 1
    for vec in problem:
        maxterm = 0
        for v in vec:
            maxterm |= 1 << v
        accum &= maxterm

    # https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
    result = []
    while accum:
        power = accum
        accum &= accum - 1  # Peter Wegner CACM 3 (1960), 322
        power &= ~accum
        result.append(debruijn[((power * 0x077CB531) & u32) >> 27])

    print(result)
    
    arrays = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
    counts = {}
    
    for ar in arrays:
      last = None
      for i in ar:
        if (i != last):
          counts[i] = counts.get(i, 0) + 1
        last = i
    
    N = len(arrays)
    intersection = [i for i, n in counts.items() if n == N]
    print(intersection)
    
    def counter(my_list):
        my_list = sorted(my_list)
        first_val, *all_val = my_list
        p_index = my_list.index(first_val)
        my_counter = {}
        for item in all_val:
             c_index = my_list.index(item)
             diff = abs(c_index-p_index)
             p_index = c_index
             my_counter[first_val] = diff 
             first_val = item
        c_index = my_list.index(item)
        diff = len(my_list) - c_index
        my_counter[first_val] = diff 
        return my_counter
    
    def my_func(data):
        if not data or not isinstance(data, list):
            return
        # get the first value
        first_val, *all_val = data
        if not isinstance(first_val, list):
            return
        # count items in first value
        p = counter(first_val) # counter({1: 2, 3: 1, 5: 1, 7: 1})
        # collect all common items and calculate the minimum occurance in intersection
        for val in all_val:
            # collecting common items
            c = counter(val)
            # calculate the minimum occurance in intersection
            inner_dict = {}
            for inner_val in set(c).intersection(set(p)):
                inner_dict[inner_val] = min(p[inner_val], c[inner_val])
            p = inner_dict
        # >>>p
        # {1: 2, 7: 1}
        # Sort by keys of counter
        sorted_items = sorted(p.items(), key=lambda x:x[0]) # [(1, 2), (7, 1)]
        result=[i[0] for i in sorted_items for _ in range(i[1])] # [1, 1, 7]
        return result
    
    >>> data = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
    >>> my_func(data=data)
    [1, 7]
    >>> data = [[1,1,3,5,7],[1,1,3,5,7],[1,1,4,7,9]]
    >>> my_func(data=data)
    [1, 1, 7]
    
    from heapq import merge
    from itertools import groupby, chain
    
    ls = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
    
    
    def index_groups(lst):
        """[1, 1, 3, 5, 7] -> [(1, 0), (1, 1), (3, 0), (5, 0), (7, 0)]"""
        return chain.from_iterable(((e, i) for i, e in enumerate(group)) for k, group in groupby(lst))
    
    
    iterables = (index_groups(li) for li in ls)
    flat = merge(*iterables)
    res = [k for (k, _), g in groupby(flat) if sum(1 for _ in g) == len(ls)]
    print(res)
    
    [1, 7]
    
    ls = [[1, 1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 1, 4, 7, 9]]
    
    [1, 1, 7]
    
    import itertools

    def intersection(iterables):
        target, count = None, 0
        for it in itertools.cycle(map(iter, iterables)):
            for value in it:
                if count == 0 or value > target:
                    target, count = value, 1
                    break
                if value == target:
                    count += 1
                    break
            else:  # exhausted iterator
                return
            if count >= len(iterables):
                yield target
                count = 0
    
    import bisect
    import itertools

    def intersection(seqs):
        seq = min(seqs, key=len)
        if not seq:
            return
        pivot = seq[len(seq) // 2]
        lows, counts, highs = [], [], []
        for seq in seqs:
            start = bisect.bisect_left(seq, pivot)
            stop = bisect.bisect_right(seq, pivot, start)
            lows.append(seq[:start])
            counts.append(stop - start)
            highs.append(seq[stop:])
        yield from intersection(lows)
        yield from itertools.repeat(pivot, min(counts))
        yield from intersection(highs)
    
    def find_welfare_crook(f, g, h, i, j, k):
        """f, g, and h are "ascending functions," i.e.,
    i <= j implies f[i] <= f[j] or, equivalently,
    f[i] < f[j] implies i < j, and the same goes for g and h.
    i, j, k define where to start the search in each list.
    """
        # This is an implementation of a solution to the Welfare Crook
        # problems presented in David Gries's book, The Science of Programming.
        # The surprising and beautiful thing is that the guard predicates are
        # so few and so simple.
        i , j , k = i , j , k
        while True:
            if f[i] < g[j]:
                i += 1
            elif g[j] < h[k]:
                j += 1
            elif h[k] < f[i]:
                k += 1
            else:
                break
        return (i,j,k)
        # The other remarkable thing is how the negation of the guard
        # predicates works out to be:  f[i] == g[j] and g[j] == h[k].
    
    def findIntersectionLofL(lofl):
        """Generalized findIntersection function which operates on a "list of lists." """
        K = len(lofl)
        indices = [0 for i in range(K)]
        result = []
        #
        try:
            while True:
                # idea is to maintain the indices via a construct like the following:
                allEqual = True
                for i in range(K):
                    if lofl[i][indices[i]] < lofl[(i+1)%K][indices[(i+1)%K]] :
                        indices[i] += 1
                        allEqual = False
                # When the above iteration finishes, if all of the list
                # items indexed by the indices are equal, then another
                # item common to all of the lists must be added to the result.
                if allEqual :
                    result.append(lofl[0][indices[0]])
                    while lofl[0][indices[0]] == lofl[1][indices[1]]:
                        indices[0] += 1
        except IndexError as e:
            # Eventually, the foregoing iteration will advance one of the
            # indices past the end of one of the lists, and when that happens
            # an IndexError exception will be raised.  This means the algorithm
            # is finished.
            return result
    
    def findIntersectionLofLunRolled(lofl):
        """Generalized findIntersection function which operates on a "list of lists."
    Accepts a list-of-lists, lofl.  Each of the lists must be ordered.
    Returns the list of each element which appears in all of the lists at least once.
    """
        K = len(lofl)
        indices = [0] * K
        result = []
        lt = [ (i, (i+1) % K) for i in range(K) ] # avoids evaluation of index exprs inside the loop
        #
        try:
            while True:
                allUnEqual = True
                while allUnEqual:
                    allUnEqual = False
                    for i,j in lt:
                        if lofl[i][indices[i]] < lofl[j][indices[j]]:
                            indices[i] += 1
                            allUnEqual = True
                # Now all of the lofl[i][indices[i]], for all i, are the same value.
                # Store that value in the result, and then advance all of the indices
                # past that common value:
                v = lofl[0][indices[0]]
                result.append(v)
                for i,j in lt:
                    while lofl[i][indices[i]] == v:
                        indices[i] += 1
        except IndexError as e:
            # Eventually, the foregoing iteration will advance one of the
            # indices past the end of one of the lists, and when that happens
            # an IndexError exception will be raised.  This means the algorithm
            # is finished.
            return result