Python: What is the most efficient way to get the intersection of k sorted arrays?
Given k sorted arrays, what is the most efficient way to get the intersection of these lists?
Example
Input:
[[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]
Output:
[1,7]
There is a way to get the union of k sorted arrays in n log k time, based on what I read in the book Elements of Programming Interviews. I was wondering whether there is a way to do something similar for the intersection.
## merge sorted arrays in nlogk time [ regular appending and merging is nlogn time ]
import heapq
def mergeArys(srtd_arys):
    heap = []
    srtd_iters = [iter(x) for x in srtd_arys]
    # put the first element from each srtd array onto the heap
    for idx, it in enumerate(srtd_iters):
        elem = next(it, None)
        if elem is not None:  # a plain truth test would drop the value 0
            heapq.heappush(heap, (elem, idx))
    res = []
    # collect results in nlogK time
    while heap:
        elem, ary = heapq.heappop(heap)
        it = srtd_iters[ary]
        res.append(elem)
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, ary))
    return res
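Edit: obviously this is an algorithm question that I am trying to solve, so I cannot use any built-in functions such as set intersection.
You can use reduce: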
from functools import reduce
a = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
reduce(lambda x, y: x & set(y), a[1:], set(a[0]))
{1, 7}
You can use built-in sets and set intersection:
d = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
result = set(d[0]).intersection(*d[1:])
{1, 7}
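Exploiting the sort order
Here is an O(n) approach that needs no special data structures or auxiliary memory beyond the basic requirement of one iterator and one value per sublist: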
from itertools import cycle
def intersection(data):
    ITERATOR, VALUE = 0, 1
    n = len(data)
    result = []
    try:
        pairs = cycle([(it := iter(sublist)), next(it)] for sublist in data)
        pair = next(pairs)
        curr = pair[VALUE]  # Candidate is the largest value seen so far
        matches = 1         # Number of pairs where the candidate occurs
        while True:
            iterator, value = pair = next(pairs)
            while value < curr:
                value = next(iterator)
            pair[VALUE] = value
            if value > curr:
                curr, matches = value, 1
                continue
            matches += 1
            if matches != n:
                continue
            result.append(curr)
            while (value := next(iterator)) == curr:
                pass
            pair[VALUE] = value
            curr, matches = value, 1
    except StopIteration:
        return result
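The algorithm in words
The algorithm cycles over iterator/value pairs. If a value matches across all pairs, it belongs in the intersection. If a value is lower than the largest value seen so far, the current iterator is advanced. If a value is greater than any value seen so far, it becomes the new target and the match count is reset to one. When any iterator is exhausted, the algorithm is complete.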
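The rule described above (advance whichever side is behind, record a value when every side agrees) is easiest to see with only two lists. The following is a minimal sketch of that special case, not the answer's code; the name intersect_two is made up for illustration, and it keeps each common value once.

```python
def intersect_two(a, b):
    """Distinct common values of two sorted lists, using one index per list."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1  # a is behind the candidate: advance a
        elif a[i] > b[j]:
            j += 1  # b is behind the candidate: advance b
        else:
            # Both lists agree; record the value once and move past it.
            if not out or out[-1] != a[i]:
                out.append(a[i])
            i += 1
            j += 1
    return out


print(intersect_two([1, 3, 5, 7], [1, 1, 3, 5, 7]))
```

The k-list version generalizes this by keeping one candidate and counting how many lists currently match it.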
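Does not rely on built-in functions
Using cycle() is entirely optional. It is easily emulated by incrementing an index that wraps around at the end.
Instead of:
iterator, value = pair = next(pairs)
You could write:
pairnum += 1
if pairnum == n:
    pairnum = 0
iterator, value = pair = pairs[pairnum]
Or more compactly:
pairnum = (pairnum + 1) % n
iterator, value = pair = pairs[pairnum]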
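Putting the index-based replacement together gives a version with no itertools import. This is a sketch under assumptions rather than the answer's exact code: the name intersection_no_cycle is hypothetical, pairs is built as a plain list, and the loop is rearranged slightly so that a single input list also terminates.

```python
def intersection_no_cycle(data):
    """Distinct values common to all sorted sublists, without itertools."""
    n = len(data)
    result = []
    try:
        # One [iterator, current value] pair per sublist.
        pairs = []
        for sublist in data:
            it = iter(sublist)
            pairs.append([it, next(it)])  # an empty sublist ends the search
        if n == 0:
            return result
        pairnum = 0
        curr, matches = pairs[0][1], 1  # candidate and how many pairs hold it
        while True:
            if matches == n:
                result.append(curr)
                # Step the current iterator past every copy of the match.
                iterator = pairs[pairnum][0]
                value = next(iterator)
                while value == curr:
                    value = next(iterator)
                pairs[pairnum][1] = value
                curr, matches = value, 1
                continue
            pairnum = (pairnum + 1) % n  # wrap around instead of cycle()
            pair = pairs[pairnum]
            iterator, value = pair
            while value < curr:
                value = next(iterator)
            pair[1] = value
            if value > curr:
                curr, matches = value, 1  # larger value becomes the candidate
            else:
                matches += 1
    except StopIteration:
        # Any exhausted iterator means no further common values exist.
        return result


print(intersection_no_cycle([[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]))
```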
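Repeated values
If repeats are to be retained (as in a multiset), the modification is easy: change the four lines after result.append(curr) so that the matching element is removed from each iterator: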
def intersection(data):
    ITERATOR, VALUE = 0, 1
    n = len(data)
    result = []
    try:
        pairs = cycle([(it := iter(sublist)), next(it)] for sublist in data)
        pair = next(pairs)
        curr = pair[VALUE]  # Candidate is the largest value seen so far
        matches = 1         # Number of pairs where the candidate occurs
        while True:
            iterator, value = pair = next(pairs)
            while value < curr:
                value = next(iterator)
            pair[VALUE] = value
            if value > curr:
                curr, matches = value, 1
                continue
            matches += 1
            if matches != n:
                continue
            result.append(curr)
            for i in range(n):
                iterator, value = pair = next(pairs)
                pair[VALUE] = next(iterator)
            curr, matches = pair[VALUE], 1
    except StopIteration:
        return result
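When experimenting with the multiset variant, it helps to have an independent reference to compare against. The sketch below is only such a cross-check and not a replacement for the algorithm: it uses built-ins that the question rules out, and the name multiset_intersection_check is made up here. It relies on Counter's & operator, which keeps the minimum count of each key.

```python
from collections import Counter
from functools import reduce


def multiset_intersection_check(data):
    """Reference result: each value repeated min(count per sublist) times."""
    if not data:
        return []
    # Counter & Counter keeps, for each key, the smaller of the two counts.
    common = reduce(lambda x, y: x & y, (Counter(sub) for sub in data))
    return sorted(common.elements())


print(multiset_intersection_check([[1, 1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 1, 4, 7, 9]]))
```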
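Yes, it is possible! I have modified your example code to do this.
My answer assumes that your question is about the algorithm. If you want the fastest-running code using sets, see the other answers.
This keeps the O(n log(k)) time complexity: all the code between if lowest != elem or ary != times_seen: and unbench_all = False is O(log(k)). There is a nested loop inside the main loop (for unbenched in range(times_seen):), but it only runs times_seen times. times_seen starts at 0, is reset to 0 each time this inner loop runs, and can be incremented at most once per main-loop iteration, so the inner loop cannot perform more iterations in total than the main loop. Since the code inside the inner loop is O(log(k)) and runs at most as many times as the outer loop, and the outer loop body is O(log(k)) and runs n times, the algorithm is O(n log(k)).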
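This algorithm relies on how tuples are compared in Python. The first items of the tuples are compared, and if they are equal the second items are compared (i.e. (x, a) < (x, b) is true if and only if a < b).
In this algorithm, unlike the example code in the question, an item popped from the heap is not necessarily pushed again in the same iteration. Because we need to check whether all sublists contain the same number, after a number is popped from the heap its sublist is what I call "benched", meaning that it is not added back to the heap. We still need to check whether the other sublists contain the same item, so the next item of this sublist is not needed yet.
If a number is indeed in all sublists, the heap will look something like [(2,0),(2,1),(2,2),(2,3)], with all the first elements of the tuples the same, so heappop selects the one with the lowest sublist index. This means that index 0 is popped first and times_seen is incremented to 1, then index 1 is popped and times_seen is incremented to 2. If ary is not equal to times_seen, the number is not in the intersection of all sublists. This leads to the condition if lowest != elem or ary != times_seen:, which decides when a number should not be in the result. The else branch of this if statement handles the case where the number might still be in the result.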
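The tie-breaking behaviour described above can be checked directly. This small sketch uses only the standard heapq module; the heap contents are the illustrative values from the paragraph, deliberately entered out of order.

```python
import heapq

# Four sublists (indices 0..3) that all currently expose the value 2.
heap = [(2, 3), (2, 0), (2, 2), (2, 1)]
heapq.heapify(heap)

# Ties on the first item are broken by the second item (the sublist index).
popped = [heapq.heappop(heap) for _ in range(4)]
print(popped)

# A smaller value always sorts first, whatever its sublist index.
print((1, 9) < (2, 0))
```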
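The unbench_all boolean is for the cases where all sublists need to be taken off the bench. When unbench_all is True, all the sublists that were removed from the heap are re-added. These are known to be the ones with indices in range(times_seen), because the algorithm removes items from the heap only while they carry the same number, so they must have been removed in index order.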
import heapq
def mergeArys(srtd_arys):
    heap = []
    srtd_iters = [iter(x) for x in srtd_arys]
    # put the first element from each srtd array onto the heap
    for idx, it in enumerate(srtd_iters):
        elem = next(it, None)
        if elem is not None:  # a plain truth test would drop the value 0
            heapq.heappush(heap, (elem, idx))
    res = []
    # collect results in nlogK time
    while heap:
        elem, ary = heap[0]
        lowest = elem
        keep_elem = True
        for i in range(len(srtd_arys)):
            elem, ary = heap[0]
            if lowest != elem or ary != i:
                if ary != i:
                    # sublist i does not hold `lowest`: advance sublist ary
                    heapq.heappop(heap)
                    it = srtd_iters[ary]
                    nxt = next(it, None)
                    if nxt is not None:
                        heapq.heappush(heap, (nxt, ary))
                keep_elem = False
                i -= 1
                break
            heapq.heappop(heap)  # bench sublist i
        if keep_elem:
            res.append(elem)
        # push the next item of every benched sublist back onto the heap
        for unbenched in range(i+1):
            unbenched_it = srtd_iters[unbenched]
            nxt = next(unbenched_it, None)
            if nxt is not None:
                heapq.heappush(heap, (nxt, unbenched))
        # a sublist has run out, so nothing more can be common to all
        if len(heap) < len(srtd_arys):
            heap = []
    return res
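The explanation refers to times_seen and unbench_all, which do not appear in the listing. The following is a minimal sketch of the same benching idea written with those names. It is an assumption about how they fit together, not the author's code: the function name intersect_with_bench is made up, None is assumed never to occur as a value, and a match consumes one element from every sublist, so repeats shared by all sublists are kept.

```python
import heapq


def intersect_with_bench(srtd_arys):
    k = len(srtd_arys)
    srtd_iters = [iter(x) for x in srtd_arys]
    heap = []
    for idx, it in enumerate(srtd_iters):
        elem = next(it, None)
        if elem is None:
            return []  # an empty sublist means an empty intersection
        heapq.heappush(heap, (elem, idx))

    res = []
    lowest = None
    times_seen = 0  # sublists 0..times_seen-1 are benched on `lowest`
    while heap:
        elem, ary = heap[0]
        if times_seen == 0:
            lowest = elem  # start a new candidate
        if lowest != elem or ary != times_seen:
            # The candidate is missing from sublist `times_seen`.
            if ary != times_seen:
                # The heap top cannot be common either: advance its sublist.
                heapq.heappop(heap)
                nxt = next(srtd_iters[ary], None)
                if nxt is None:
                    return res
                heapq.heappush(heap, (nxt, ary))
            unbench_all = True
        else:
            heapq.heappop(heap)  # bench this sublist
            times_seen += 1
            unbench_all = times_seen == k
            if unbench_all:
                res.append(lowest)  # every sublist held the candidate
        if unbench_all:
            # Re-add the next item of each benched sublist.
            for unbenched in range(times_seen):
                nxt = next(srtd_iters[unbenched], None)
                if nxt is None:
                    return res  # an exhausted sublist ends the search
                heapq.heappush(heap, (nxt, unbenched))
            times_seen = 0
    return res


print(intersect_with_bench([[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]))
```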
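# the def line and the initial values of k and indexes were missing; these are assumed
def intersection(arrays):
    k = len(arrays)
    indexes = [0] * k  # current position in each array
    inter = []
    while k and indexes[0] < len(arrays[0]):
        candidate = arrays[0][indexes[0]]
        in_all = True
        for i in range(1, k):
            # skip values in arrays[i] that are smaller than the candidate
            while indexes[i] < len(arrays[i]) and arrays[i][indexes[i]] < candidate:
                indexes[i] += 1
            if indexes[i] >= len(arrays[i]):
                return inter  # an exhausted array ends the search
            if arrays[i][indexes[i]] > candidate:
                in_all = False  # candidate is missing from arrays[i]
                break
        if in_all:
            inter.append(candidate)
            indexes = [idx+1 for idx in indexes]  # consume the match everywhere
        else:
            indexes[0] += 1
    return inter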
problem = [[1,3,5,7],[1,1,3,5,8,7],[1,4,7,9]]
debruijn = [0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
            31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9]
u32 = accum = (1 << 32) - 1
# build one 32-bit mask per list and AND them together (values must be in 0..31)
for vec in problem:
    maxterm = 0
    for v in vec:
        maxterm |= 1 << v
    accum &= maxterm
# https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
result = []
while accum:
    power = accum
    accum &= accum - 1  # Peter Wegner CACM 3 (1960), 322
    power &= ~accum     # isolate the lowest set bit
    result.append(debruijn[((power * 0x077CB531) & u32) >> 27])
print(result)
arrays = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
counts = {}
for ar in arrays:
    last = None
    for i in ar:
        # count each value once per array, skipping adjacent repeats
        if (i != last):
            counts[i] = counts.get(i, 0) + 1
        last = i
N = len(arrays)
intersection = [i for i, n in counts.items() if n == N]
print(intersection)
def counter(my_list):
    my_list = sorted(my_list)
    first_val, *all_val = my_list
    p_index = my_list.index(first_val)
    my_counter = {}
    for item in all_val:
        c_index = my_list.index(item)
        diff = abs(c_index-p_index)
        p_index = c_index
        my_counter[first_val] = diff
        first_val = item
    c_index = my_list.index(item)
    diff = len(my_list) - c_index
    my_counter[first_val] = diff
    return my_counter
def my_func(data):
    if not data or not isinstance(data, list):
        return
    # get the first value
    first_val, *all_val = data
    if not isinstance(first_val, list):
        return
    # count items in first value
    p = counter(first_val)  # counter({1: 2, 3: 1, 5: 1, 7: 1})
    # collect all common items and calculate the minimum occurrence in intersection
    for val in all_val:
        # collecting common items
        c = counter(val)
        # calculate the minimum occurrence in intersection
        inner_dict = {}
        for inner_val in set(c).intersection(set(p)):
            inner_dict[inner_val] = min(p[inner_val], c[inner_val])
        p = inner_dict
    # >>>p
    # {1: 2, 7: 1}
    # Sort by keys of counter
    sorted_items = sorted(p.items(), key=lambda x:x[0])  # [(1, 2), (7, 1)]
    result=[i[0] for i in sorted_items for _ in range(i[1])]  # [1, 1, 7]
    return result
>>> data = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
>>> my_func(data=data)
[1, 7]
>>> data = [[1,1,3,5,7],[1,1,3,5,7],[1,1,4,7,9]]
>>> my_func(data=data)
[1, 1, 7]
from heapq import merge
from itertools import groupby, chain
ls = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
def index_groups(lst):
    """[1, 1, 3, 5, 7] -> [(1, 0), (1, 1), (3, 0), (5, 0), (7, 0)]"""
    return chain.from_iterable(((e, i) for i, e in enumerate(group)) for k, group in groupby(lst))
iterables = (index_groups(li) for li in ls)
flat = merge(*iterables)
res = [k for (k, _), g in groupby(flat) if sum(1 for _ in g) == len(ls)]
print(res)
[1, 7]
ls = [[1, 1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 1, 4, 7, 9]]
[1, 1, 7]
import itertools
def intersection(iterables):
    target, count = None, 0
    for it in itertools.cycle(map(iter, iterables)):
        for value in it:
            if count == 0 or value > target:
                target, count = value, 1
                break
            if value == target:
                count += 1
                break
        else:  # exhausted iterator
            return
        if count >= len(iterables):
            yield target
            count = 0
import bisect
import itertools
def intersection(seqs):
    seq = min(seqs, key=len)
    if not seq:
        return
    pivot = seq[len(seq) // 2]
    lows, counts, highs = [], [], []
    for seq in seqs:
        start = bisect.bisect_left(seq, pivot)
        stop = bisect.bisect_right(seq, pivot, start)
        lows.append(seq[:start])
        counts.append(stop - start)
        highs.append(seq[stop:])
    yield from intersection(lows)
    yield from itertools.repeat(pivot, min(counts))
    yield from intersection(highs)
def find_welfare_crook(f, g, h, i, j, k):
    """f, g, and h are "ascending functions," i.e.,
    i <= j implies f[i] <= f[j] or, equivalently,
    f[i] < f[j] implies i < j, and the same goes for g and h.
    i, j, k define where to start the search in each list.
    """
    # This is an implementation of a solution to the Welfare Crook
    # problem presented in David Gries's book, The Science of Programming.
    # The surprising and beautiful thing is that the guard predicates are
    # so few and so simple.
    while True:
        if f[i] < g[j]:
            i += 1
        elif g[j] < h[k]:
            j += 1
        elif h[k] < f[i]:
            k += 1
        else:
            break
    return (i,j,k)
# The other remarkable thing is how the negation of the guard
# predicates works out to be: f[i] == g[j] and g[j] == h[k].
def findIntersectionLofL(lofl):
    """Generalized findIntersection function which operates on a "list of lists." """
    K = len(lofl)
    indices = [0 for i in range(K)]
    result = []
    #
    try:
        while True:
            # idea is to maintain the indices via a construct like the following:
            allEqual = True
            for i in range(K):
                if lofl[i][indices[i]] < lofl[(i+1)%K][indices[(i+1)%K]] :
                    indices[i] += 1
                    allEqual = False
            # When the above iteration finishes, if all of the list
            # items indexed by the indices are equal, then another
            # item common to all of the lists must be added to the result.
            if allEqual :
                result.append(lofl[0][indices[0]])
                while lofl[0][indices[0]] == lofl[1][indices[1]]:
                    indices[0] += 1
    except IndexError as e:
        # Eventually, the foregoing iteration will advance one of the
        # indices past the end of one of the lists, and when that happens
        # an IndexError exception will be raised. This means the algorithm
        # is finished.
        return result
def findIntersectionLofLunRolled(lofl):
    """Generalized findIntersection function which operates on a "list of lists."
    Accepts a list-of-lists, lofl. Each of the lists must be ordered.
    Returns the list of each element which appears in all of the lists at least once.
    """
    K = len(lofl)
    indices = [0] * K
    result = []
    lt = [ (i, (i+1) % K) for i in range(K) ]  # avoids evaluation of index exprs inside the loop
    #
    try:
        while True:
            allUnEqual = True
            while allUnEqual:
                allUnEqual = False
                for i,j in lt:
                    if lofl[i][indices[i]] < lofl[j][indices[j]]:
                        indices[i] += 1
                        allUnEqual = True
            # Now all of the lofl[i][indices[i]], for all i, are the same value.
            # Store that value in the result, and then advance all of the indices
            # past that common value:
            v = lofl[0][indices[0]]
            result.append(v)
            for i,j in lt:
                while lofl[i][indices[i]] == v:
                    indices[i] += 1
    except IndexError as e:
        # Eventually, the foregoing iteration will advance one of the
        # indices past the end of one of the lists, and when that happens
        # an IndexError exception will be raised. This means the algorithm
        # is finished.
        return result