Python: longest equally-spaced subsequence
I have a million integers in sorted order and I would like to find the longest subsequence where the difference between consecutive pairs is equal. For example,
1, 4, 5, 7, 8, 12
has the subsequence
4, 8, 12
My naive approach is greedy and just checks how far a subsequence can be extended from each point. This seems to take O(n²) time per point.
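For reference, a minimal sketch of that greedy approach, assuming distinct values and a Python set for O(1) membership tests (`naive_longest_ap` is a made-up name, not code from the question). Every pair is tried as the first two terms and extended while the next term exists, so the total cost is O(n²·L) set lookups, where L is the length of the longest run:

```python
def naive_longest_ap(a):
    """Greedy brute force on a sorted list of distinct ints: try every pair
    (a[i], a[j]) as the first two terms and extend while the next term exists."""
    present = set(a)          # O(1) membership tests
    best = a[:1]              # a single element is trivially equally spaced
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            step = a[j] - a[i]
            if step == 0:
                continue      # zero step (equal elements) would never terminate
            seq = [a[i], a[j]]
            while seq[-1] + step in present:
                seq.append(seq[-1] + step)
            if len(seq) > len(best):
                best = seq
    return best


print(naive_longest_ap([1, 4, 5, 7, 8, 12]))  # [1, 4, 7]
```

Both [1, 4, 7] and [4, 8, 12] have length 3 here; the sketch keeps the first one it finds.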
Is there a faster way to solve this?
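Update. I will test the code given in the answers as soon as I can (thank you). However, it is already clear that using n^2 memory will not work. So far none of the code terminates with the input [random.randint(0, 100000) for r in range(200000)].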
Timings. I tested on a 32-bit system with the following input data:
a = [random.randint(0, 10000) for r in range(20000)]
a.sort()
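- ZelluX's dynamic programming method used 1.6G of RAM and took 2 minutes 14 seconds. With pypy it took only 9 seconds! However, it crashes with a memory error on large inputs.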
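- Armin's O(nd) time method took 9 seconds with pypy but only 20MB of RAM. Of course, this would be worse if the range were larger. The low memory usage meant I could also test it with a = [random.randint(0, 100000) for r in range(200000)], but it didn't finish in the few minutes I gave it with pypy.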
I also reran the tests with:
a = [random.randint(0, 40000) for r in range(28000)]
a = list(set(a))
a.sort()
to get a list of length roughly 20000. All timings are with pypy:
- ZelluX, 9 seconds
- Kluev, 20 seconds
- Armin, 52 seconds
Your solution is O(N^3) now (you said O(N^2) per index). Here is an O(N^2) time and O(N^2) memory solution.
Idea
If we know a subsequence that goes through indices i[0], i[1], i[2], i[3], we shouldn't try subsequences that start with i[1] and i[2], or with i[2] and i[3].
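Note: I edited that code to make it a bit simpler by relying on a being sorted, but it won't work for equal elements. You can easily check the maximum number of equal elements in O(N).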
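A minimal sketch of that O(N) check, assuming `a` is sorted so that equal values sit next to each other (`max_equal_run` is a hypothetical helper). A run of k equal values is itself an equally spaced subsequence with difference 0, so its length can simply be compared with the result for nonzero differences:

```python
from itertools import groupby


def max_equal_run(a):
    """Length of the longest run of equal values in the sorted list `a`."""
    # groupby splits a sorted list into runs of equal values in a single pass
    return max((sum(1 for _ in run) for _, run in groupby(a)), default=0)


print(max_equal_run([1, 4, 4, 4, 7, 8, 8]))  # 3
```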
Pseudocode
I'm only looking for the maximum length, but that doesn't change anything.
a = [1, 4, 5, 7, 8, 12]  # sorted input (the example from the question)
n = len(a)

whereInA = {}
for i in range(n):
    whereInA[a[i]] = i  # it doesn't matter which of equal elements it points to

usedPairs = [[False] * n for _ in range(n)]  # O(N^2) memory
maxLen = min(n, 1)
for i in range(n):
    for j in range(i + 1, n):
        if usedPairs[i][j]:
            continue  # do nothing: this pair was in one of the previous sequences
        usedPairs[i][j] = True
        # here is the quite stupid part: follow the sequence as far as it goes
        diff = a[j] - a[i]
        if diff == 0:
            continue  # we can't work with that
        lastIndex = j
        currentLen = 2
        while a[lastIndex] + diff in whereInA:
            nextIndex = whereInA[a[lastIndex] + diff]
            usedPairs[lastIndex][nextIndex] = True
            currentLen += 1
            lastIndex = nextIndex
            # you may store all indices here
        maxLen = max(maxLen, currentLen)

print(maxLen)
Thoughts about memory usage
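O(n^2) time is very slow for 1000000 elements. But if you run this code on that many elements, the biggest problem will be memory usage. What can be done to reduce it?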
- Change the boolean arrays to bitfields to store more booleans per byte.
- Make each subsequent boolean array shorter, because we only use usedPairs[i][j] when i < j.
- Store only the ids of used pairs. (Conflicts with the first idea.)
- Remove usedPairs entries that will never be used again (those for i, j that the loop has already passed).
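A rough sketch of the first two ideas combined, assuming pairs are always passed as (i, j) with i < j: the triangle of pairs is flattened row by row into one `bytearray` with one bit per pair. `TriangularBitSet`, `mark` and `is_marked` are illustrative names. This needs about n(n-1)/16 bytes, roughly 25 MB for n = 20000, but still about 62.5 GB for a million elements, so it only moves the limit:

```python
class TriangularBitSet:
    """One bit per pair (i, j) with 0 <= i < j < n, packed into a bytearray."""

    def __init__(self, n):
        self.n = n
        pairs = n * (n - 1) // 2            # only pairs with i < j are stored
        self.bits = bytearray((pairs + 7) // 8)

    def _index(self, i, j):
        # Row i starts after rows 0..i-1, which hold (n-1) + ... + (n-i) pairs.
        return i * (2 * self.n - i - 1) // 2 + (j - i - 1)

    def mark(self, i, j):
        k = self._index(i, j)
        self.bits[k >> 3] |= 1 << (k & 7)

    def is_marked(self, i, j):
        k = self._index(i, j)
        return bool(self.bits[k >> 3] & (1 << (k & 7)))


used = TriangularBitSet(5)
used.mark(1, 3)
print(used.is_marked(1, 3), used.is_marked(1, 4))  # True False
```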
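- The main loop traverses the list.
- If the number is found in the precalc list, it belongs to all sequences in that list: recalculate all of them with count + 1.
- Remove all precalculated entries for the current element.
- Calculate new sequences where the first element ranges from 0 to the current one and the second element is the current element of the traversal. (Actually not from 0 to the current one: we can use the fact that the next element must not exceed max(a) and that the new list must have a chance to become longer than the one already found.)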
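For the list [1, 2, 4, 5, 7] this goes as follows:
- index 0, element 1:
  - is 1 in precalc? No, do nothing
  - no earlier elements, nothing else to do
- index 1, element 2:
  - is 2 in precalc? No, do nothing
  - is 3 = 1 + (2 - 1) * 2 in our set? No, do nothing
- index 2, element 4:
  - is 4 in precalc? No, do nothing
  - is 6 = 2 + (4 - 2) * 2 in our set? No
  - is 7 = 1 + (4 - 1) * 2 in our set? Yes: add a new entry {7: {3: {'count': 2, 'start': 1}}}, where 7 is the list element and 3 is the step
- index 3, element 5:
  - is 5 in precalc? No, do nothing
  - don't check 4, because 6 = 4 + (5 - 4) * 2 is less than the already computed element 7
  - is 8 = 2 + (5 - 2) * 2 in our set? No
  - check 9 = 1 + (5 - 1) * 2: more than max(a) == 7
- index 4, element 7:
  - is 7 in precalc? Yes: put it into the result
  - don't check 5, because 9 = 5 + (7 - 5) * 2 is more than max(a) == 7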
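For [1, 2, 4, 5, 7] the algorithm ends with step 3, count 3 and start 1, which is turned into the sequence [1, 4, 7]. The code (a bit messy; try it yourself and see):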
def add_precalc(precalc, start, step, count, res, N):
    if step == 0: return True
    if start + step * res[1]["count"] > N: return False

    x = start + step * count
    if x > N or x < 0: return False

    if precalc[x] is None: return True

    if step not in precalc[x]:
        precalc[x][step] = {"start": start, "count": count}

    return True

def work(a):
    precalc = [None] * (max(a) + 1)
    for x in a: precalc[x] = {}
    N, m = max(a), 0
    ind = {x: i for i, x in enumerate(a)}

    res = (0, {"start": 0, "count": 0})
    for i, x in enumerate(a):
        for el in precalc[x].items():
            el[1]["count"] += 1
            if el[1]["count"] > res[1]["count"]: res = el
            add_precalc(precalc, el[1]["start"], el[0], el[1]["count"], res, N)
            t = el[1]["start"] + el[0] * el[1]["count"]
            if t in ind and ind[t] > m:
                m = ind[t]
        precalc[x] = None

        for y in a[i - m - 1::-1]:
            if not add_precalc(precalc, y, x - y, 2, res, N): break

    return [x * res[0] + res[1]["start"] for x in range(res[1]["count"])]

print(work([1, 2, 4, 5, 7]))  # [1, 4, 7]
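A stripped-down sketch of the same precalc idea, assuming a sorted list of distinct integers (`longest_ap_pending` is an illustrative name, not the answer's code). Each value that some progression is still waiting for maps to a dict {step: (start, count)}, like the {7: {3: {'count': 2, 'start': 1}}} entry in the walkthrough. It omits the pruning by max(a) and by the best length so far, so it always checks O(n²) pairs and may hold O(n²) pending entries:

```python
def longest_ap_pending(a):
    """a: sorted list of distinct ints. Returns (count, start, step)."""
    if len(a) < 2:
        return (len(a), a[0] if a else None, 0)
    present = set(a)
    best = (2, a[0], a[1] - a[0])  # any two elements form a progression
    # pending[v][step] = (start, count): a progression with `count` terms
    # that grows by one if value v is reached.
    pending = {}
    for i, x in enumerate(a):
        # 1. x continues every progression that was waiting for it.
        for step, (start, count) in pending.pop(x, {}).items():
            count += 1
            if count > best[0]:
                best = (count, start, step)
            if x + step in present:
                pending.setdefault(x + step, {}).setdefault(step, (start, count))
        # 2. Start two-term progressions (y, x) for every earlier y.
        #    setdefault keeps an existing, longer entry for the same step.
        for y in a[:i]:
            step = x - y
            if x + step in present:
                pending.setdefault(x + step, {}).setdefault(step, (y, 2))
    return best


print(longest_ap_pending([1, 2, 4, 5, 7]))  # (3, 1, 3)
```

On [1, 2, 4, 5, 7] it reaches the same result as the walkthrough: count 3, start 1, step 3.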
A = [1, 4, 5, 7, 8, 12]    # in sorted order
Aset = set(A)

for d in range(1, 12):
    already_seen = set()
    for a in A:
        if a not in already_seen:
            b = a
            count = 1
            while b + d in Aset:
                b += d
                count += 1
                already_seen.add(b)
            print("found %d items in %d .. %d" % (count, a, b))
            # collect here the largest 'count'
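The loop above only prints each chain. A minimal sketch of the "collect the largest count" step, assuming distinct sorted input and trying every step from 1 to max(A) - min(A) instead of the hard-coded range(1, 12) (`longest_by_step` is an illustrative name):

```python
def longest_by_step(A):
    """A: sorted list of distinct ints. Returns (count, start, step)."""
    if not A:
        return (0, None, 0)
    Aset = set(A)
    best = (1, A[0], 0)
    for d in range(1, A[-1] - A[0] + 1):   # assumption: try every possible step
        already_seen = set()
        for a in A:
            if a in already_seen:
                continue                   # a lies inside a chain already walked for d
            b, count = a, 1
            while b + d in Aset:
                b += d
                count += 1
                already_seen.add(b)
            if count > best[0]:
                best = (count, a, d)
    return best


print(longest_by_step([1, 4, 5, 7, 8, 12]))  # (3, 1, 3)
```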
import random
import timeit
import sys

#s = [1,4,5,7,8,12]
#s = [2, 6, 7, 10, 13, 14, 17, 18, 21, 22, 23, 25, 28, 32, 39, 40, 41, 44, 45, 46, 49, 50, 51, 52, 53, 63, 66, 67, 68, 69, 71, 72, 74, 75, 76, 79, 80, 82, 86, 95, 97, 101, 110, 111, 112, 114, 115, 120, 124, 125, 129, 131, 132, 136, 137, 138, 139, 140, 144, 145, 147, 151, 153, 157, 159, 161, 163, 165, 169, 172, 173, 175, 178, 179, 182, 185, 186, 188, 195]
#s = [0, 6, 7, 10, 11, 12, 16, 18, 19]
m = [random.randint(1, 40000) for r in range(20000)]
s = list(set(m))
s.sort()

lenS = len(s)

# trim end points that cannot belong to any sequence of length >= 3
halfRange = (s[lenS-1] - s[0]) // 2
while s[lenS-1] - s[lenS-2] > halfRange:
    s.pop()
    lenS -= 1
    halfRange = (s[lenS-1] - s[0]) // 2

while s[1] - s[0] > halfRange:
    s.pop(0)
    lenS -= 1
    halfRange = (s[lenS-1] - s[0]) // 2

n = lenS

# a sequence of length >= 3 cannot have a step larger than half the range
largest = (s[n-1] - s[0]) // 2
#largest = 1000 #set the maximum size of d searched

maxS = s[n-1]
maxD = 0
maxSeq = 0
hCount = [None]*(largest + 1)   # hCount[d]: lengths of the open sequences with step d
hLast = [None]*(largest + 1)    # hLast[d]: index of the last element of each of them
best = {}

start = timeit.default_timer()

for i in range(1, n):

    sys.stdout.write(repr(i)+"\r")   # progress indicator

    for j in range(i-1, -1, -1):
        d = s[i] - s[j]
        numLeft = n - i
        if d != 0:
            maxPossible = (maxS - s[i]) // d + 2
        else:
            maxPossible = numLeft + 2
        # can a sequence through s[j], s[i] still beat the best one found?
        ok = numLeft + 2 > maxSeq and maxPossible > maxSeq

        if d > largest or (d > maxD and not ok):
            break

        if hLast[d] is not None:
            found = False
            for k in range(len(hLast[d])-1, -1, -1):
                tmpLast = hLast[d][k]
                if tmpLast == j:
                    # s[i] extends the sequence that currently ends at s[j]
                    found = True
                    hLast[d][k] = i
                    hCount[d][k] += 1
                    tmpCount = hCount[d][k]
                    if tmpCount > maxSeq:
                        maxSeq = tmpCount
                        best = {'len': tmpCount, 'd': d, 'last': i}
                elif s[tmpLast] < s[j]:
                    # this sequence can no longer be extended: drop it
                    del hLast[d][k]
                    del hCount[d][k]
            if not found and ok:
                hLast[d].append(i)
                hCount[d].append(2)
        elif ok:
            if d > maxD:
                maxD = d
            hLast[d] = [i]
            hCount[d] = [2]

end = timeit.default_timer()
seconds = (end - start)

#print(hCount)
#print(hLast)
print(best)
print(seconds)
input = [1, 4, 5, 7, 8, 12]
   1   4   5   7   8  12
   x   3   4   6   7  11   # distance from point i to point 0
   x   x   1   3   4   8   # distance from point i to point 1
   x   x   x   2   3   7   # distance from point i to point 2
   x   x   x   x   1   5   # distance from point i to point 3
   x   x   x   x   x   4   # distance from point i to point 4
def build_columns(l):
    columns = {}
    for x in l[1:]:
        col = []
        for y in l[:l.index(x)]:
            col.append(x - y)
        columns[x] = col
    return columns

def algo(input, columns):
    seqs = []
    for index1, number in enumerate(input[1:]):
        index1 += 1  # first item was sliced
        for index2, distance in enumerate(columns[number]):
            seq = []
            seq.append(input[index2])  # k-th pred
            seq.append(number)
            matches = 1
            for successor in input[index1 + 1:]:
                column = columns[successor]
                if column[index1] == distance * matches:
                    matches += 1
                    seq.append(successor)
            if len(seq) > 2:
                seqs.append(seq)
    return seqs

columns = build_columns(input)
sequences = algo(input, columns)
print(max(sequences, key=len))
def findLESS(A):
    Aset = set(A)
    lmax = 2
    d = 1
    minStep = 0

    while (lmax - 1) * minStep <= A[-1] - A[0]:
        minStep = A[-1] - A[0] + 1
        for j, b in enumerate(A):
            if j+d < len(A):
                a = A[j+d]
                step = a - b
                minStep = min(minStep, step)
                if a + step in Aset and b - step not in Aset:
                    c = a + step
                    count = 3
                    while c + step in Aset:
                        c += step
                        count += 1
                    if count > lmax:
                        lmax = count
        d += 1

    return lmax

print(findLESS([1, 4, 5, 7, 8, 12]))
def findLESS(src):
    r = [False for i in range(src[-1]+1)]
    for x in src:
        r[x] = True

    d = 1
    best = 1
    while best * d < len(r):
        for s in range(d):
            l = 0
            for i in range(s, len(r), d):
                if r[i]:
                    l += 1
                    best = max(best, l)
                else:
                    l = 0
        d += 1

    return best

print(findLESS([1, 4, 5, 7, 8, 12]))
def findLESS(src):
    r = 0
    for x in src:
        r |= 1 << x

    d = 1
    best = 1
    while best * d < src[-1] + 1:
        c = best
        rr = r
        while c & (c-1):
            cc = c & -c
            rr &= rr >> (cc * d)
            c &= c-1
        while c != 1:
            c = c >> 1
            rr &= rr >> (c * d)
        rr &= rr >> d
        while rr:
            rr &= rr >> d
            best += 1
        d += 1

    return best
random.seed(42)
s = sorted(list(set([random.randint(0, 200000) for r in range(140000)])))     # size roughly 100000
s = sorted(list(set([random.randint(0, 2000000) for r in range(1400000)])))   # size roughly 1000000
Size:                           100000   1000000
Second answer by Armin Rigo:       634         ?
By Armin Rigo, optimized:           64     >5000
O(M^2) algorithm:                   53      2940
O(M^2*L) algorithm:                  7       711
a = [1, 4, 5, 7, 8, 12]  # sorted input (the example from the question)
n = len(a)

lmax = 2
l = [[2 for i in range(n)] for j in range(n)]
for mid in range(n - 1):
    prev = mid - 1
    succ = mid + 1
    while prev >= 0 and succ < n:
        if a[prev] + a[succ] < a[mid] * 2:
            succ += 1
        elif a[prev] + a[succ] > a[mid] * 2:
            prev -= 1
        else:
            l[mid][succ] = l[prev][mid] + 1
            lmax = max(lmax, l[mid][succ])
            prev -= 1
            succ += 1

print(lmax)
A = [1, 4, 5, 7, 8, 12]    # in sorted order
Aset = set(A)

lmax = 2
for j, b in enumerate(A):
    for i in range(j):
        a = A[i]
        step = b - a
        if b + step in Aset and a - step not in Aset:
            c = b + step
            count = 3
            while c + step in Aset:
                c += step
                count += 1
            #print("found %d items in %d .. %d" % (count, a, c))
            if count > lmax:
                lmax = count

print(lmax)