Python: longest equally-spaced subsequence


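I have a million integers in sorted order and I would like to find the longest subsequence where the difference between consecutive pairs is equal. For example,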

1, 4, 5, 7, 8, 12

has the subsequence

   4,       8, 12

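My naive method is greedy: from each point it just checks how far a subsequence can be extended. This seems to take O(n²) time per point.

Is there a faster way to solve this problem?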
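
For reference, the naive method described above might look roughly like the sketch below. It is only an illustration, not the asker's actual code: the function name `longest_equal_spaced_naive` is hypothetical, and the sketch assumes a sorted list with no duplicates. For every choice of first and second element it extends the run with set lookups for as long as the next value exists.

```python
def longest_equal_spaced_naive(a):
    """Return one longest equally spaced subsequence of a sorted list of distinct ints."""
    values = set(a)
    best = a[:2]  # zero, one or two elements trivially qualify
    for i, start in enumerate(a):
        for second in a[i + 1:]:
            step = second - start
            seq = [start, second]
            # Keep extending while the next value of the progression is present.
            while seq[-1] + step in values:
                seq.append(seq[-1] + step)
            if len(seq) > len(best):
                best = seq
    return best


print(longest_equal_spaced_naive([1, 4, 5, 7, 8, 12]))  # [1, 4, 7], one of the length-3 answers
```

On the example it returns 1, 4, 7 rather than 4, 8, 12 because it keeps the first run of maximal length it finds; both have length 3.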

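Update. I will test the code given in the answers as soon as possible (thank you). However, it is already clear that using n^2 memory will not work. So far, no code has terminated with the following input:

[random.randint(0,100000) for r in xrange(200000)]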

Timings. I tested with the following input data on my 32-bit system.

a= [random.randint(0,10000) for r in xrange(20000)] 
a.sort()

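ZelluX's dynamic programming method uses 1.6 GB of RAM and takes 2 minutes 14 seconds. With pypy it takes only 9 seconds! However, it crashes with a memory error on large inputs.
Armin's O(nd) time method takes 9 seconds with pypy but uses only 20 MB of RAM. Of course, this would be much worse if the range were much larger. The low memory usage meant I could also test it with a = [random.randint(0,100000) for r in xrange(200000)], but it did not finish in the few minutes I gave it with pypy.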

To be able to test Kluev's method, I reran with

a= [random.randint(0,40000) for r in xrange(28000)] 
a = list(set(a))
a.sort()

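to make a list of length roughly …. All timings are with pypy:

ZelluX, 9 seconds
Kluev, 20 seconds
Armin, 52 seconds

It seems that if the ZelluX method could be made linear space, it would be the clear winner.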

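Your solution is currently O(N^3) (you said O(N^2) per index). Here is a solution that takes O(N^2) time and O(N^2) memory.

Idea

If we know a subsequence that goes through the indices i[0], i[1], i[2], i[3], we should not try the subsequences that start with i[1] and i[2], or with i[2] and i[3].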

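Note that I edited the code to make it a bit simpler by relying on the input being sorted, but it does not work for equal elements. You can easily check the maximum number of equal elements in O(N).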
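
The O(N) check for equal elements could be a single pass over the sorted list, as in the sketch below. This is one reading of the remark above, not code from the answer, and `longest_equal_run` is a hypothetical helper name. A run of k equal values is an equally spaced subsequence with step 0, so its length can be compared with the result for distinct values.

```python
from itertools import groupby


def longest_equal_run(a):
    """Length of the longest run of equal values in a sorted list (0 if the list is empty)."""
    # groupby groups adjacent equal values, which is all that is needed for sorted input.
    return max((sum(1 for _ in group) for _, group in groupby(a)), default=0)


print(longest_equal_run([1, 4, 4, 4, 5, 7, 7]))  # 3
```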

Pseudocode

I am only looking for the maximum length, but that does not change anything.

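# Runnable Python version of the pseudocode; assumes `a` is sorted with no duplicates.
a = [1, 4, 5, 7, 8, 12]
n = len(a)

# Map each value to its index (it does not matter which of several equal elements it points to).
whereInA = {}
for i in range(n):
    whereInA[a[i]] = i

# usedPairs[i][j] is True once the pair (i, j) is known to lie inside an earlier sequence.
usedPairs = [[False] * n for _ in range(n)]
maxLen = min(n, 2)

for i in range(n):
    for j in range(i + 1, n):
        if usedPairs[i][j]:
            continue  # do nothing: the pair was part of a previous sequence
        usedPairs[i][j] = True

        # A quite simple-minded extension step:
        diff = a[j] - a[i]
        if diff == 0:
            continue  # equal elements are not handled here
        lastIndex = j
        currentLen = 2
        while a[lastIndex] + diff in whereInA:
            nextIndex = whereInA[a[lastIndex] + diff]
            usedPairs[lastIndex][nextIndex] = True
            currentLen += 1
            lastIndex = nextIndex

        # All indices of the sequence could be stored here.
        maxLen = max(maxLen, currentLen)

print(maxLen)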

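Thoughts about memory usage

O(n^2) time is very slow for 1000000 elements. But if you are going to run this code on that many elements, the biggest problem will be memory usage.
What can be done to reduce it?

Change the boolean arrays to bit fields, to store more booleans per byte.
Make each successive boolean array shorter, because we only use usedPairs[i][j] when i < j.

Some heuristics:

Store only the pairs of used indices. (Conflicts with the first idea.)
Remove usedPairs entries that will never be used again (those for i, j that have already been chosen in the loop).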
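
The first two ideas above can be combined. The following is a minimal sketch, assuming only pairs with i < j are ever stored or queried; the class name `TriangularBitSet` and its methods are hypothetical and not part of the answer's pseudocode. It packs one bit per pair into a `bytearray`, so it needs roughly n(n-1)/16 bytes.

```python
class TriangularBitSet:
    """One bit per pair (i, j) with 0 <= i < j < n, packed into a bytearray."""

    def __init__(self, n):
        self.n = n
        total_bits = n * (n - 1) // 2
        self.bits = bytearray((total_bits + 7) // 8)

    def _position(self, i, j):
        if not 0 <= i < j < self.n:
            raise IndexError("expected 0 <= i < j < n")
        # Rows 0 .. i-1 hold (n-1) + (n-2) + ... + (n-i) bits in total.
        return i * (2 * self.n - i - 1) // 2 + (j - i - 1)

    def set(self, i, j):
        pos = self._position(i, j)
        self.bits[pos >> 3] |= 1 << (pos & 7)

    def get(self, i, j):
        pos = self._position(i, j)
        return bool(self.bits[pos >> 3] & (1 << (pos & 7)))


used = TriangularBitSet(6)
used.set(1, 4)
print(used.get(1, 4), used.get(1, 5))  # True False
print(len(used.bits))                  # 2 bytes cover the 15 pairs
```

Even in this form, a million elements would need about 62 GB, so the packing reduces the constant factor but does not by itself make the quadratic table practical at that size.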
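
Algorithm

Main loop traversing the list:
If the number is found in the precalc list, then it belongs to all the sequences stored there; recalculate all of those sequences with count + 1.
Remove everything precalculated for the current element.
Calculate new sequences whose first element ranges from 0 to the current index and whose second element is the current element of the traversal. (Actually, not all the way from 0 to current: we can use the fact that the new element should not exceed max(a), and that the new sequence must have a chance of becoming longer than the one already found.)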

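So for the list [1, 2, 4, 5, 7] the output would be (it is a little messy; try the code yourself and see):

index 0, element 1:
    1 in precalc? No, do nothing.
    Nothing else to do.

index 1, element 2:
    2 in precalc? No, do nothing.
    Check whether 3 = 1 + (2 - 1) * 2 is in our set. No, do nothing.

index 2, element 4:
    4 in precalc? No, do nothing.
    Check whether 6 = 2 + (4 - 2) * 2 is in our set. No.
    Check whether 7 = 1 + (4 - 1) * 2 is in our set. Yes: add the new entry {7: {3: {'count': 2, 'start': 1}}}, where 7 is the element of the list and 3 is the step.

index 3, element 5:
    5 in precalc? No, do nothing.
    Do not check 4, because 6 = 4 + (5 - 4) * 2 is less than the precalculated element 7.
    Check whether 8 = 2 + (5 - 2) * 2 is in our set. No.
    Check 9 = 1 + (5 - 1) * 2: more than max(a) == 7.

index 4, element 7:
    7 in precalc? Yes: put it into the result.
    Do not check 5, because 9 = 5 + (7 - 5) * 2 is more than max(a) == 7.

result = (3, {'count': 3, 'start': 1})  # step 3, count 3, start 1; turn it into a sequence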
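
The result is a pair of the step and a record holding the start value and the count, so turning it into the actual sequence takes one line. A small sketch using the result of the walk-through above (the helper name `expand` is hypothetical):

```python
def expand(result):
    """Turn (step, {'count': c, 'start': s}) into the list s, s + step, s + 2 * step, ..."""
    step, info = result
    return [info["start"] + step * k for k in range(info["count"])]


print(expand((3, {"count": 3, "start": 1})))  # [1, 4, 7]
```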
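
Complexity

It should not be more than O(N^2), and I think early termination of the search for new sequences helps here. I will try to provide a detailed analysis later.

Code
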
def add_precalc(precalc, start, step, count, res, N):
    if step == 0: return True
    if start + step * res[1]["count"] > N: return False

    x = start + step * count
    if x > N or x < 0: return False

    if precalc[x] is None: return True

    if step not in precalc[x]:
        precalc[x][step] = {"start":start, "count":count}

    return True

def work(a):
    precalc = [None] * (max(a) + 1)
    for x in a: precalc[x] = {}
    N, m = max(a), 0
    ind = {x:i for i, x in enumerate(a)}

    res = (0, {"start":0, "count":0})
    for i, x in enumerate(a):
        for el in list(precalc[x].items()):  # items() works on both Python 2 and 3
            el[1]["count"] += 1
            if el[1]["count"] > res[1]["count"]: res = el
            add_precalc(precalc, el[1]["start"], el[0], el[1]["count"], res, N)
            t = el[1]["start"] + el[0] * el[1]["count"]
            if t in ind and ind[t] > m:
                m = ind[t]
        precalc[x] = None

        for y in a[i - m - 1::-1]:
            if not add_precalc(precalc, y, x - y, 2, res, N): break

    return [x * res[0] + res[1]["start"] for x in range(res[1]["count"])]

A = [1, 4, 5, 7, 8, 12]    # in sorted order
Aset = set(A)

for d in range(1, 12):
    already_seen = set()
    for a in A:
        if a not in already_seen:
            b = a
            count = 1
            while b + d in Aset:
                b += d
                count += 1
                already_seen.add(b)
            print("found %d items in %d .. %d" % (count, a, b))
            # collect here the largest 'count'

import random
import timeit
import sys

#s = [1,4,5,7,8,12]
#s = [2, 6, 7, 10, 13, 14, 17, 18, 21, 22, 23, 25, 28, 32, 39, 40, 41, 44, 45, 46, 49, 50, 51, 52, 53, 63, 66, 67, 68, 69, 71, 72, 74, 75, 76, 79, 80, 82, 86, 95, 97, 101, 110, 111, 112, 114, 115, 120, 124, 125, 129, 131, 132, 136, 137, 138, 139, 140, 144, 145, 147, 151, 153, 157, 159, 161, 163, 165, 169, 172, 173, 175, 178, 179, 182, 185, 186, 188, 195]
#s = [0, 6, 7, 10, 11, 12, 16, 18, 19]

m = [random.randint(1,40000) for r in range(20000)]
s = list(set(m))
s.sort()

lenS = len(s)
halfRange = (s[lenS-1] - s[0]) // 2

while s[lenS-1] - s[lenS-2] > halfRange:
    s.pop()
    lenS -= 1
    halfRange = (s[lenS-1] - s[0]) // 2

while s[1] - s[0] > halfRange:
    s.pop(0)
    lenS -=1
    halfRange = (s[lenS-1] - s[0]) // 2

n = lenS

largest = (s[n-1] - s[0]) // 2
#largest = 1000 #set the maximum size of d searched

maxS = s[n-1]
maxD = 0
maxSeq = 0
hCount = [None]*(largest + 1)
hLast = [None]*(largest + 1)
best = {}

start = timeit.default_timer()

for i in range(1,n):

    sys.stdout.write(repr(i)+"\r")

    for j in range(i-1,-1,-1):
        d = s[i] - s[j]
        numLeft = n - i
        if d != 0:
            maxPossible = (maxS - s[i]) // d + 2
        else:
            maxPossible = numLeft + 2
        ok = numLeft + 2 > maxSeq and maxPossible > maxSeq

        if d > largest or (d > maxD and not ok):
            break

        if hLast[d] != None:
            found = False
            for k in range (len(hLast[d])-1,-1,-1):
                tmpLast = hLast[d][k]
                if tmpLast == j:
                    found = True
                    hLast[d][k] = i
                    hCount[d][k] += 1
                    tmpCount = hCount[d][k]
                    if tmpCount > maxSeq:
                        maxSeq = tmpCount
                        best = {'len': tmpCount, 'd': d, 'last': i}
                elif s[tmpLast] < s[j]:
                    del hLast[d][k]
                    del hCount[d][k]
            if not found and ok:
                hLast[d].append(i)
                hCount[d].append(2)
        elif ok:
            if d > maxD: 
                maxD = d
            hLast[d] = [i]
            hCount[d] = [2]


end = timeit.default_timer()
seconds = (end - start)

#print (hCount)
#print (hLast)
print(best)
print(seconds)

input = [1, 4, 5, 7, 8, 12]

[1, 4, 5, 7, 8, 12]
 x  3  4  6  7  11   # distance from point i to point 0
 x  x  1  3  4   8   # distance from point i to point 1
 x  x  x  2  3   7   # distance from point i to point 2
 x  x  x  x  1   5   # distance from point i to point 3
 x  x  x  x  x   4   # distance from point i to point 4

def build_columns(l):
    columns = {}
    for x in l[1:]:
        col = []
        for y in l[:l.index(x)]:
            col.append(x - y)
        columns[x] = col
    return columns

def algo(input, columns):
    seqs = []
    for index1, number in enumerate(input[1:]):
        index1 += 1 #first item was sliced
        for index2, distance in enumerate(columns[number]):
            seq = []
            seq.append(input[index2]) # k-th pred
            seq.append(number)
            matches = 1
            for successor in input[index1 + 1 :]:
                column = columns[successor]
                if column[index1] == distance * matches:
                    matches += 1
                    seq.append(successor)
            if (len(seq) > 2):
                seqs.append(seq)
    return seqs

columns = build_columns(input)
sequences = algo(input, columns)
print(max(sequences, key=len))

def findLESS(A):
  Aset = set(A)
  lmax = 2
  d = 1
  minStep = 0

  while (lmax - 1) * minStep <= A[-1] - A[0]:
    minStep = A[-1] - A[0] + 1
    for j, b in enumerate(A):
      if j+d < len(A):
        a = A[j+d]
        step = a - b
        minStep = min(minStep, step)
        if a + step in Aset and b - step not in Aset:
          c = a + step
          count = 3
          while c + step in Aset:
            c += step
            count += 1
          if count > lmax:
            lmax = count
    d += 1

  return lmax

print(findLESS([1, 4, 5, 7, 8, 12]))

def findLESS(src):
  r = [False for i in range(src[-1]+1)]
  for x in src:
    r[x] = True

  d = 1
  best = 1

  while best * d < len(r):
    for s in range(d):
      l = 0

      for i in range(s, len(r), d):
        if r[i]:
          l += 1
          best = max(best, l)
        else:
          l = 0

    d += 1

  return best


print(findLESS([1, 4, 5, 7, 8, 12]))

def findLESS(src):
  r = 0
  for x in src:
    r |= 1 << x

  d = 1
  best = 1

  while best * d < src[-1] + 1:
    c = best
    rr = r

    while c & (c-1):
      cc = c & -c
      rr &= rr >> (cc * d)
      c &= c-1

    while c != 1:
      c = c >> 1
      rr &= rr >> (c * d)

    rr &= rr >> d

    while rr:
      rr &= rr >> d
      best += 1

    d += 1

  return best

import random

random.seed(42)
# Roughly 100000 distinct values
s = sorted(set(random.randint(0, 200000) for r in range(140000)))

# Roughly 1000000 distinct values
s = sorted(set(random.randint(0, 2000000) for r in range(1400000)))

Size:                         100000   1000000
Second answer by Armin Rigo:     634         ?
By Armin Rigo, optimized:         64     >5000
O(M^2) algorithm:                 53      2940
O(M^2*L) algorithm:                7       711

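a = [1, 4, 5, 7, 8, 12]    # sorted, no duplicates
n = len(a)

lmax = 2
# l[i][j]: length of the longest equally spaced subsequence ending with a[i], a[j]
l = [[2 for i in range(n)] for j in range(n)]
for mid in range(n - 1):
    prev = mid - 1
    succ = mid + 1
    # Two-pointer scan for pairs with a[prev] + a[succ] == 2 * a[mid]
    while (prev >= 0 and succ < n):
        if a[prev] + a[succ] < a[mid] * 2:
            succ += 1
        elif a[prev] + a[succ] > a[mid] * 2:
            prev -= 1
        else:
            # a[prev], a[mid], a[succ] are equally spaced: extend the run ending at (prev, mid)
            l[mid][succ] = l[prev][mid] + 1
            lmax = max(lmax, l[mid][succ])
            prev -= 1
            succ += 1

print(lmax)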
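
The question reports that the full n-by-n table runs out of memory on larger inputs. One possible way to reduce that, offered here as an assumption rather than as part of the original answer, is to store only the entries that differ from the default value 2, using one dictionary per middle index. The function name `longest_ap_sparse` is hypothetical, and the sketch assumes a sorted list of distinct integers. Memory then grows with the number of pairs that actually extend a run instead of with n², although dense inputs can still produce on the order of n² such pairs.

```python
def longest_ap_sparse(a):
    """Length of the longest equally spaced subsequence of a sorted list of distinct ints."""
    n = len(a)
    best = min(n, 2)
    # ext[mid][succ] = length of the longest run ending with a[mid], a[succ];
    # only stored when it is greater than the default length 2.
    ext = [dict() for _ in range(n)]
    for mid in range(1, n - 1):
        prev, succ = mid - 1, mid + 1
        while prev >= 0 and succ < n:
            total = a[prev] + a[succ]
            if total < 2 * a[mid]:
                succ += 1
            elif total > 2 * a[mid]:
                prev -= 1
            else:
                length = ext[prev].get(mid, 2) + 1
                ext[mid][succ] = length
                best = max(best, length)
                prev -= 1
                succ += 1
    return best


print(longest_ap_sparse([1, 4, 5, 7, 8, 12]))  # 3
```

The scan order and the recurrence are the same as in the table version; only the storage differs.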

A = [1, 4, 5, 7, 8, 12]    # in sorted order
Aset = set(A)

lmax = 2
for j, b in enumerate(A):
    for i in range(j):
        a = A[i]
        step = b - a
        if b + step in Aset and a - step not in Aset:
            c = b + step
            count = 3
            while c + step in Aset:
                c += step
                count += 1
            #print "found %d items in %d .. %d" % (count, a, c)
            if count > lmax:
                lmax = count

print(lmax)