Explain the "loop rolling" algorithm


string algorithm language-agnostic


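I'm trying to implement the algorithm that Evgeny Kluev gave in his answer to a question, but I'm having trouble getting it to work. Here is an example that I tried to work through by hand, following his instructions: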

text: ababacababd


<STEP 1>                                    
  suffixes and LCPs:                               

  ababacababd                                            
  4 
  ababd 
  3 
  abacababd 
  2  
  abd
  1
  acababd
  0
  babacababd
  3
  babd
  2
  bacababd
  1
  bd
  0
  cababd
  0
  d

<STEP 2>
  sorted LCP array indices: 0,1,5,2,6,3,7,4,8,9
                 (values) : 4,3,3,2,2,1,1,0,0,0

<STEP 3>
LCP groups sorted by position in text (format => position: entry):
  lcp 4:
    0: ababacababd
    6: ababd

  lcp 3:
    1: babacababd
    2: abacababd
    6: ababd
    7: babd

  lcp 2:
    2: abacababd
    3: bacababd
    7: babd
    8: abd

  lcp 1:
    3: bacababd
    4: acababd
    8: abd
    9: bd

  lcp 0:
    0: ababacababd
    1: babacababd
    4: acababd
    5: cababd
    9: bd
   10: d

<STEP 4>
entries remaining after filter (LCP == positional difference):
   none! only (abd),(bd) and (bacababd),(acababd) from the LCP 1 group
   have a positional difference equal to 1, but they don't share prefixes
   with each other. Shouldn't I have at least (ab) and (ba) here?
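
For reference, Steps 1 and 2 of this hand calculation can be reproduced with a short Python sketch. The helper names `suffix_array` and `lcp_array` are made up for this illustration, and the construction is deliberately naive (it sorts whole suffixes), so it is only meant for short inputs like this one:

```python
def suffix_array(text):
    """Suffix start positions in lexicographic order (naive, fine for short inputs)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] is the common-prefix length of the suffixes at sa[k] and sa[k + 1]."""
    def common_prefix_len(i, j):
        n = 0
        while i + n < len(text) and j + n < len(text) and text[i + n] == text[j + n]:
            n += 1
        return n

    return [common_prefix_len(sa[k], sa[k + 1]) for k in range(len(sa) - 1)]


text = "ababacababd"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
# Step 2: indices of the LCP array ordered by descending LCP value (stable sort).
order = sorted(range(len(lcp)), key=lambda k: -lcp[k])

print(sa)     # [0, 6, 2, 8, 4, 1, 7, 3, 9, 5, 10]
print(lcp)    # [4, 3, 2, 1, 0, 3, 2, 1, 0, 0]
print(order)  # [0, 1, 5, 2, 6, 3, 7, 4, 8, 9]
```

The printed LCP values and index order agree with the listings in Steps 1 and 2, so the problem is more likely in how the groups are formed than in the suffix array itself.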


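Can someone tell me what I'm doing wrong in this process?

Also, he says that at the end of step 4 we should have all possible sequences in the text. Does he mean all possible repeating sequences?

Is this a known algorithm with a name, so that I can find more details elsewhere?

(I'm also confused by his definition of intersecting sequences in step 5, but perhaps it will make sense once I understand the earlier steps correctly.)

EDIT: Here is what I have for steps 4, 5, and 6 after Evgeny's helpful clarification: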

<STEP 4>
filter pseudocode:
results = {}
for (positions,lcp) in lcp_groups:
  results[lcp] = []
  while positions not empty:
    pos = positions.pop(0) #pop lowest element
    if (pos+lcp) in positions:
      common = prefix(input, pos, lcp)
      if common.size() < lcp:
        next
      i = 1
      while (pos+lcp*(i+1)) in positions:
        if common != prefix(input, pos+lcp*i, lcp):
          break
        positions.delete(pos+lcp*i)
        i += 1

      results[lcp].append( (common, pos, i+1) )

application of filter logic:
  lcp 4:
    0: ababacababd # 4 not in {6}
    6: ababd       # 10 not in {}

  lcp 3:
    0: ababacababd # 3 not in {1,2,6,7}
    1: babacababd  # 4 not in {2,6,7}
    2: abacababd   # 5 not in {6,7}
    6: ababd       # 9 not in {7}
    7: babd        # 10 not in {}

  lcp 2:
    0: ababacababd # 2 in {1,2,3,6,7,8}, 4 not in {1,2,3,6,7,8} => ("ab", 0, 2)
    1: babacababd  # 3 in {2,3,6,7,8}, 5 not in {2,3,6,7,8} => ("ba", 1, 2)
    2: abacababd   # 4 not in {3,6,7,8}
    3: bacababd    # 5 not in {6,7,8}
    6: ababd       # 8 in {7,8}, 10 not in {7,8} => ("ab", 6, 2)
    7: babd        # 9 not in {8}
    8: abd         # 10 not in {}

  lcp 1:
    0: ababacababd # 1 in {1,2,3,4,6,7,8,9}, prefix is ""
    1: babacababd  # 2 in {2,3,4,6,7,8,9}, prefix is ""
    2: abacababd   # 3 in {3,4,6,7,8,9}, prefix is ""
    3: bacababd    # 4 in {4,6,7,8,9}, prefix is ""
    4: acababd     # 5 not in {6,7,8,9}
    6: ababd       # 7 in {7,8,9}, prefix is ""
    7: babd        # 8 in {8,9}, prefix is ""
    8: abd         # 9 in {9}, prefix is ""
    9: bd          # 10 not in {}

sequences: [("ab", 0, 2), ("ba", 1, 2), ("ab", 6, 2)]

<STEP 5>
add sequences in order of LCP grouping. sequences within an LCP group
are added according to which was generated first:
  lcp 4: no sequences
  lcp 3: no sequences
  lcp 2: add ("ab", 0, 2)
  lcp 2: don't add ("ba", 1, 2) because it intersects with ("ab", 0, 2)
  lcp 2: add ("ab", 6, 2)
  lcp 1: no sequences

collection = [("ab", 0, 2), ("ab", 6, 2)]
(order by position not by which one was added first)

<STEP 6>
recreate input by iterating through the collection in order and 
filling in gaps with the normal input:
  input = "ab"*2 + input[4..5] + "ab"*2 + input[10..10]
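
The walkthrough above can be checked with a small Python sketch. The function names (`find_sequences`, `select_non_overlapping`, `rebuild`) are hypothetical, the merged LCP groups are copied by hand from the listing above, and Step 5 assumes that "intersects" simply means the two covered ranges overlap, which is my reading of the example rather than something the answer confirms:

```python
def find_sequences(text, merged_groups):
    """Step 4: report (unit, start, count) for runs of adjacent equal blocks."""
    found = []
    for lcp in sorted(merged_groups, reverse=True):
        positions = set(merged_groups[lcp])

        def same_block(a, b):
            # Both blocks must start at listed positions and hold the same lcp characters.
            return (a in positions and b in positions
                    and b + lcp <= len(text)
                    and text[a:a + lcp] == text[b:b + lcp])

        for start in sorted(positions):
            if same_block(start - lcp, start):
                continue  # continues an earlier chain, already reported
            pos = start
            while same_block(pos, pos + lcp):
                pos += lcp
            if pos != start:
                found.append((text[start:start + lcp], start, (pos - start) // lcp + 1))
    return found


def select_non_overlapping(sequences):
    """Step 5 (assumed reading): keep a sequence unless its range overlaps a kept one."""
    kept = []
    for unit, start, count in sequences:
        end = start + len(unit) * count
        if all(end <= s or s + len(u) * c <= start for u, s, c in kept):
            kept.append((unit, start, count))
    return sorted(kept, key=lambda seq: seq[1])


def rebuild(text, kept):
    """Step 6: expand each kept sequence and copy the gaps from the input."""
    parts, cursor = [], 0
    for unit, start, count in kept:
        parts.append(text[cursor:start])
        parts.append(unit * count)
        cursor = start + len(unit) * count
    parts.append(text[cursor:])
    return "".join(parts)


text = "ababacababd"
merged_groups = {
    4: [0, 6],
    3: [0, 1, 2, 6, 7],
    2: [0, 1, 2, 3, 6, 7, 8],
    1: [0, 1, 2, 3, 4, 6, 7, 8, 9],
}
sequences = find_sequences(text, merged_groups)
kept = select_non_overlapping(sequences)
print(sequences)  # [('ab', 0, 2), ('ba', 1, 2), ('ab', 6, 2)]
print(kept)       # [('ab', 0, 2), ('ab', 6, 2)]
print(rebuild(text, kept) == text)  # True
```

Unlike the filter pseudocode, this sketch compares every adjacent pair of blocks explicitly, including the first pair, so it does not rely on the grouping alone to guarantee that the first two blocks match.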



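Evgeny, if you take another look, I have a quick question for you: am I doing step 5 correctly? That is, am I adding sequences according to the LCP group that generated them (groups with higher LCP values first)? Or is it something else related to the LCP?

Also, please let me know if anything is wrong with step 4 or step 6, but what I have seems to work well for this example.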

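I have to clarify what "grouping by LCP value" means in the original answer. In fact, the group for a selected LCP value should also include all entries with larger LCP values.

This means that for your example, when processing LCP 3, we need to merge the preceding entries 0 and 6 into this group. When processing LCP 2, we need to merge in all preceding entries, those with LCP 3 and LCP 4: 0, 1, 2, 6, 7.

As a result, two (ab) pairs and one (ba) pair remain after filtering. But since (ba) "intersects" the first (ab) pair, it is rejected in step 5.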
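
The merging rule can be illustrated with a short Python sketch. It assigns each suffix to the larger of the LCP values it shares with its two neighbours in the suffix array (the same rule the pseudocode further down uses), then accumulates groups from the largest LCP value downward. The function names `lcp_groups` and `merged_groups` are invented for this example, and the suffix sorting is naive:

```python
def lcp_groups(text):
    """Group each suffix position by max(LCP to left neighbour, LCP to right neighbour)."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])

    def common_prefix_len(i, j):
        n = 0
        while i + n < len(text) and j + n < len(text) and text[i + n] == text[j + n]:
            n += 1
        return n

    lcp = [common_prefix_len(sa[k], sa[k + 1]) for k in range(len(sa) - 1)]
    groups = {}
    for rank, pos in enumerate(sa):
        left = lcp[rank - 1] if rank > 0 else 0
        right = lcp[rank] if rank < len(lcp) else 0
        groups.setdefault(max(left, right), []).append(pos)
    return groups


def merged_groups(groups):
    """For each LCP value, include every position whose own value is at least as large."""
    merged, positions = {}, set()
    for value in range(max(groups, default=0), 0, -1):
        positions |= set(groups.get(value, []))
        merged[value] = sorted(positions)
    return merged


merged = merged_groups(lcp_groups("ababacababd"))
for value, positions in merged.items():
    print(value, positions)
# 4 [0, 6]
# 3 [0, 1, 2, 6, 7]
# 2 [0, 1, 2, 3, 6, 7, 8]
# 1 [0, 1, 2, 3, 4, 6, 7, 8, 9]
```

The LCP 3 and LCP 2 rows match the position lists quoted above (0, 1, 2, 6, 7 for LCP 3, with 3 and 8 joining at LCP 2).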

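"Also, he says that at the end of step 4 we should have all possible sequences in the text. Does he mean all possible repeating sequences?"

That's right, I mean all possible repeating sequences.

"Is this a known algorithm with a name, so that I can find more details elsewhere?"

I don't know. I have never seen such an algorithm before.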

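Steps 2..4 can be implemented as shown in the first pseudocode listing below.

In that listing I don't assume any particular data structure for positions, but for convenience an ordered associative array is assumed. RMQ is a range minimum query, so the LCP array should be preprocessed accordingly. The minimum of the LCP values between two suffix-array ranks is the length of the common prefix of the two suffixes at those ranks.

The code is practically the same as the code in the OP. But instead of expensive string comparisons (such as common != prefix(input, pos+lcp*i, lcp)) it uses RMQ, which, if implemented properly, works almost instantly and has the same effect.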

for (in_sa, in_src) in suffix_array: # step 2
  lcp_groups[max(LCP[in_sa.left], LCP[in_sa.right])].append((in_sa, in_src))
apply(sort_by_position_in_src, lcp_groups) # step 3
for lcp from largest downto 1: # step 4
  # sorted by position in src array and unique:
  positions = merge_and_uniq(positions, lcp_groups[lcp])
  for start in positions:
    # to process each chain only once, skip a start that continues an earlier chain
    if (prev = positions[start.in_src - lcp]).exists
       and LCP.RMQ(start.in_sa, prev.in_sa) >= lcp:
      continue
    pos = start
    # follow the chain while the next block shares at least lcp characters
    while (next = positions[pos.in_src + lcp]).exists
           and LCP.RMQ(pos.in_sa, next.in_sa) >= lcp:
      pos = next
    if pos != start:
      pass_to_step5(start.in_src, lcp, pos.in_src + lcp)
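
The `LCP.RMQ` calls in the pseudocode above need a range-minimum structure over the LCP array. One common choice is a sparse table, sketched below in Python. The class name `SparseTableRMQ` and the helper `lcp_between` are assumptions made for this illustration, not part of the answer; the LCP values are the ones from the question's example:

```python
class SparseTableRMQ:
    """Range-minimum queries in O(1) after O(n log n) preprocessing."""

    def __init__(self, values):
        # table[k][i] holds min(values[i : i + 2**k]).
        self.table = [list(values)]
        n = len(values)
        span = 1
        while 2 * span <= n:
            prev = self.table[-1]
            self.table.append(
                [min(prev[i], prev[i + span]) for i in range(n - 2 * span + 1)]
            )
            span *= 2

    def query(self, lo, hi):
        """Minimum of values[lo:hi]; hi is exclusive and lo < hi is required."""
        level = (hi - lo).bit_length() - 1
        row = self.table[level]
        # Two overlapping power-of-two windows cover the whole range.
        return min(row[lo], row[hi - (1 << level)])


def lcp_between(rmq, rank_a, rank_b):
    """Common-prefix length of the suffixes at two different suffix-array ranks."""
    lo, hi = sorted((rank_a, rank_b))
    return rmq.query(lo, hi)


# LCP array for "ababacababd"; its suffix array is [0, 6, 2, 8, 4, 1, 7, 3, 9, 5, 10].
lcp = [4, 3, 2, 1, 0, 3, 2, 1, 0, 0]
rmq = SparseTableRMQ(lcp)
print(lcp_between(rmq, 0, 2))  # 3: positions 0 and 2 share "aba"
print(lcp_between(rmq, 5, 7))  # 2: positions 1 and 3 share "ba"
```

Each query takes the minimum of two precomputed windows, so the chain test in step 4 no longer depends on the length of the compared substrings.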

def loop_rolling(begin, end):
  distance = (end - begin) / 2
  for d from distance downto 1:
    start = pos = begin
    while pos + d < end:
      # extend the run in which each character equals the one d places ahead
      while (pos + d < end) and (src[pos] == src[pos + d]):
        ++pos
      repeats = floor((pos - start) / d)
      if repeats > 0:
        pass_to_step5(start, d, start + d * (repeats + 1))
      ++pos  # step past the mismatch so that the scan always advances
      start = pos
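
A minimal Python sketch of this scan is shown below. It collects the triples that the pseudocode would hand to step 5 instead of calling `pass_to_step5`, and the function name `find_repeats` is made up for the example. It also advances past each mismatch explicitly so that the loop always terminates:

```python
def find_repeats(src, begin=0, end=None):
    """Return (start, period, end) for each run of adjacent repeats, longest periods first."""
    if end is None:
        end = len(src)
    found = []
    for d in range((end - begin) // 2, 0, -1):
        start = pos = begin
        while pos + d < end:
            # Extend the run in which each character equals the one d places ahead.
            while pos + d < end and src[pos] == src[pos + d]:
                pos += 1
            repeats = (pos - start) // d
            if repeats > 0:
                found.append((start, d, start + d * (repeats + 1)))
            pos += 1  # step past the mismatch
            start = pos
    return found


print(find_repeats("ababacababd"))  # [(0, 2, 4), (6, 2, 10)]
```

For the example text this reports the two "abab" runs directly, without building a suffix array, at the cost of scanning the range once per candidate period.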

def loop_rolling(begin, end, distance):
  distance = min(distance, (end - begin) / 2)
  for d from distance downto 1:
    start = pos = begin
    while pos + d < end:
      while (pos + d < end) and (src[pos] == src[pos + d]):
        ++pos
      repeats = floor((pos - start) / d)
      if repeats > 0:
        loop_rolling(begin, start, d - 1)
        print repeats+1, "*("
        loop_rolling(start, start + d, d - 1) # "nested loops"
        print ')'
        loop_rolling(start + d * (repeats + 1), end, d)
        return
      else:
        ++pos  # step past the mismatch so that the scan always advances
        start = pos
  # no repeats found for any distance: output the range as is (nothing if empty)
  print src[begin .. end - 1]
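
The recursive variant can be sketched in Python as follows. Instead of printing, this version returns the rolled-up text as a string in the same `count*(unit)` notation that the print statements produce; the function name `roll_loops` and the choice to return a string are assumptions made for this example. When no repeat is found for any distance, the range is emitted literally:

```python
def roll_loops(src, begin=0, end=None, distance=None):
    """Rewrite src[begin:end] with adjacent repeats folded into 'count*(unit)'."""
    if end is None:
        end = len(src)
    half = (end - begin) // 2
    distance = half if distance is None else min(distance, half)
    for d in range(distance, 0, -1):
        start = pos = begin
        while pos + d < end:
            # Extend the run in which each character equals the one d places ahead.
            while pos + d < end and src[pos] == src[pos + d]:
                pos += 1
            repeats = (pos - start) // d
            if repeats > 0:
                left = roll_loops(src, begin, start, d - 1)
                unit = roll_loops(src, start, start + d, d - 1)  # nested loops
                right = roll_loops(src, start + d * (repeats + 1), end, d)
                return f"{left}{repeats + 1}*({unit}){right}"
            pos += 1  # step past the mismatch
            start = pos
    # No repeats found for any distance: emit the range literally.
    return src[begin:end]


print(roll_loops("ababacababd"))  # 2*(ab)ac2*(ab)d
```

The result for the example text corresponds to the reconstruction in Step 6 of the question: two copies of "ab", the literal "ac", two more copies of "ab", and the literal "d". Note that the output notation is ambiguous if the input itself contains digits, "*", or parentheses.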