String: longest subsequence in which all occurrences of a character are in one place

string algorithm

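Let S be a sequence of n characters; each character may occur many times in the sequence. You want to find the longest subsequence of S in which all occurrences of the same character are together in one place.

For example, if S = aaaccaaaccbccbbbab, then the longest such subsequence (the answer) is aaaaaaccccbbbb, i.e. aaa__aaacc_ccbbb_b.

In other words, any character that appears in S may appear in only one contiguous block of the subsequence. If possible, give a polynomial-time algorithm that determines the solution.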
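The constraint in the question can be made concrete with a small checker. The following is only a sketch: the class and method names (`SingleBlockCheck`, `hasSingleBlocks`) are hypothetical and do not come from the thread. It tests whether a candidate string keeps every character in a single contiguous block; it does not find the longest such subsequence.

```java
import java.util.HashSet;
import java.util.Set;

final class SingleBlockCheck {
    // Returns true if every character of t occupies exactly one contiguous block,
    // i.e. no character starts a second run after an earlier run of it has ended.
    static boolean hasSingleBlocks(String t) {
        Set<Character> seen = new HashSet<>(); // characters that have already started a run
        for (int i = 0; i < t.length(); i++) {
            char ch = t.charAt(i);
            boolean startsNewRun = (i == 0) || t.charAt(i - 1) != ch;
            if (startsNewRun && !seen.add(ch)) {
                return false; // ch already had an earlier run
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasSingleBlocks("aaaaaaccccbbbb"));     // true
        System.out.println(hasSingleBlocks("aaaccaaaccbccbbbab")); // false
    }
}
```

The empty string passes the check, which matches the idea that the empty subsequence is always a valid candidate.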

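EDIT: This solution is wrong for the OP's problem. I won't delete it, because it may suit someone else. :)

Consider a related problem: find the longest run of consecutive occurrences of a given character in S. This can be solved in linear time: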

char c = . . .; // the given character
int start = -1;
int bestStart = -1;
int bestLength = 0;
int currentLength = 0;
for (int i = 0; i < S.length(); ++i) {
    if (S.charAt(i) == c) {
        if (start == -1) {
            start = i; // a new run of c begins here
        }
        ++currentLength;
        // Compare inside the run so that a run reaching the end of S is counted too.
        if (currentLength > bestLength) {
            bestStart = start;
            bestLength = currentLength;
        }
    } else {
        start = -1;
        currentLength = 0;
    }
}
}
if (bestStart >= 0) {
    // longest sequence of c starts at bestStart
} else {
    // character c does not occur in S
}

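If the number of distinct characters (call it m) is reasonably small, just apply this algorithm in parallel to each character. This is easily done by turning start, bestStart, currentLength and bestLength into arrays of length m. At the end, scan the bestLength array for the index of the largest entry and use the corresponding entry in the bestStart array as the answer. The total complexity is O(mn).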
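The array-based variant described above might look like the following sketch. It makes two assumptions that are not in the original answer: the arrays are indexed directly by character code (so every character must be below 256), and only the entry for the current character is updated at each position, which is enough because at most one run can be extended at a time. The class name `LongestRuns` is hypothetical. Like the single-character version, this solves the related longest-run problem, not the OP's problem.

```java
final class LongestRuns {
    // Returns {bestStart, bestLength} for the longest run of any single character in s,
    // or {-1, 0} if s is empty. Assumes every character code is below 256.
    static int[] longestRun(String s) {
        int[] start = new int[256];
        int[] currentLength = new int[256];
        int[] bestStart = new int[256];
        int[] bestLength = new int[256];
        for (int i = 0; i < s.length(); i++) {
            int ch = s.charAt(i);
            if (i == 0 || s.charAt(i - 1) != ch) {
                start[ch] = i;          // a new run of ch begins here
                currentLength[ch] = 0;
            }
            currentLength[ch]++;
            if (currentLength[ch] > bestLength[ch]) {
                bestLength[ch] = currentLength[ch];
                bestStart[ch] = start[ch];
            }
        }
        // Scan bestLength for the largest entry; ties go to the smaller character code.
        int best = -1;
        for (int ch = 0; ch < 256; ch++) {
            if (bestLength[ch] > 0 && (best == -1 || bestLength[ch] > bestLength[best])) {
                best = ch;
            }
        }
        return best == -1 ? new int[] {-1, 0} : new int[] {bestStart[best], bestLength[best]};
    }

    public static void main(String[] args) {
        int[] r = longestRun("aaaccaaaccbccbbbab");
        System.out.println("start=" + r[0] + " length=" + r[1]); // start=0 length=3
    }
}
```

Because each position touches a single array entry, this sketch makes one pass over S plus a scan of 256 entries; repeating the single-character loop once per character would give the O(mn) bound quoted above.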

Space: 256 counters, i.e. O(256). I am not sure that is the right way to put it, since I think O(256) is just O(1).
Time: O(n)
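Below is a C++ implementation of a dynamic programming algorithm that solves this problem. An upper bound on the running time (probably not a tight one) is O(g*(n^2+log(g))), where n is the length of the string and g is the number of distinct subsequences in the input. I don't know a good way to characterise this number, but it can be as bad as O(2^n) for a string consisting of n distinct characters, which makes this algorithm exponential-time in the worst case. It also uses O(ng) space to hold the DP memoisation table. (Unlike substrings, subsequences may consist of non-contiguous characters from the original string.) In practice, the algorithm is fast whenever the number of distinct characters is small.
The two key ideas used in coming up with this algorithm were: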

Every subsequence of a length-n string is either (a) the empty string or (b) a subsequence whose first element is at some position 1 …

The last run took 8 seconds and used 145 MB, which shows how the algorithm copes with strings containing many distinct characters.

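EDIT: Added another optimisation: we now exit the loop that looks for the position at which to start the subsequence as soon as we can prove that it cannot do better than the best one found so far. This cuts the time needed for the last example from 32 seconds to 8 seconds.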

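Is there a bound on the number of distinct characters?

I must say I don't understand. Which is the desired output: "aaaaaaccccbbbb" or "aaa__aaacc_ccbbb_b"? And why does "c" come before "b" in both "solutions"? Does that make sense? This looks like a simple O(N) problem…

The answer is aaaaaaccccbbbb; aaa__aaacc_ccbbb_b is just an explanation of that answer. There is no bound: the solution can include all of the distinct characters, or only one of them if that gives the longest result.

Yes, this is a subsequence, so you have to maintain the order.

Would it be correct to phrase the problem as follows? "Find the fewest positions in S such that, if the characters at those positions are deleted, the remaining sequence has the property that, when it is broken into runs of identical characters, no character has two runs."

I think this is a good solution if all instances of a character must be together in the original string. However, I think the question has a slightly different requirement, namely that all instances of a character must …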
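The rephrasing in the comments (delete as few positions as possible so that no character is left with two runs) can be checked directly on short inputs by brute force. The sketch below is not the algorithm from any answer in this thread; it is an exponential-time reference that tries every subset of positions, and the names `BruteForceBlocks`, `hasSingleBlocks` and `longestValid`, as well as the 20-character limit, are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class BruteForceBlocks {
    // True if no character of t has two separate runs.
    static boolean hasSingleBlocks(CharSequence t) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < t.length(); i++) {
            char ch = t.charAt(i);
            boolean startsNewRun = (i == 0) || t.charAt(i - 1) != ch;
            if (startsNewRun && !seen.add(ch)) {
                return false;
            }
        }
        return true;
    }

    // Tries every subset of positions (as a bitmask) and keeps a longest valid subsequence.
    // Exponential in s.length(), so the input is limited to 20 characters here.
    static String longestValid(String s) {
        int n = s.length();
        if (n > 20) {
            throw new IllegalArgumentException("brute force is limited to 20 characters");
        }
        String best = "";
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) <= best.length()) {
                continue; // this subset cannot be longer than the best found so far
            }
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    sb.append(s.charAt(i)); // keep position i; order is preserved
                }
            }
            if (hasSingleBlocks(sb)) {
                best = sb.toString();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestValid("aaaccaaaccbccbbbab").length()); // 14
    }
}
```

A reference like this is mainly useful for cross-checking a faster implementation on small test strings, since several different subsequences can share the maximum length.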

import java.util.*;

public class LongestSubsequence {

    /**
     * @param args
     */
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);

        String str = sc.next();

        execute(str);

    }


    static void execute(String str) {

        // Count occurrences of each character; assumes every char code is below 256.
        int[] hash = new int[256];
        StringBuilder ans = new StringBuilder();

        for (int i = 0; i < str.length(); i++) {

            char temp = str.charAt(i);

            hash[temp]++;
        }

        // Emit each character as one block, in character-code order.
        for (int i = 0; i < hash.length; i++) {
            for (int j = 0; j < hash[i]; j++) {
                ans.append((char) i);
            }
        }

        System.out.println(ans);
    }
}

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <functional>
#include <map>

using namespace std;

class RunFinder {
    string s;
    map<string, string> memo[2];    // DP matrix

    // If skip == false, compute the longest valid subsequence of t.
    // Otherwise, compute the longest valid subsequence of the string
    // consisting of t without its first character, taking that first character
    // to be the last character of a preceding subsequence that we will be
    // adding to.
    string calc(string const& t, bool skip) {
        map<string, string>::iterator m(memo[skip].find(t));

        // Only calculate if we haven't already solved this case.
        if (m == memo[skip].end()) {
            // Try the empty subsequence.  This is always valid.
            string best;

            // Try starting a subsequence whose leftmost position is one of
            // the remaining characters.  Instead of trying each character
            // position separately, consider only contiguous blocks of identical
            // characters, since if we choose one character from this block there
            // is never any harm in choosing all of them.
            for (string::const_iterator i = t.begin() + skip; i != t.end();) {
                if (static_cast<size_t>(t.end() - i) < best.size()) {
                    // We can't possibly find a longer string now.
                    break;
                }

                // Find the end of the block of characters identical to *i.
                string::const_iterator next = find_if(i + 1, t.end(), [i](char ch) { return ch != *i; });
                // Just use next - 1 to cheaply give us an extra char at the start; this is safe
                string u(next - 1, t.end());
                u[0] = *i;      // Record the previous char for the recursive call
                if (skip && *i != t[0]) {
                    // We have added a new segment that is different from the
                    // previous segment.  This means we can no longer use the
                    // character from the previous segment.
                    u.erase(remove(u.begin() + 1, u.end(), t[0]), u.end());
                }
                string v(i, next);
                v += calc(u, true);

                if (v.size() > best.size()) {
                    best = v;
                }

                i = next;
            }

            m = memo[skip].insert(make_pair(t, best)).first;
        }

        return (*m).second;
    }

public:
    RunFinder(string s) : s(s) {}

    string calc() {
        return calc(s, false);
    }
};

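int main(int argc, char **argv) {
    if (argc < 2) {
        // The input string is taken from the first command-line argument.
        cerr << "usage: runfinder <string>\n";
        return 1;
    }
    RunFinder rf(argv[1]);
    cout << rf.calc() << '\n';
    return 0;
}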

C:\runfinder>stopwatch runfinder aaaccaaaccbccbbbab
aaaaaaccccbbbb
stopwatch: Terminated. Elapsed time: 0ms
stopwatch: Process completed with exit code 0.

C:\runfinder>stopwatch runfinder abbaaasdbasdnfa,mnbmansdbfsbdnamsdnbfabbaaasdbasdnfa,mnbmansdbfsbdnamsdnbfabbaaasdbasdnfa,mnbmansdbfsbdnamsdnbfabbaaasdbasdnfa,mnbmansdbfsbdnamsdnbf
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,mnnsdbbbf
stopwatch: Terminated. Elapsed time: 609ms
stopwatch: Process completed with exit code 0.

C:\runfinder>stopwatch -v runfinder abcdefghijklmnopqrstuvwxyz123456abcdefghijklmnop
stopwatch: Command to be run: <runfinder abcdefghijklmnopqrstuvwxyz123456abcdefghijklmnop>.
stopwatch: Global memory situation before commencing: Used 2055507968 (49%) of 4128813056 virtual bytes, 1722564608 (80%) of 2145353728 physical bytes.
stopwatch: Process start time: 21/11/2012 02:53:14
abcdefghijklmnopqrstuvwxyz123456
stopwatch: Terminated. Elapsed time: 8062ms, CPU time: 7437ms, User time: 7328ms, Kernel time: 109ms, CPU usage: 92.25%, Page faults: 35473 (+35473), Peak working set size: 145440768, Peak VM usage: 145010688, Quota peak paged pool usage: 11596, Quota peak non paged pool usage: 1256
stopwatch: Process completed with exit code 0.
stopwatch: Process completion time: 21/11/2012 02:53:22