C++: Iterating over a string while keeping the delimiters in the substrings, plus some additional rules


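I'm trying to iterate over a string so that it is broken into substrings, each of which is appended to the end of a vector. On top of that, I'm trying to apply a few extra rules: an apostrophe counts as alphanumeric; a ',' is OK if it appears between digits; a '.' is OK if it appears before a digit/whitespace, or between digits.

For example: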

This'.isatest!!!!andsuch .1,00,0.011#$%@

should result in:

myvector[This'][.][isatest][!!!!][andsuch][.1,00,0.011][#$%@]

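I have no trouble splitting on non-alphanumeric characters (and apostrophes), nor with the if statements for ',' and '.', but I'm having trouble keeping the delimiters together. At the moment, what I get looks more like this: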

myvector[This'][.][isatest][!][!][!][!][andsuch][.1,00,0.011][#][$][%][@]

Any useful tips?
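The difference between the two outputs above comes down to how a run of consecutive delimiter characters is handled: either each character becomes its own token, or the run is accumulated into one token. Here is a minimal sketch of the second behavior, deliberately ignoring the ',' and '.' rules. The names `is_word_char` and `split_runs` are hypothetical and do not come from the thread.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: word characters are alphanumerics plus the apostrophe.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '\'';
}

// Hypothetical helper: splits s into maximal runs of word / non-word characters.
std::vector<std::string> split_runs(std::string const& s) {
    std::vector<std::string> tokens;
    for (char c : s) {
        // Start a new token at the very beginning, or whenever the character
        // class differs from that of the previous character.
        if (tokens.empty() || is_word_char(tokens.back().back()) != is_word_char(c))
            tokens.emplace_back();
        tokens.back() += c; // extend the current run
    }
    return tokens;
}
```

With this sketch, `split_runs("This'.isatest!!!!andsuch")` yields `This'`, `.`, `isatest`, `!!!!`, `andsuch`. The number rules would still have to be layered on top.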

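You want to tokenize with some domain-specific productions (like "comma-separated numbers"). My weapon of choice is the parser generator from Boost: Boost Spirit.

Note that I added a […]

Here you go: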

#include <boost/spirit/home/x3.hpp>
#include <cassert>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

Tokens smart_split(std::string const& s) {
    Tokens tokens;

    using namespace boost::spirit::x3;

    // Word characters: letters plus the apostrophe.
    auto wordc = char_("a-zA-Z'");

    // A token is, in order of preference: a comma-separated list of numbers,
    // a run of word characters, or a run of anything else. raw[] exposes the
    // matched source text rather than the parsed values.
    parse(s.begin(), s.end(), *raw[double_ % ',' | +wordc | +~wordc], tokens);

    return tokens;
}

int main()
{
    Tokens const expected { "This'",".","isatest","!!!!","andsuch",".1,00,0.11","#$%@" };
    Tokens const actual = smart_split("This'.isatest!!!!andsuch.1,00,0.11#$%@");

    for (auto const& t : actual)
        std::cout << std::quoted(t) << ",";

    assert(actual == expected);
}

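Because I may be a little crazy, I took the time to write another one: a hand-rolled parser.

As the code below shows, it isn't simple. It's tedious, error-prone, hard to maintain, and not very generic. Your choice.

Pro tip: write code you understand. That gives you a fleeting chance of maintaining it.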

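#include <cctype>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Note: the range [first, last) must be followed by a '\0' (as is the case for
// std::string::data()), because std::strtod only stops at a non-numeric character.
template <typename Out>
Out smart_split(char const* first, char const* last, Out out) {
    auto it = first;
    std::string token;

    // Flush the pending token (if any) to the output iterator.
    auto emit = [&] {
        if (!token.empty())
            *out++ = token;
        token.clear();
        return out;
    };

    // <cctype> functions require a value representable as unsigned char.
    auto is_digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    auto is_wordc = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) || c == '\''; };

    enum { NUMBER_LIST, OTHER } state = OTHER;

    while (it != last) {
#ifndef NDEBUG
        // Trace the remaining input and the token accumulated so far.
        std::cout << std::string(it - first, ' ') << std::string(it, last) << " (token: '" << token << "')\n";
#endif

        if (is_digit(*it) || *it == '-' || *it == '+' || *it == '.') {
            // Possible start of a number.
            if (state != NUMBER_LIST)
                emit();

            char* e;
            std::strtod(it, &e);
            if (it < e) {
                // A number was parsed: keep its source text.
                token.append(it, static_cast<char const*>(e));
                it = e;

                // A trailing ',' continues the number list.
                if (it != last && *it == ',') {
                    token += *it++;
                    state = NUMBER_LIST;
                }
            }
            else {
                // Not a number after all: treat the character as "other".
                token += *it++;
            }
        }
        else if (is_wordc(*it)) {
            // A run of letters and apostrophes forms one token.
            state = OTHER;
            emit();

            while (it != last && is_wordc(*it)) {
                token += *it++;
            }

            emit();
        }
        else {
            // Any other character: accumulate runs of delimiters.
            if (state == NUMBER_LIST)
                emit();
            state = OTHER;
            token += *it++;
        }
    }

    return emit();
}

typedef std::vector<std::string> Tokens;

int main()
{
    std::string const input = "This'.isatest!!!!andsuch.1,00,0.11#$%@";

    Tokens actual;
    smart_split(input.data(), input.data() + input.size(), std::back_inserter(actual));

    for (auto& token : actual)
        std::cout << token << "\n";
}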

In a debug build, it also traces the progress of the loop:

This'.isatest!!!!andsuch.1,00,0.11#$%@ (token: '')
     .isatest!!!!andsuch.1,00,0.11#$%@ (token: '')
      isatest!!!!andsuch.1,00,0.11#$%@ (token: '.')
             !!!!andsuch.1,00,0.11#$%@ (token: '')
              !!!andsuch.1,00,0.11#$%@ (token: '!')
               !!andsuch.1,00,0.11#$%@ (token: '!!')
                !andsuch.1,00,0.11#$%@ (token: '!!!')
                 andsuch.1,00,0.11#$%@ (token: '!!!!')
                        .1,00,0.11#$%@ (token: '')
                           00,0.11#$%@ (token: '.1,')
                              0.11#$%@ (token: '.1,00,')
                                  #$%@ (token: '.1,00,0.11')
                                   $%@ (token: '#')
                                    %@ (token: '#$')
                                     @ (token: '#$%')

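Tip 1: mention the programming language you are using. Preferably put it in a tag as well.

Added it to the tags and the post, thanks.

Do you already know about […] and related functions?

Yes, I use this line in a method that takes a char: return (isalnum(temp) || temp == '\'');

Thanks for the quick reply, but is there a similar approach in the std library? I'm still quite new to C++, so I'd rather stick with it until I find my feet.

For completeness: […] and […] versions.

You can do everything with the standard library. It will just be harder. What exact format do you want for the numbers? Does whitespace matter? Etc.

Numbers should be treated like letters and apostrophes, and I have already stripped the whitespace from the line I'm reading.

What. Numbers should be treated as letters or apostrophes? (That makes no sense. Note that you can tell from my answer that I had already understood this part.)

This works perfectly and is awesome! Unfortunately, because of the specifics of my assignment, I had to brute-force it with a bunch of if statements, but I will most likely come back here for a similar problem in another class...

Is your if-else forest as dense as mine? I think this sidesteps it nicely, using just a bit of functional style: (compile with NDEBUG), while still being standard library only.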
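The last comment refers to a standard-library version in a more functional style, but that code does not appear in the thread as captured here. The following is only a rough sketch of what such an approach could look like, not the commenter's code. It assumes the token rules implied by the test input in the answer: a number is a run of digits with an optional fractional part (no signs or exponents, unlike `double_` or `std::strtod`), and a ',' is kept only between two numbers. All names (`scan_number`, `scan_number_list`, `split_tokens`) are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

using It = std::string::const_iterator;

bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool is_word(char c)  { return std::isalpha(static_cast<unsigned char>(c)) || c == '\''; }

// Consumes one number such as "00", ".1" or "0.11"; returns first if none starts here.
It scan_number(It first, It last) {
    It it = std::find_if_not(first, last, is_digit);   // integer part, may be empty
    bool has_digits = it != first;
    if (it != last && *it == '.') {
        It frac = std::find_if_not(it + 1, last, is_digit);
        if (frac != it + 1) {                           // accept '.' only if digits follow
            it = frac;
            has_digits = true;
        }
    }
    return has_digits ? it : first;
}

// Consumes number (',' number)*; a ',' is accepted only between two numbers.
It scan_number_list(It first, It last) {
    It end = scan_number(first, last);
    if (end == first)
        return first;
    while (end != last && *end == ',') {
        It next = scan_number(end + 1, last);
        if (next == end + 1)
            break;                                      // no number after the ','
        end = next;
    }
    return end;
}

std::vector<std::string> split_tokens(std::string const& s) {
    std::vector<std::string> tokens;
    for (It it = s.begin(); it != s.end();) {
        It end = scan_number_list(it, s.end());         // 1. try a number list
        if (end == it && is_word(*it))                  // 2. else a run of word characters
            end = std::find_if_not(it, s.end(), is_word);
        if (end == it) {                                // 3. else a run of other characters,
            end = it + 1;                               //    stopping before a word or a number
            while (end != s.end() && !is_word(*end) && scan_number(end, s.end()) == end)
                ++end;
        }
        tokens.emplace_back(it, end);
        it = end;
    }
    return tokens;
}
```

Called as `split_tokens("This'.isatest!!!!andsuch.1,00,0.11#$%@")`, this sketch produces the same seven tokens as the `expected` vector in the Spirit answer. Each scanner returns its start position on failure, so the alternatives can be tried in order without any explicit state variable.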