C++ regex for validating field syntax with accepted values in any order

Tags: c++, regex, boost, tokenize

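Consider the following case: we want to use a regular expression to validate the syntax of a command with three fields, one mandatory and two optional. The three fields can appear in any order, can be separated by any amount of whitespace, and each has a limited dictionary of accepted values.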

Mandatory Field:  "-foo"
Optional Field 1:  Can be either of "-handle" "-bar" or "-mustache"
Optional Field 2:  Can be either of "-meow" "-mix" or "-want"

Examples of valid input:

-foo
-foo           -bar
-foo-want
-foo -meow-bar
-foo-mix-mustache
-handle      -foo-meow
-mustache-foo
-mustache -mix -foo
-want-foo
-want-meow-foo
-want-foo-meow

Examples of invalid input:

woof
-handle-meow
-ha-foondle
meow
-foobar
stackoverflow
- handle -foo -mix
-handle -mix
-foo -handle -bar
-foo -handle -mix -sodium

I guess you could say there are three capture groups, the first being mandatory and the last two optional:

(\-foo){1}
(\-handle|\-bar|\-mustache)?
(\-meow|\-mix|\-want)?

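But I don't know how to write it so that the groups can appear in any order, possibly separated by any amount of whitespace, with nothing else allowed.

So far I have three capture groups wrapped in lookaheads, with % signs marking the parts still to be completed.

Adding a new capture group is simple enough, as is extending the accepted inputs of an existing group, but I'm definitely stumped by the backreferences, and I'm not sure how extending the check to accommodate a fourth group would affect them.

Or would it make more sense to just use something like boost::split or boost::tokenize on the "-" character, then iterate over the tokens, count those that belong to group 1, 2, 3 and "none of the above", and validate the counts?

It seems like this should be a simple extension or application of a Boost library.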
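The tokenize-and-count idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: the names `kFields` and `is_valid_command` are hypothetical, and the check assumes each field may appear at most once, so an input with two values from the same optional field (such as `-want-meow-foo`) is rejected.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: accepted values per field, without the leading '-'.
static const std::vector<std::set<std::string>> kFields = {
    std::set<std::string>{"foo"},                       // field 1 (mandatory)
    std::set<std::string>{"handle", "bar", "mustache"}, // field 2 (optional)
    std::set<std::string>{"meow", "mix", "want"},       // field 3 (optional)
};

// Splits the input into '-'-prefixed tokens, counts how many tokens fall
// into each field, and rejects anything that belongs to no field.
bool is_valid_command(const std::string& input) {
    std::vector<int> counts(kFields.size(), 0);
    const std::size_t n = input.size();
    std::size_t i = 0;

    while (true) {
        // Skip whitespace between tokens.
        while (i < n && std::isspace(static_cast<unsigned char>(input[i]))) ++i;
        if (i == n) break;

        // Every token must start with '-'.
        if (input[i] != '-') return false;
        ++i;

        // The token runs until the next '-', whitespace, or end of input.
        const std::size_t start = i;
        while (i < n && input[i] != '-' &&
               !std::isspace(static_cast<unsigned char>(input[i]))) ++i;
        const std::string token = input.substr(start, i - start);

        // Classify the token; "none of the above" is an immediate failure.
        bool known = false;
        for (std::size_t f = 0; f < kFields.size(); ++f) {
            if (kFields[f].count(token) != 0) {
                ++counts[f];
                known = true;
                break;
            }
        }
        if (!known) return false;
    }

    // Assumption: field 1 exactly once, fields 2 and 3 at most once each.
    return counts[0] == 1 && counts[1] <= 1 && counts[2] <= 1;
}
```

Calling `is_valid_command("-handle      -foo-meow")` returns true, while `is_valid_command("-foobar")` returns false. Adding a fourth field only means adding one more set to the table and one more count to the final check.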

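You mentioned Boost. Have you looked at program_options?

In fact, a context-free grammar would do just fine. Let's parse your command into a structure like this: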

struct Command {
    std::string one, two, three;
};

Now, when we adapt it as a Fusion sequence, we can write a Spirit Qi grammar for it and enjoy automagic attribute propagation:

CommandParser() : CommandParser::base_type(start) {
    using namespace qi;

    command = field(Ref(&f1)) ^ field(Ref(&f2)) ^ field(Ref(&f3));
    field   = '-' >> raw[lazy(*_r1)];

    f1 += "foo";
    f2 += "handle", "bar", "mustache";
    f3 += "meow", "mix", "want";

    start   = skip(blank) [ command >> eoi ] >> eps(is_valid(_val));
}

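Here, everything is straightforward: the permutation parser (operator^) allows all three fields in any order.

f1, f2, f3 are the accepted symbols (Options, below) for the respective fields.

Finally, the start rule adds the skipping of blanks and checks at the end: have we reached eoi, and is the mandatory field present?

Live demo: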

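#include <boost/fusion/adapted/struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/optional.hpp>
#include <iostream>
#include <string>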
struct Command {
    std::string one, two, three;
};

BOOST_FUSION_ADAPT_STRUCT(Command, one, two, three)

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi = boost::spirit::qi;

template <typename It> 
struct CommandParser : qi::grammar<It, Command()> {
    CommandParser() : CommandParser::base_type(start) {
        using namespace qi;

        command = field(Ref(&f1)) ^ field(Ref(&f2)) ^ field(Ref(&f3));
        field   = '-' >> raw[lazy(*_r1)];

        f1 += "foo";
        f2 += "handle", "bar", "mustache";
        f3 += "meow", "mix", "want";

        start   = skip(blank) [ command >> eoi ] >> eps(is_valid(_val));
    }
  private:
    // mandatory field check
    struct is_valid_f {
        bool operator()(Command const& cmd) const { return !cmd.one.empty(); }
    };
    boost::phoenix::function<is_valid_f> is_valid;

    // rules and skippers
    using Options = qi::symbols<char>;
    using Ref     = Options const*;
    using Skipper = qi::blank_type;

    qi::rule<It, Command()> start;
    qi::rule<It, Command(), Skipper> command;
    qi::rule<It, std::string(Ref)> field;

    // option values
    Options f1, f2, f3;
};

boost::optional<Command> parse(std::string const& input) {
    using It = std::string::const_iterator;

    Command cmd;
    bool ok = parse(input.begin(), input.end(), CommandParser<It>{}, cmd);

    return boost::make_optional(ok, cmd);
}

#include <iomanip>
void run_test(std::string const& input, bool expect_valid) {
    auto result = parse(input);

    std::cout << (expect_valid == !!result?"PASS":"FAIL") << "\t" << std::quoted(input) << "\n";
    if (result) {
        using boost::fusion::operator<<;
        std::cout << " --> Parsed: " << *result << "\n";
    }
}

int main() {
    char const* valid[] = { 
        "-foo",
        "-foo           -bar",
        "-foo-want",
        "-foo -meow-bar",
        "-foo-mix-mustache",
        "-handle      -foo-meow",
        "-mustache-foo",
        "-mustache -mix -foo",
        "-want-foo",
        "-want-meow-foo",
        "-want-foo-meow",
    };
    char const* invalid[] = {
        "woof",
        "-handle-meow",
        "-ha-foondle",
        "meow",
        "-foobar",
        "stackoverflow",
        "- handle -foo -mix",
        "-handle -mix",
        "-foo -handle -bar",
        "-foo -handle -mix -sodium",
    };

    std::cout << " === Positive test cases:\n";
    for (auto test : valid)   run_test(test, true);
    std::cout << " === Negative test cases:\n";
    for (auto test : invalid) run_test(test, false);
}

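This is a brute-force solution that should work for fairly simple cases.

The idea is to build a regex out of all the permutations of the order in which the capture groups can appear.

With the three groups in the test data there are only 3! = 6 permutations. Obviously, this approach can easily become unwieldy as groups are added.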

// Build all the permutations into a regex.
std::regex const e{[]{

    std::string e;

    char const* grps[] =
    {
        "\\s*(-foo)",
        "\\s*(-handle|-bar|-mustache)?",
        "\\s*(-meow|-mix|-want)?",
    };

    // initial permutation
    std::sort(std::begin(grps), std::end(grps));

    auto sep = "";

    do
    {
        e = e + sep + "(?:";
        for(auto const* g: grps)
            e += g;
        e += ")";
        sep = "|"; // separate each permutation with |
    }
    while(std::next_permutation(std::begin(grps), std::end(grps)));

    return e;

}(), std::regex_constants::optimize};

// Do some tests

std::vector<std::string> const tests =
{
    "-foo",
    "-foo           -bar",
    "-foo-want",
    "-foo -meow-bar",
    "-foo-mix-mustache",
    "-handle      -foo-meow",
    "-mustache-foo",
    "-mustache -mix -foo",
    "-want-foo",
    "-want-meow-foo",
    "-want-foo-meow",
    "woof",
    "-handle-meow",
    "-ha-foondle",
    "meow",
    "-foobar",
    "stackoverflow",
    "- handle -foo -mix",
    "-handle -mix",
    "-foo -handle -bar",
    "-foo -handle -mix -sodium",
};

std::smatch m;
for(auto const& test: tests)
{
    if(!std::regex_match(test, m, e))
    {
        std::cout << "Invalid: " << test << '\n';
        continue;
    }
    std::cout << "Valid: " << test << '\n';
}
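As an alternative to enumerating permutations, the same check can be sketched with lookaheads, which `std::regex` supports in its default ECMAScript grammar. This is not part of the answer above: the function name `matches_with_lookaheads` is hypothetical, and the pattern assumes that each field may appear at most once, so it rejects `-want-meow-foo`.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: validates a command with a single lookahead-based
// pattern instead of one alternative per permutation.
bool matches_with_lookaheads(const std::string& input) {
    static const std::regex re{
        "^"
        // Each field may occur at most once (negative lookaheads).
        "(?!.*-foo.*-foo)"
        "(?!.*-(?:handle|bar|mustache).*-(?:handle|bar|mustache))"
        "(?!.*-(?:meow|mix|want).*-(?:meow|mix|want))"
        // The mandatory field must occur somewhere (positive lookahead).
        "(?=.*-foo)"
        // The whole input consists only of known tokens and whitespace.
        "\\s*(?:-(?:foo|handle|bar|mustache|meow|mix|want)\\s*)+"
        "$"};
    return std::regex_match(input, re);
}
```

Under these assumptions, a new field costs one extra negative lookahead and a few more alternatives in the token list, rather than multiplying the number of permutations.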

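Is there any particular reason you decided to use regex? While there is a solution here, it seems overly complicated.

I know I'm over-engineering this. Maybe regex isn't the answer, but I feel there must be a Boost library that does this elegantly. A context-free grammar or something like that. We basically have 3 fields, each with a defined set of accepted values. One field is mandatory, the other two are optional, and all three can appear in any order.

Jared B's answer is probably close to that elegant Boost functionality. With it, I believe you can still apply regex to the string argument values, if that's what you want. Regex was designed to solve a specific set of string problems, and while you can do this with it, it's definitely stretching its purpose :)

Are two items from the same field allowed? Why is -want-meow-foo valid while -foo -handle -bar is invalid?

@Galik I don't think it should be valid or allowed in my answer.

This looks similar to what I want, but I specifically want to operate on string data, not necessarily program options.

It's going to be Very Hard™ to get what you want using Boost PO.

To show off how nice automagic attribute propagation is, here's a version that makes the data members optional. It does give prettier output out of the box.

Interesting. I'll definitely have to read up on the syntax of the libraries used to understand how it works and how to adapt it to my purposes, especially since it has been 8-10 years since I last touched CFGs. It seems very powerful. The existence of a permutation parser certainly helps. I'm a little surprised by the manual validation of the mandatory field. I'll do some reading on whether there is some default validation that the use of boost::optional could trigger as well. Thanks for the help. If I can figure it out, this will be a powerful tool.

*Ahem* my first comment does show the validation for the boost optional case.

Actually, Spirit's parser model is PEG, which is more powerful than CFG.

One question: why do we need to use a fusion structure when all the container items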