C++11 regex: matching a capture group multiple times

c++ regex c++11

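Can someone help me extract the text between the ":" and "^" symbols using a JavaScript (ECMAScript) regular expression in C++11? I don't need to capture hw-descriptor itself, but it must be present in the line for the rest of the line to be considered for a match. Also, :p...^, :m...^ and :u...^ can arrive in any order, and at least one of them must be present.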

I tried using the following regular expression:

static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);

against the following line of text:

"hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^"

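Below is the code that shows how I tried to solve this, but I only get one match. I need to understand how to extract each of the up to 3 potential matches corresponding to the p, m or u characters described above.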

#include <algorithm> // std::for_each
#include <iostream>
#include <string>
#include <vector>
#include <regex>

int main()
{
    static const std::regex gRegex("(?:hw-descriptor)(:[pmu](.*?)\\^)+", std::regex::icase);
    std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^";
    // I seem to only get 1 match here, I was expecting 
    // to loop through each of the matches, looks like I need something like 
    // a pcre global option but I don't know how.
    std::for_each(std::sregex_iterator(foo.cbegin(), foo.cend(), gRegex), std::sregex_iterator(), 
        [&](const std::smatch& rMatch) { // generic lambdas (const auto&) need C++14
            for (int i=0; i< static_cast<int>(rMatch.size()); ++i) {
                std::cout << rMatch[i] << std::endl;
            }
        });
}

With std::regex, you cannot keep multiple repeated captures when matching a string against a consecutively repeated pattern.

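What you can do is match the whole text containing the prefix and the repeated chunks, capture the latter into a separate group, and then use a second, smaller regex to get all occurrences of the required substrings separately.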

The first regex here could be

hw-descriptor((?::[pmu][^^]*\\^)+)

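It matches hw-descriptor, and ((?::[pmu][^^]*\^)+) captures into Group 1 one or more repetitions of the :[pmu][^^]*\^ pattern: a :, then p, m or u, then zero or more characters other than ^, then ^. Once a match is found, use the :[pmu][^^]*\^ regex to return all the real "matches".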
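A repeated capture group rewrites its buffer on every match (every quantifier iteration). You need to check separately that hw-descriptor exists. Or capture the whole block, then use another piece of matching code once a valid group has been found.

@WiktorStribiżew Can you show me how to do this in live code? The demo is a trimmed-down version of a multi-class regex; only 1 class contains repeated capture groups. Could I re-express the regex so that it has an optional "p" group, followed by an optional "m" group, followed by an optional "u" group, where at least 1 of the 3 groups matches? Just an idea, but I have not managed to put such a regex together.

So it cannot be done with a C++ regex. Use multiple matching instead. Is such an approach OK for you?

You can't say whether there is anything between two blocks. However, it is a two-step process: validate and capture all the blocks in one step, then extract the individual blocks in another.
Step 1: ^(?:hw-descriptor)(.*?:[pmu]\^.*)
Step 2: :([pmu])(.*?)\^
Step 2 uses the capture buffer from step 1 as the target string and is run globally.

Or a later version of Boost.Regex. I would stick with what you have in mind now and do it in two steps. The other option could be a linking and setup nightmare. If you need something more powerful than ECMAScript, it may be worth going with either PCRE or something like Boost's Perl syntax. Note that Boost.Regex repeated captures are problematic at best (see its performance disclaimer).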
C++ demo of the two-regex approach from the answer:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    // gRegex validates the prefix and captures the whole run of blocks in group 1.
    static const std::regex gRegex("hw-descriptor((?::[pmu][^^]*\\^)+)", std::regex::icase);
    // lRegex matches a single :p...^, :m...^ or :u...^ block inside that run.
    static const std::regex lRegex(":[pmu][^^]*\\^", std::regex::icase);
    std::string foo = "hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^ hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^";
    for (std::sregex_iterator i(foo.begin(), foo.end(), gRegex); i != std::sregex_iterator(); ++i)
    {
        std::smatch m = *i;
        std::cout << "Match value: " << m.str() << std::endl;
        std::string x = m.str(1); // copy group 1 so the inner iterators stay valid
        for (std::sregex_iterator j(x.begin(), x.end(), lRegex); j != std::sregex_iterator(); ++j)
        {
            std::cout << "Element value: " << j->str() << std::endl;
        }
    }
}

Output:

Match value: hw-descriptor:pTEXT1^:mTEXT2^:uTEXT3^
Element value: :pTEXT1^
Element value: :mTEXT2^
Element value: :uTEXT3^
Match value: hw-descriptor:pTEXT8^:mTEXT8^:uTEXT83^
Element value: :pTEXT8^
Element value: :mTEXT8^
Element value: :uTEXT83^