C++ regex: handling an escaped single quote (\') inside a string delimited by single quotes


I have the following string:

std::string s("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");

I used the following code:

#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main() {
    std::regex re(R"('[^'\\]*(?:\\[\s\S][^'\\]*)*')");
    std::string s("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
    unsigned count = 0;
    // Visit every non-overlapping match of re in s.
    for (std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), re);
         i != std::sregex_iterator();
         ++i)
    {
        std::smatch m = *i;
        cout << "the token is" << "   " << m.str() << endl;
        count++;
    }
    cout << "There were " << count << " tokens found." << endl;
    return 0;
}

Now, if the string s in the code is instead

std::string s("server ('m1.labs.ter\'adata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");

the output becomes:

the token is   'm1.labs.ter'
the token is   ') username ('
the token is   ')r_*5'
the token is   'u" er 5'
the token is   'default'
There were 5 tokens found.

The two strings now produce different output. The expected output is everything between the parentheses and the single quotes, i.e.

the token is   'm1.labs.teradata.com'
the token is   'use\')r_*5'
the token is   'u" er 5'
the token is   'default'
There were 4 tokens found

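The regex in my code extracts the tokens correctly, but it does not cope with an escaped single quote. It handles an escaped ", ) and so on, but not a single quote.

Can the regex be modified to produce the desired output? Thanks in advance.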

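You are using the correct regex that I shared in a comment yesterday. It matches single-quoted string literals that may contain escaped single quotes: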

std::regex re(R"('([^'\\]*(?:\\[\s\S][^'\\]*)*)')");
std::string s("server ('m1.labs.teradata.com') username ('u\\'se)r_*5') password('uer 5') dbname ('default')");
unsigned count = 0;
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), re);
                         i != std::sregex_iterator();
                         ++i)
{
    std::smatch m = *i;
    cout << "the token is"<<"   "<< m.str(1) << endl;
    count++;
}
cout << "There were " << count << " tokens found." << endl;

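Note that in a regular (non-raw) string literal a literal backslash must be written as `\\`. Alternatively, use a raw string literal, where a backslash denotes a literal backslash: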

R"(('u\'se)r_*5'))"

`R"(...)"` forms a raw string literal.

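Pattern details:

- `'` - a single quote
- `[^'\\]*` - zero or more characters other than a single quote and a backslash
- `(?:\\[\s\S][^'\\]*)*` - zero or more sequences of:
  - `\\[\s\S]` - any backslash-escaped character
  - `[^'\\]*` - zero or more characters other than `'` and `\`
- `'` - a single quote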

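Note that to avoid treating an escaped single quote as the opening quote of a match, the expression needs to be adjusted, as in `(?:^|[^\\])(?:\\{2})*'([^'\\]*(?:\\[\s\S][^'\\]*)*)'`, where group 1 still holds the text between the quotes.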

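To define a literal backslash, you must double it inside a non-raw string literal. There are literal strings, and there are string literals that define those literal strings in code. The second string does not look correctly escaped. Shouldn't

('m1.labs.ter\'adata.com')

be

('m1.labs.ter\\'adata.com')

?

@WiktorStribiżew I understand the explanation. Is there a way to change the regex so that it handles an escaped single quote in the string? Suppose the string is ('user/'5'); the regex should give me 'user'5' (the output should be what lies between the outer single quotes).

If you have

"'a'b' text"

do you mean you want to get

"'a'b' text"

?

@WiktorStribiżew Just as I want to extract whatever is between ('**'), here '**'' should be extracted. Now suppose I have this string as input: username ('user'09'). The string extracted by the regex should be: "user'09". So, basically, the single quote should be treated as escaped. Let me know if I am unclear. Thanks in advance.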
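
The same text, ('u\'se)r_*5'), written first as a regular string literal and then as a raw string literal:

"('u\\'se)r_*5')"

R"(('u\'se)r_*5'))"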

The adjusted expression, which will not start a match at an escaped quote, in use:

std::regex re(R"((?:^|[^\\])(?:\\{2})*'([^'\\]*(?:\\[\s\S][^'\\]*)*)')");
std::string s("server ('m1.labs.teradata.com') username ('u\\'se)r_*5') password('uer 5') dbname ('default')");
unsigned count = 0;
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), re);
                         i != std::sregex_iterator();
                         ++i)
{
    std::smatch m = *i;
    cout << "the token is"<<"   "<< m.str(1) << endl;
    count++;
}
cout << "There were " << count << " tokens found." << endl;

An alternative that treats a doubled single quote ('') inside the string as the escaped quote:

#include <iostream>
#include <string>
#include <vector>
#include <regex>

using namespace std;

int main() {
    std::regex rx("'[^']*(?:''[^']*)*'");
    std::string sentence("server ('m1.labs.\\''tera\"da  ta.com') username ('us *(er'')5') password('uer 5') dbname ('default')");
    std::vector<std::string> names(std::sregex_token_iterator(sentence.begin(), sentence.end(), rx),
                               std::sregex_token_iterator());

    for( auto & p : names ) cout << p << endl;
    return 0;
}