Python: match everything except a specific string

python regex


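I have seen many posts with similar titles, but I have found nothing that works with Python or this site:

How can I match everything except a specific text?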

My text:

```
1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            -

1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -
```

I want to match `Word\w+` and then match the rest until the next `1234`. So the result should be (the returned groups are marked with `()`):

The first part is easy:

```
matches = re.findall(r'1234_This is a text (Word \w+)', var)
```
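On the sample text, this first step returns only the captured words: when a pattern contains exactly one capturing group, `re.findall` returns a list of that group's matches. A minimal check, assuming the text is stored in `var` as in the question (the table rows are trimmed here for brevity):

```python
import re

# Shortened version of the question's text (assumption: some rows trimmed).
var = """1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            -

1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10"""

# One capturing group, so findall returns just that group's text per match.
matches = re.findall(r'1234_This is a text (Word \w+)', var)
print(matches)  # ['Word AB', 'Word BCD']
```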

But I can't get the next part. I tried a negative lookahead:

```
^(?!1234)
```

but then nothing matches anymore…
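Why the lone lookahead seems to match nothing: `(?!1234)` is an assertion that checks what follows without consuming any characters, so `^(?!1234)` can only ever produce empty matches. The sketch below (on a shortened sample, assumed for illustration) contrasts it with repeating the lookahead in front of a consuming `.`, often called a tempered greedy token, which is the idea the answers build on:

```python
import re

# Shortened sample (assumed for illustration).
text = "1234_This is a text Word AB\nProtocol  Address  ping\nInternet  1.1.1.1  -"

# The lookahead only asserts; it consumes nothing. With re.M, ^(?!1234)
# succeeds at every line start not followed by 1234, and each hit is "".
empty_hits = re.findall(r'^(?!1234)', text, re.M)
print(empty_hits)  # ['', '']

# Checking the lookahead before each consumed character grows the match
# until 1234 would come next, or until a newline, since "." does not
# match "\n" without re.S.
lines = re.findall(r'^(?:(?!1234).)+', text, re.M)
print(lines)  # ['Protocol  Address  ping', 'Internet  1.1.1.1  -']
```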

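Code

```
(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)
```

Using the `s` modifier, you can use the following.

```
(1234[\w ]+(Word \w+))((?:(?!1234).)*)
```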
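In Python, the `s` modifier is the `re.S` flag (also spelled `re.DOTALL`), which makes `.` match newlines as well. A small check, on a shortened sample assumed for illustration, that the `[\s\S]` pattern and the `.` pattern with `re.S` return the same groups, and that the `.` pattern without the flag stops at the first line break:

```python
import re

# Shortened sample (assumed for illustration).
text = ("1234_This is a text Word AB\nInternet  1.1.1.1  -\n"
        "1234_This is a text Word BCD\nInternet  2.2.2.1  10")

any_char = r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)'
dot = r'(1234[\w ]+(Word \w+))((?:(?!1234).)*)'

# [\s\S] matches every character, including "\n", with no flags.
with_class = re.findall(any_char, text)
# re.S lets "." match "\n" too, so the two patterns agree.
with_dotall = re.findall(dot, text, re.S)
# Without re.S, "." cannot cross the newline after the header, so group 3 is empty.
without_flag = re.findall(dot, text)

print(with_class == with_dotall)  # True
print([body for _, _, body in without_flag])  # ['', '']
```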

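Explanation

- `(1234[\w ]+(Word \w+))` Capture the following into capturing group 1
  - `1234` Match this literally
  - `[\w ]+` Match one or more word characters or spaces
  - `(Word \w+)` Capture the following into capturing group 2
    - `Word ` Match this literally (note the trailing space)
    - `\w+` Match any word character one or more times
- `((?:(?!1234)[\s\S])*)` Capture the following into capturing group 3
  - `(?:(?!1234)[\s\S])*` Match the following any number of times
    - `(?!1234)` Negative lookahead ensuring what follows does not match `1234` literally
    - `[\s\S]` Match any character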
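Putting the pieces together, a minimal sketch that runs this pattern over the question's text with `re.findall`, which returns one tuple of the three groups per block (the variable names are illustrative):

```python
import re

sample = """1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            -

1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -"""

pattern = re.compile(r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)')

# findall returns one (group 1, group 2, group 3) tuple per block.
results = pattern.findall(sample)
for header, word, body in results:
    print('group 1:', header)
    print('group 2:', word)
    # Group 3 keeps the surrounding newlines; strip them to list the rows.
    print('group 3:', body.strip().splitlines())
```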

As you said:

I want to match `Word\w+`, then match the rest until the next `1234`

Do you want something like this?

```python
import re

# Group 1: the whole header line; group 2: "1234_This is a text";
# group 3: "Word" plus the following word; group 4: the lines after the
# header, where (?!\n\n) is meant to stop at the blank line between blocks.
pattern = r'((1234_This is a text) (Word\s\w+))((\n?.*(?!\n\n))*)'
string = """1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            -

1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -"""

matches = re.finditer(pattern, string, re.M)
for find in matches:
    print("this is group_1 {}".format(find.group(1)))
    print("this is group_3 {}".format(find.group(3)))
    print("this is group_4 {}".format(find.group(4)))
```

Output:

```
this is group_1 1234_This is a text Word AB
this is group_3 Word AB
this is group_4 

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            
this is group_1 1234_This is a text Word BCD
this is group_3 Word BCD
this is group_4 
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -
```
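In this output, the first block's group 4 ends with the `Internet  1.1.1.4` row but without its final `-`. A minimal reproduction on a shortened sample (an assumption for illustration): when `.*` reaches the end of the last row, the next characters are `\n\n`, so `(?!\n\n)` fails and `.*` backs off one character, leaving the `-` outside the match.

```python
import re

pattern = r'((1234_This is a text) (Word\s\w+))((\n?.*(?!\n\n))*)'

# Shortened sample: one blank line separates the two blocks.
string = """1234_This is a text Word AB
Internet  1.1.1.1            -
Internet  1.1.1.4            -

1234_This is a text Word BCD
Internet  2.2.2.2            -"""

bodies = [m.group(4) for m in re.finditer(pattern, string, re.M)]
for body in bodies:
    # repr() makes the dropped "-" and the kept newlines visible.
    print(repr(body))
```

The second block is not followed by a blank line, so its last row keeps the `-`.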

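Thanks, this works. Wow, this negative lookahead stuff is still hard for me to understand…

@mrCarnivore It basically says: at this position in the string, do the next characters match `1234`? If so, stop matching; otherwise, keep matching.

What is `[\s\S]` used for? You could replace it with `.` (match any character). However, `.` does not work, although I would expect it to.

@mrCarnivore If you turn on the single-line modifier in the regex, you can use `.`. Otherwise `.` does not match newline characters, hence the use of `[\s\S]`. `[\s\S]` means match any whitespace or non-whitespace character (in other words, match any character).

@mrCarnivore Lookaheads and lookbehinds don't actually match characters in order to consume them: they are basically assertions. This means they make sure that at position X (whatever X stands for), Y matches or does not match (where Y is some condition).

No, this is not the result I want. I want the original text returned and split into different capture groups (the groups I marked in the question). Note: there is also a nested capture group!

Thank you very much, this works too. However, the other solution is a bit more robust, because it also works if there is no blank line between the data blocks.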
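The robustness remark can be checked directly. The sketch below assumes a hypothetical variant of the sample with the blank lines removed and runs both answers' patterns on it:

```python
import re

# Hypothetical variant of the question's text with the blank lines removed.
compact = """1234_This is a text Word AB
Protocol  Address          ping
Internet  1.1.1.1            -
1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10"""

tempered = r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)'
blank_line = r'((1234_This is a text) (Word\s\w+))((\n?.*(?!\n\n))*)'

# The tempered token stops in front of each 1234, so every block is one match.
tempered_words = [m.group(2) for m in re.finditer(tempered, compact)]
print(tempered_words)  # ['Word AB', 'Word BCD']

# Without a blank line, (?!\n\n) never fails, so group 4 runs to the end
# of the string and swallows the second block.
blank_line_words = [m.group(3) for m in re.finditer(blank_line, compact, re.M)]
print(blank_line_words)  # ['Word AB']
```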
