Python regex to match paragraphs ending with a period


I have a series of documents that can take this format:

Diagnosis of one of the following: A) Neovascular (wet) age-related
macular degeneration OR B) Macular edema following retinal vein
occlusion, OR C) Diabetic macular edema OR D) Diabetic retinopathy in
patients with diabetic macular edema. More text here.

PA Criteria

Criteria Details


Eylea (s)

Products Affected
 EYLEA

Exclusion
Criteria

Required
Medical
Information

Age Restrictions

Prescriber
Restrictions

Coverage
Duration

Other Criteria

Off Label Uses











12 months

Indications

All Medically-accepted Indications.

Formulary ID 20276, Version 12

101

I want to match (and then delete) all of the text in any paragraph that ends with a period. So here I would want to delete

Diagnosis of one of the following: A) Neovascular (wet) age-related
macular degeneration OR B) Macular edema following retinal vein
occlusion, OR C) Diabetic macular edema OR D) Diabetic retinopathy in
patients with diabetic macular edema.

and

All Medically-accepted Indications.

I tried something like this:

\n\n[\s\S]*?[.][\n\n]

But I want to specify that \n\n must not occur inside

[\s\S]*?

How can I do that?

Thanks.
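To make the problem concrete, here is a small sketch of how the attempted pattern behaves. The sample string and the variable names are made up for illustration. Two things stand out: `[\s\S]*?` is free to cross blank lines, and `[\n\n]` is a character class, so it matches a single newline rather than two.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"\n\n[\s\S]*?[.][\n\n]")

# Hypothetical sample: a heading paragraph followed by a paragraph
# that ends with a period.
text = "Intro\n\nHeading\n\nEnds with a period.\n\nFooter"

m = attempt.search(text)
overmatched = m.group(0)

# The match starts at the first blank line and runs through "Heading",
# because nothing stops [\s\S]*? from crossing the second blank line.
print(repr(overmatched))
```

So the heading would be deleted together with the paragraph that actually ends with a period.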

You can accomplish this with any of the following regular expressions.

Option 1: this option uses the re.DOTALL flag.

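How it works:

- `(?:\A|\n{2})` matches either of the following:
  - `\A` asserts position at the start of the string (unlike `^`, which in multiline mode asserts position at the start of each line)
  - `\n{2}` matches two consecutive newlines
- `(?:(?!\n{2}).)+` matches any character one or more times, but cannot match across two consecutive newlines
- `\.` matches a literal period
- `(?=\n{2}|\Z)` is a lookahead that matches either of the following (it asserts what follows the match without including it in the result):
  - `\n{2}` matches two consecutive newlines
  - `\Z` is the opposite of `\A`: it asserts position at the end of the string (unlike `$`, which in multiline mode asserts position at the end of each line)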
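The full pattern for option 1 is not shown in the text above, so the following is a minimal sketch that assumes it is simply the four explained pieces joined in order. The names `PERIOD_PARAGRAPH` and `remove_period_paragraphs` and the shortened sample are hypothetical.

```python
import re

# Assumed assembly of the pieces explained above; re.DOTALL lets "."
# match the single newlines that wrap lines inside a paragraph.
PERIOD_PARAGRAPH = re.compile(
    r"(?:\A|\n{2})(?:(?!\n{2}).)+\.(?=\n{2}|\Z)",
    re.DOTALL,
)


def remove_period_paragraphs(text):
    """Delete every paragraph whose last character is a period."""
    return PERIOD_PARAGRAPH.sub("", text)


# Shortened, made-up version of the document in the question.
sample = (
    "Diagnosis of one of the following: A) Neovascular (wet)\n"
    "macular degeneration.\n\n"
    "PA Criteria\n\n"
    "All Medically-accepted Indications.\n\n"
    "Formulary ID 20276, Version 12"
)

cleaned = remove_period_paragraphs(sample)
print(repr(cleaned))
```

Because the leading `\n{2}` is part of the match while the trailing one is only looked at, each removed paragraph takes its preceding blank line with it. A paragraph removed at the very start of the string leaves the blank line that followed it, so a final `.strip()` may be wanted.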

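Option 2: this option is more efficient than option 1, using roughly 22% fewer steps.

How it works (most of it is the same as before, so I will only explain the difference):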

- `(?:.|\n(?!\n))+` matches either any character (except `\n`, because without re.DOTALL `.` does not match newlines), or a `\n` that is not followed by another `\n`
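As with option 1, the complete pattern is not shown, so this sketch assumes the explained group simply replaces the middle part of the option 1 pattern and that no flags are passed. The name `PERIOD_PARAGRAPH_V2` and the sample text are hypothetical.

```python
import re

# Assumed assembly: same anchors as option 1, with the middle group
# swapped for the one explained above. No re.DOTALL here, so "." stops
# at newlines and the second alternative admits single newlines only.
PERIOD_PARAGRAPH_V2 = re.compile(
    r"(?:\A|\n{2})(?:.|\n(?!\n))+\.(?=\n{2}|\Z)"
)

text = "First line\nsecond line.\n\nHeading\n\nLast one."

# The pattern has no capturing groups, so findall returns whole matches.
matches = PERIOD_PARAGRAPH_V2.findall(text)
print(matches)

cleaned = PERIOD_PARAGRAPH_V2.sub("", text)
print(repr(cleaned))
```

The first paragraph is matched across its internal line break, and "Heading" is left alone because it does not end with a period.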

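Option 3: this only works with PCRE or another engine that supports the (*SKIP)(*FAIL) backtracking verbs; Python's built-in re module does not. It is more efficient than the other options above: 21% fewer steps than option 2 and 39% fewer than option 1. This regex uses the re.DOTALL option.

How it works (again, it is mostly the same, so only the difference is explained):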

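- `(?:\n{2}(*SKIP)(*FAIL)|.)+?` matches the following one or more times, but as few times as possible (`+?` is a lazy quantifier):
  - `\n{2}(*SKIP)(*FAIL)` matches two consecutive newlines and then makes the match fail. `(*SKIP)(*FAIL)` works like magic: it prevents the regex from backtracking past its current position and then fails the current match attempt. Put simply, everything matched to the left of `(*SKIP)` (up to and including the `\n\n`) is skipped, and pattern matching resumes after that position
  - `.` otherwise matches any single character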

Here is a simple solution that does not need any modules:

doc = '...'

# keep only the paragraphs that do not end with a period
ps = '\n\n'.join([p for p in doc.split('\n\n') if not p.endswith('.')])

If you want the result tidier, also drop the empty paragraphs:

ps = '\n\n'.join([p for p in doc.split('\n\n') if not p.endswith('.') and p.strip()])
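A runnable sketch of the split-based approach on a shortened, made-up sample. Stripping trailing whitespace before the check is my own assumption, not part of the answer above; it covers paragraphs where a space follows the final period. The function name `drop_period_paragraphs` is hypothetical.

```python
def drop_period_paragraphs(doc):
    """Return doc without empty paragraphs or paragraphs ending in a period."""
    kept = [
        p
        for p in doc.split("\n\n")
        # p.strip() filters out empty chunks produced by runs of blank
        # lines; rstrip() ignores trailing spaces before the period check.
        if p.strip() and not p.rstrip().endswith(".")
    ]
    return "\n\n".join(kept)


doc = (
    "Diagnosis of one of the following.\n\n"
    "PA Criteria\n\n\n\n"
    "All Medically-accepted Indications. \n\n"
    "Formulary ID"
)

ps = drop_period_paragraphs(doc)
print(repr(ps))
```

Unlike the regex options, this rebuilds the text from the kept paragraphs, so runs of blank lines collapse to a single blank line.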

((.+\n)*(.\.\n))

(.+\n)

(.+\n)*

((.+\n)*(..\.\n))

(.+?)\.
