Python regex to extract the text after a word and before a special character, excluding all other numeric characters


I am trying to write a regular expression for the following sample text:

Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)
Expected output:

Minimum Rent Schedule (subiect to adjustment, if applicable)
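That is, everything between the word 'Section' and the special character ':'. However, as shown here, I do not want to capture any of the tokens that contain digits.

What I have tried so far is: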

[Section]+.*[:]
Here is one approach, using re.match.

Ex:

import re

s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)"
print(re.match(r"Section[\d.\s]+(.*?):", s).group(1))

Output:

Minimum Rent Schedule (subiect to adjustment, if applicable)

If there are multiple elements, use re.findall.

Ex:

print(re.findall(r"Section[\d.\s]+(.*?):", your_text))
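One caveat with this approach: re.match only tries the pattern at the start of the string and returns None when it fails, so calling .group(1) directly raises AttributeError on text that does not match. A minimal sketch of a guarded version follows; the helper name extract_title and the two test strings are made up for illustration and are not part of the original answer.

```python
import re

PATTERN = re.compile(r"Section[\d.\s]+(.*?):")

def extract_title(text):
    """Return the first captured title, or None when nothing matches."""
    # re.search scans the whole string; re.match would only try position 0.
    found = PATTERN.search(text)
    return found.group(1) if found else None

print(extract_title("See Section 4.2. Late Fees: due monthly"))
print(extract_title("no heading here"))
```

Whether re.search or re.match is the better fit depends on whether "Section" is always guaranteed to be at the very start of the input.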

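The pattern you tried uses a character class, [Section]+, which matches any of the listed characters 1 or more times rather than the literal word Section.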
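To make the character-class point concrete, here is a small sketch. The shortened sample string and the word "notice" are chosen only for illustration.

```python
import re

tried = r"[Section]+.*[:]"
sample = "Section 2.1. 1.1.14. Minimum Rent Schedule:less than twelve months"

# [Section]+ is a set of letters (S, e, c, t, i, o, n), so it also matches
# unrelated words built from those letters.
letters_only = re.fullmatch(r"[Section]+", "notice")
print(letters_only is not None)

# On the sample, the attempted pattern has no capture group and keeps the numbering.
whole = re.search(tried, sample).group()
print(whole)
```

The second print shows that the match runs from "Section" through the colon, numbers included, which is why a group that skips the numeric tokens is needed.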

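To skip everything after Section that contains a digit, you can repeat 0 or more times a group that matches a run of non-whitespace characters containing at least one digit, followed by a space.

Then capture the text that follows, up to the colon, in a capturing group.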

Section (?:[^\s\d]*\d\S* )*([^:]+):
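Explanation

  • Section followed by a space
    Match the literal text Section and a space
  • (?:
    Non-capturing group
    • [^\s\d]*
      Match 0+ characters other than whitespace or a digit, using a negated character class
    • \d\S* followed by a space
      Then match a digit, then 0+ non-whitespace characters, then a space
  • )*
    Close the group and repeat it 0+ times
  • ([^:]+):
    Capture in group 1 one or more characters other than :, then match :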
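The breakdown above can also be written directly into the pattern with re.VERBOSE. This is only a sketch of the same regex with inline comments; the names SECTION_TITLE and sample are illustrative, and the sample string is shortened. In verbose mode bare whitespace is ignored, so the literal spaces are written as [ ].

```python
import re

SECTION_TITLE = re.compile(
    r"""
    Section[ ]          # the literal word Section and a space
    (?:                 # non-capturing group
        [^\s\d]*        #   0+ characters that are neither whitespace nor digits
        \d\S*[ ]        #   a digit, the rest of that token, then a space
    )*                  # repeat the group 0+ times
    ([^:]+)             # group 1: 1+ characters other than a colon
    :                   # the closing colon
    """,
    re.VERBOSE,
)

sample = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than"
match = SECTION_TITLE.match(sample)
title = match.group(1) if match else None
print(title)
```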

For example:

import re

regex = r"Section (?:[^\s\d]*\d\S* )*([^:]+):"
s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)"
print(re.match(regex, s).group(1))
Result

Minimum Rent Schedule (subiect to adjustment, if applicable)

To find multiple matches, you can use re.findall:

print(re.findall(regex, s))
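With the single-section sample, re.findall returns a one-element list. As a sketch of the multi-match case, the text below adds a second, invented "Section 3.2. Security Deposit" clause; it is not from the original lease text.

```python
import re

regex = r"Section (?:[^\s\d]*\d\S* )*([^:]+):"

# Hypothetical input with two sections; the second clause is made up.
text = (
    "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):"
    "less than twelve months. "
    "Section 3.2. Security Deposit:payable on signing."
)

titles = re.findall(regex, text)
print(titles)
```

Because the pattern has exactly one capturing group, re.findall returns the group contents rather than the full matches.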

Could your input be part of a larger text? If so, could you include sample data for that text?