Python regex to capture a variable number of groups

python regex lookahead


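This is not a question about how to use `re.findall()`, the global modifier `(?g)`, or `\g`. It asks how to match n groups with a single regular expression, where n is between 3 and 5.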

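Rules:

Lines whose first non-whitespace character is `#` (comments) must be ignored.

At least three items must be captured, always in this order: ITEM1, ITEM2, ITEM3.

```
class ITEM1(stuff)
model=ITEM2
fields=(ITEM3)
```

The following matches must be captured if they are present (their order is unknown, and either may be missing):

```
write_once_fields=(ITEM4)
required_fields=(ITEM5)
```

I need to know which match is which, so I want either to retrieve the matches in a fixed order, with `None` where there is no match, or to retrieve key/value pairs.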

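My question is whether this is possible, and if so, how.

I have gotten this far, but my attempt does not handle comments, unknown order, or missing items, and it does not stop searching for this particular regex when it sees the next `class` definition.

Do I need to use conditionals?

Thanks for any hints.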
I would do something like this:

```python
from collections import defaultdict
import re

comment_line = re.compile(r"\s*#")
matches = defaultdict(dict)
keys = ('model', 'fields', 'write_once_fields', 'required_fields')

with open('path/to/file.txt') as inf:
    d = {} # should catch and dispose of any matching lines
           # not related to a class
    for line in inf:
        if comment_line.match(line):
            continue # skip this line
        line = line.strip() # attribute lines are usually indented
        if line.startswith('class '):
            # keep only the name: "class Foo(Base):" -> "Foo"
            classname = line.split()[1].split('(')[0].rstrip(':')
            d = matches[classname]
            continue
        # split on the first '=' only, so values may contain '='
        key, sep, value = line.partition('=')
        if sep and key.strip() in keys:
            d[key.strip()] = value.strip()
```

It may be easier to do this with regular expression matching:

```python
comment_line = re.compile(r"\s*#")
# \w+ captures the class name (an empty group would always capture '')
class_line = re.compile(r"class\s+(?P<classname>\w+)")
possible_keys = ["model", "fields", "write_once_fields", "required_fields"]
# \s* around '=' accepts both "model=X" and "model = X"
data_line = re.compile(r"\s*(?P<key>" + "|".join(possible_keys) +
                       r")\s*=\s*(?P<value>.*)")

with open( ...
    d = {} # default catcher as above
    for line in ...
        if comment_line.match(line):
            continue
        class_match = class_line.match(line)
        if class_match:
            d = matches[class_match.group('classname')]
            continue # there won't be more than one match per line
        data_match = data_line.match(line)
        if data_match:
            key, value = data_match.group('key'), data_match.group('value')
            d[key] = value
```


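But this may be harder to understand. YMMV.

This doesn't seem like a good fit for a regex. You should parse it instead.

@AdamSmith Is there a way to parse it without a regex?

AdamSmith is right: just loop over the lines in the file, skip the ones that start with '#', then pick a function to parse each line based on its first word. You can build a list and then validate the results.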
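The comments favour parsing line by line. For comparison, the single-pattern version the question asks about does appear to be feasible with lookaheads. What follows is only a sketch under assumptions that are not in the original post: the `SKIP` fragment, the `field()` helper, the `parse()` function and the sample text are all hypothetical, and every value is assumed to sit on one line in the form `key = value`.

```python
import re

# Lazily skip whole lines, but never step over the start of the next class.
SKIP = r"(?:(?![ \t]*class\s)[^\n]*\n)*?"


def field(name, optional=False):
    """Build a lookahead that captures `name = value` from a later line."""
    body = SKIP + r"[ \t]*" + name + r"[ \t]*=[ \t]*(?P<" + name + r">[^\n]+)"
    # An empty alternative lets the lookahead succeed with the group unset.
    return "(?=" + body + ("|" if optional else "") + ")"


CLASS_BLOCK = re.compile(
    r"^class[ \t]+(?P<classname>\w+)[^\n]*\n"
    + field("model")
    + field("fields")
    + field("write_once_fields", optional=True)
    + field("required_fields", optional=True),
    re.MULTILINE,
)


def parse(text):
    """Return one dict of named groups per class that has both required keys."""
    return [m.groupdict() for m in CLASS_BLOCK.finditer(text)]


SAMPLE = """\
# class Ignored(Base):
#     model = Nope
class Alpha(Base):
    model = AlphaModel
    # fields = ('commented', 'out')
    fields = ('a', 'b')
    required_fields = ('a',)

class Gamma(Base):
    fields = ('z',)

class Beta(Base):
    model = BetaModel
    fields = ('x',)
    required_fields = ('x',)
    write_once_fields = ('x',)
"""

for entry in parse(SAMPLE):
    print(entry)
```

Because each key must be the first non-whitespace text on its line, commented-out lines are skipped without a separate rule. Optional groups come back as `None` when absent, and a class that lacks `model` or `fields` (here `Gamma`) is not matched at all. The pattern is noticeably denser than the loop-based answers, which supports the point made in the comments.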