Python re: can't find this named group


python regex


I am trying to check the format of paper references and suggest corrections. For example, a dissertation is formatted as:

author. dissertation name[D]. place where store it: organization who hold the copy, year in which the dissertation published.

Obviously, every item except the year may contain punctuation. For example:

Smith. The paper name. The subtitle of paper[D]. United States: MIT, 2011

The place where it is stored and the year are often missing, for example:

Smith. The paper name. The subtitle of paper[D]. US, 2011
Smith. The paper name. The subtitle of paper[D]. US: MIT

I want to program it like this:

import re
reObj = re.compile(
r'.*\[D\]\.  \s*  ((?P<PLACE>[^:]*):){0,1} \s*   (?P<HOLDER>[^:]*)   (?P<YEAR>,\s*(1|2)\d{3}){0,1}',
re.VERBOSE
)

txt = '''Smith. The paper name. The subtitle of paper[D]. US: MIT, 2011
Smith. The paper name. The subtitle of paper[D]. US, 2011
Smith. The paper name. The subtitle of paper[D]. US: MIT'''.split('\n')

for i in txt:
    m = reObj.search(i)  # search once and reuse the match object
    if m:
        if m.group('PLACE') is None:
            print('missing place')

        if m.group('YEAR') is None:
            print('missing year')
    else:
        print('bad formation')

Output (each line followed by the text captured as HOLDER):

Smith. The paper name. The subtitle of paper[D]. US: MIT, 2011
MIT, 2011
Smith. The paper name. The subtitle of paper[D]. US, 2011
US, 2011
Smith. The paper name. The subtitle of paper[D]. US: MIT
MIT

for i in txt:
    print(i)
    print(reObj.search(i).group('YEAR'))

Smith. The paper name. The subtitle of paper[D]. US: MIT, 2011
None
Smith. The paper name. The subtitle of paper[D]. US, 2011
None
Smith. The paper name. The subtitle of paper[D]. US: MIT
None

So why does my named group fail, and how can I fix it? Thanks.
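One way to see why YEAR is never found is to dump every named group at once with `groupdict()`. The sketch below reuses the pattern and sample lines from the question; the names `pattern`, `samples`, and `captured` are only illustrative.

```python
import re

# Same pattern as in the question (re.VERBOSE ignores the literal spaces).
pattern = re.compile(
    r'.*\[D\]\.  \s*  ((?P<PLACE>[^:]*):){0,1} \s*   (?P<HOLDER>[^:]*)   (?P<YEAR>,\s*(1|2)\d{3}){0,1}',
    re.VERBOSE,
)

samples = [
    'Smith. The paper name. The subtitle of paper[D]. US: MIT, 2011',
    'Smith. The paper name. The subtitle of paper[D]. US, 2011',
    'Smith. The paper name. The subtitle of paper[D]. US: MIT',
]

# groupdict() maps each named group to its captured text (None if it did not take part).
captured = [pattern.search(s).groupdict() for s in samples]
for s, groups in zip(samples, captured):
    print(s)
    print(groups)
```

HOLDER is declared as `[^:]*`, which is greedy and accepts commas and digits, so it runs to the end of the line and takes `, 2011` with it. The optional YEAR group is then left with nothing to match and stays `None`.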

I think you can use:

reObj = re.compile(r"""
    \[D\]\.  \s*            # [D]. and 0+ whitespaces
    (?:                     # An optional group
     (?P<PLACE>[^,:]*)      # Group "PLACE": 0+ chars other than , and :
       (?:                           # An optional sequence of
          : \s* (?P<HOLDER>[^,:]*)   # :, 0+ whitespaces, Group "HOLDER" (0+ non-colons and non-commas)
        )?
        (?:                          # An optional sequence of
          ,\s* (?P<YEAR>[12]\d{3})   # , + 0+ whitespaces, Group "YEAR" (1 or 2 and then three digits)
        )?
    )?
    $          # end of string
    """, flags=re.X)

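To check the suggested pattern against the three sample lines from the question, a minimal sketch could look like this. The pattern is written as a raw string, and `samples` and `results` are illustrative names.

```python
import re

# The suggested pattern, as a raw string so the backslashes reach re unchanged.
pattern = re.compile(r"""
    \[D\]\. \s*                          # [D]. and optional whitespace
    (?:
      (?P<PLACE>[^,:]*)                  # text up to the first , or :
      (?: : \s* (?P<HOLDER>[^,:]*) )?    # optional ": holder"
      (?: , \s* (?P<YEAR>[12]\d{3}) )?   # optional ", year"
    )?
    $                                    # end of string
    """, flags=re.X)

samples = [
    'Smith. The paper name. The subtitle of paper[D]. US: MIT, 2011',
    'Smith. The paper name. The subtitle of paper[D]. US, 2011',
    'Smith. The paper name. The subtitle of paper[D]. US: MIT',
]

# Collect the named groups for every sample line.
results = [pattern.search(s).groupdict() for s in samples]
for s, groups in zip(samples, results):
    print(s)
    print(groups)
```

For `US, 2011` the text before the comma is captured as PLACE and HOLDER is `None`, because HOLDER is only attempted after a colon.
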
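(?P<HOLDER>[^:]*) should probably be (?P<HOLDER>[^,]*): you want to match up to the next comma (or the end of the text), not the next colon.

@oyster Glad it worked for you. If my answer proved helpful, please also consider upvoting; you gain that privilege once you reach 15 reputation points.

One more question: what if HOLDER can contain any characters and punctuation? In other words, the HOLDER string extends until the year is encountered, and the year, if present, must be the last item. So

Smith. The paper name. The subtitle of paper[D]. US: MIT, Jan 1, 2011

should return MIT, Jan 1 as HOLDER, and

Smith. The paper name. The subtitle of paper[D]. US: MIT, 1998 building, 2011

should return MIT, 1998 building as HOLDER. In both cases, 2011 is parsed as the year. I tried a different pattern for HOLDER, but it was clearly wrong. Since English is not my native language, I can't find the right term for this behavior.

@oyster Replace [^,:]* with .*?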
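
A sketch of that last suggestion, assuming the answer's pattern is otherwise unchanged: `.*?` is a lazy (non-greedy) quantifier, so HOLDER takes as little text as possible and only grows until the optional `, YEAR` followed by the end of the string can match. The sample lines and the names `pattern`, `samples`, and `results` are illustrative.

```python
import re

# Answer's pattern with HOLDER switched from [^,:]* to the lazy .*?
pattern = re.compile(r"""
    \[D\]\. \s*                          # [D]. and optional whitespace
    (?:
      (?P<PLACE>[^,:]*)                  # text up to the first , or :
      (?: : \s* (?P<HOLDER>.*?) )?       # lazy: as few characters as possible
      (?: , \s* (?P<YEAR>[12]\d{3}) )?   # optional ", year"
    )?
    $                                    # the year, if present, must come last
    """, flags=re.X)

samples = [
    'Smith. The paper name. The subtitle of paper[D]. US: MIT, Jan 1, 2011',
    'Smith. The paper name. The subtitle of paper[D]. US: MIT, 1998 building, 2011',
    'Smith. The paper name. The subtitle of paper[D]. US: MIT',
]

# Collect the named groups for every sample line.
results = [pattern.search(s).groupdict() for s in samples]
for s, groups in zip(samples, results):
    print(s)
    print(groups)
```

The `$` anchor matters here: without it, the lazy group would be free to stop at an empty match.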