Python regex: what's going on here?


python regex


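I recently got a book on Python, and it has a chapter on regex with a bit of code I really don't understand. Can someone explain exactly what is going on here? (This section is about regex groups.)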

>>> import re
>>> my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
>>> addrs = "Zip: 10010 State: NY"
>>> y = re.search(my_regex, addrs)
>>> y.groupdict('zip')
{'zip': 'Zip: 10010'}
>>> y.group(2)
'State: NY'

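The search method returns a match object holding the result of matching the pattern against the string, or None if the pattern matches nowhere.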
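A minimal sketch of both outcomes, reusing the pattern from the question. The second input string is made up to show the no-match case, where search gives back None instead of a match object.

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'

# re.search scans the whole string and returns a match object for the
# first place the pattern matches, or None if it matches nowhere.
y = re.search(my_regex, "Zip: 10010 State: NY")
no_match = re.search(my_regex, "State: NY")  # made-up input with no "Zip:" part

if y is not None:
    print(y.group())  # the full matched text: Zip: 10010 State: NY
    print(y.span())   # start and end offsets of the match: (0, 20)
print(no_match)       # None
```

Checking for None before calling methods on the result avoids an AttributeError when the text doesn't match.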

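groupdict returns a dictionary of the groups, where the keys are the group names defined with (?P<name>...). Here, name is whatever you put between the angle brackets, zip in this case.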
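A small sketch with the question's own pattern makes this concrete: only the first group uses (?P<name>...), so it is the only key in the dictionary. The variable name d is ours.

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

# Only groups written as (?P<name>...) appear in groupdict(); the plain
# (State:...) group is unnamed, so it gets no key in the dictionary.
d = y.groupdict()
print(d)  # {'zip': 'Zip: 10010'}
```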

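group returns a matched group by its index. 'State: NY' is the third item, group(2), because numbering starts at 0: group(0) is the entire match (here, the whole string) and group(1) is 'Zip: 10010'.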
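Printing each index for the question's pattern shows this numbering. The groups() call at the end is an extra method not mentioned in the answer; it returns groups 1 to n as a tuple, leaving out group 0.

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

# Group 0 is the whole match; capturing groups are numbered from 1
# in the order their opening brackets appear, whether named or not.
print(y.group(0))  # Zip: 10010 State: NY
print(y.group(1))  # Zip: 10010
print(y.group(2))  # State: NY

# groups() returns every capturing group as a tuple, without group 0.
print(y.groups())  # ('Zip: 10010', 'State: NY')
```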

By the way, this is a fairly simple question. I just looked up the method documentation on Google and found it. Google is your friend.

Regex definitions:

(?P<zip>...)

Defines a named group called zip. Inside it, Zip:\s* matches "Zip:" followed by zero or more whitespace characters.

\d

Matches a digit

\w

Matches a word character: [a-zA-Z0-9_] (in Python 3, other Unicode letters and digits count too)
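The building blocks above can be checked one at a time. This sketch uses re.fullmatch (which requires the whole string to match) and made-up test strings; the \d{5} spelling is our addition, shown as an equivalent of the book's \d\d\d\d\d.

```python
import re

# \d matches a single digit, so \d\d\d\d\d means exactly five digits;
# \d{5} is an equivalent, shorter spelling of the same thing.
print(bool(re.fullmatch(r'\d\d\d\d\d', '10010')))  # True
print(bool(re.fullmatch(r'\d{5}', '10010')))       # True
print(bool(re.fullmatch(r'\d{5}', '1001')))        # False: only four digits

# \s* means zero or more whitespace characters, so the space after "Zip:" is optional.
print(bool(re.fullmatch(r'Zip:\s*\d{5}', 'Zip:10010')))     # True
print(bool(re.fullmatch(r'Zip:\s*\d{5}', 'Zip:   10010')))  # True

# \w matches letters, digits and the underscore, but not punctuation.
print(bool(re.fullmatch(r'\w\w', 'NY')))  # True
print(bool(re.fullmatch(r'\w\w', 'N_')))  # True
print(bool(re.fullmatch(r'\w\w', 'N.')))  # False

# The r'' prefix keeps backslashes as-is; without it you would write '\\d'.
print(r'\d' == '\\d')  # True
```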

y.groupdict('zip')

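The groupdict method returns a dictionary with the named groups as keys and their matches as values. Here only one group is named, so you get just the match for "zip". Note that the 'zip' argument does not select that group: the argument to groupdict is the default value used for named groups that did not take part in the match.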
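To see what the argument actually does, here is a sketch using a hypothetical variant of the book's pattern in which the state part is an optional named group, so it can be missing from the match. The input string and the 'n/a' default are made up for illustration.

```python
import re

# Hypothetical variant: the state part is an optional named group.
pattern = r'(?P<zip>Zip:\s*\d{5})\s*(?P<state>State:\s*\w\w)?'

m = re.search(pattern, "Zip: 10010")  # no state in this made-up input

# Groups that did not participate get the default, which is None unless given.
print(m.groupdict())       # {'zip': 'Zip: 10010', 'state': None}
print(m.groupdict('zip'))  # {'zip': 'Zip: 10010', 'state': 'zip'}
print(m.groupdict('n/a'))  # {'zip': 'Zip: 10010', 'state': 'n/a'}
```

So in the book's example, y.groupdict('zip') only looks like it picks out the zip group because that is the sole named group; y.group('zip') is the call that selects one group by name.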

y.group(2)

Returns the match for the second group, which is the unnamed group "(...)".
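Group 2 is also the last group this pattern has. A short sketch with the question's pattern: y.re is the compiled pattern behind the match, its groups attribute counts the capturing groups, and asking for a number past that raises IndexError. Passing several numbers to group returns a tuple.

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

# The compiled pattern knows how many capturing groups it defines.
print(y.re.groups)  # 2

# There is no third group, so this raises IndexError.
try:
    y.group(3)
except IndexError as exc:
    print("No group 3:", exc)

# Several group numbers at once give back a tuple.
print(y.group(1, 2))  # ('Zip: 10010', 'State: NY')
```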

Hope this helps.

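The (?P<name>...) syntax is Python's way of writing named capturing groups. That way, you can access what a group matched by its name, not just by its sequence number.

Since the first pair of parentheses is named zip, you can get its match with the match object's groupdict method, which gives you {identifier: match} pairs. Or, if you are only interested in the match itself, you can use y.group('zip') (which usually makes sense, since you already know the identifier). You can also reach the same match by its sequence number (1). The next group is unnamed, so the only way to access it is by its number.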
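A short sketch of the access paths described above, using the question's pattern. Indexing the match as y['zip'] requires Python 3.6 or later, and groupindex is an attribute of the compiled pattern reached through y.re; neither is mentioned in the answer.

```python
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
y = re.search(my_regex, "Zip: 10010 State: NY")

# A named group can be fetched by name or by its number; both give the same text.
print(y.group('zip'))  # Zip: 10010
print(y.group(1))      # Zip: 10010

# Since Python 3.6 a match object can also be indexed directly.
print(y['zip'])        # Zip: 10010

# The compiled pattern's groupindex maps each group name to its number.
print(dict(y.re.groupindex))  # {'zip': 1}

# The unnamed second group is only reachable by number.
print(y.group(2))      # State: NY
```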

# my_regex = r' <= this means that the string is a raw string; normally you'd need to use double backslashes
# ( ... ) this groups something
# ? normally means the previous bit is optional, but right after an opening bracket it starts an extension: (?P<zip> ... ) is a group named zip
# * this means "zero or more of the previous item, as many as you can find"
# \s is whitespace
# \d is a digit, similar to [0-9] for ASCII text
# \w is an alphanumeric character or underscore
import re

my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(State:\s*\w\w)'
addrs = "Zip: 10010 State: NY"

# Searches the string for the first match, a bit like grep
y = re.search(my_regex, addrs)

Adding to the previous answers: in my opinion, you are better off picking one type of group (named or unnamed) and sticking with it. I usually use named groups. For example:

>>> import re
>>> my_regex = r'(?P<zip>Zip:\s*\d\d\d\d\d)\s*(?P<state>State:\s*\w\w)'
>>> addrs = "Zip: 10010 State: NY"
>>> y = re.search(my_regex, addrs)
>>> print(y.groupdict())
{'zip': 'Zip: 10010', 'state': 'State: NY'}
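One possible refinement if you do stick with named groups, offered as our own suggestion rather than part of the answer: keep the Zip: and State: labels outside the named groups so that groupdict() holds only the values. The two-line sample text is made up, and finditer is used to collect one dictionary per address.

```python
import re

# Hypothetical variant: the labels sit outside the named groups,
# so each group captures only the value.
pattern = re.compile(r'Zip:\s*(?P<zip>\d{5})\s*State:\s*(?P<state>\w\w)')

# Made-up sample text with two addresses, one per line.
text = "Zip: 10010 State: NY\nZip: 94103 State: CA"

# finditer yields one match object per address; groupdict() turns each into a dict.
records = [m.groupdict() for m in pattern.finditer(text)]
print(records)
# [{'zip': '10010', 'state': 'NY'}, {'zip': '94103', 'state': 'CA'}]
```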



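Which part don't you understand? Regular expressions in general, or how Python pulls out the "zip" group and the second (unnamed) group? Adding more detail to the question will get you better, more targeted answers.

So does this mean it just creates a group named zip, which matches "Zip:\s*\d\d\d\d\d", while the rest of the line, (State:\s*\w\w), matches the state, and then the rest creates a dict called groupdict containing zip and state? I think I get it :)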
