Basic regular expression question in Python: help me learn it

python regex

I am using this tutorial to learn regular expressions in Python. It looks like a good tutorial.

According to the tutorial, this is the code I should use:

>>> import re
>>> p = re.compile(r'^(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+)$', re.MULTILINE)
>>> str = "Jack A. Smith\nMary B. Miller"
>>> m = p.match(str)
>>> print(m.group(0))
Jack A. Smith
>>> print(m.group(1))
Jack
>>> print(m.group(2))
A.
>>> print(m.group(3))
Smith
>>> print(m.group(4))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group

To my surprise, I lost poor Mary B. Miller: there is no m.group(4).

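So I have a few follow-up questions:

(1) I am using MULTILINE, so why does it match only the first line, i.e. Jack A. Smith in the example?

(2) I used Given, Middle and Family as the group names for each match. How do I access the data through those names instead of just m.group(i)?

(3) Suppose I want to match and replace. That is, I want to match Mary B. Miller and replace it with Jane M. Goldstein, so that the resulting string is:

str = "Jack A. Smith\nJane M. Goldstein"

How do I do that? (This one is somewhat unrelated, so let's call it a bonus question.)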

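From the re module documentation:

Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

You can use re.findall or re.finditer to find all matches:

>>> for match in p.finditer(str):
...     print(match.groups())
...
('Jack', 'A.', 'Smith')
('Mary', 'B.', 'Miller')

To use group names instead of indices, pass the name of the group:

>>> for match in p.finditer(str):
...     print(match.group('Given'))
...
Jack
Mary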
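As a self-contained illustration of the point above, the following sketch contrasts match() with finditer() on the question's pattern. The variable names `pattern`, `text` and `single_line` are my own choices (with `text` used instead of `str` so the built-in is not shadowed); the last check is an extra observation, not something the answer states.

```python
import re

# Pattern and input taken from the question.
pattern = re.compile(
    r'^(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+)$', re.MULTILINE)
text = "Jack A. Smith\nMary B. Miller"

# match() is anchored at position 0, so it can only ever return the first line.
first = pattern.match(text)
print(first.group(0))

# finditer() scans the whole string; with MULTILINE, ^ and $ apply to each line.
full_names = [m.group(0) for m in pattern.finditer(text)]
given_names = [m.group('Given') for m in pattern.finditer(text)]
print(full_names)
print(given_names)

# Without MULTILINE, $ cannot match before the inner newline, so nothing matches.
single_line = re.compile(r'^(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+)$')
print(single_line.match(text))
```

So MULTILINE does change what ^ and $ mean; it just does not change where match() starts looking.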

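re.match() only matches at the beginning of the string. That is why you only get the first match. If you need all matches, use re.findall and wrap the whole regular expression in (). Here is an example: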

import re

p = re.compile(r'^((?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+))$', re.MULTILINE)
str = "Jack A. Smith\nMary B. Miller"
print(re.findall(p, str))

Update:

Regarding your question 2: use groupdict(). For example:

p = re.compile(r'^(?P<FullName>(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+))$', re.MULTILINE)
str = "Jack A. Smith\nMary B. Miller"
matches = re.finditer(p, str)
for match in matches:
    info = match.groupdict()  # the named groups of this match, as a dictionary
    print(info)
    print(info['Family'])
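The reason for the outer parentheses in the findall example is worth spelling out: when a pattern contains groups, re.findall returns the groups rather than the whole match, so the full name only appears if it is itself a group. The sketch below shows the difference; the names `text`, `without_outer` and `with_outer` are illustrative only, and unnamed groups are used to keep the patterns short.

```python
import re

text = "Jack A. Smith\nMary B. Miller"

# Three groups: findall returns one 3-tuple per match, without the full match.
without_outer = re.findall(r'^(\w+) (\w\.) (\w+)$', text, re.MULTILINE)
print(without_outer)

# An extra outer group makes the full name the first element of each tuple.
with_outer = re.findall(r'^((\w+) (\w\.) (\w+))$', text, re.MULTILINE)
print(with_outer)
```

With finditer this wrapping is not needed, because each match object already exposes the full match as group(0).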

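"I used Given, Middle and Family as the group names for each match. How do I access the data through those names instead of just m.group(i)?"

You can use m.group('Given'), m.group('Middle') and m.group('Family').

"Suppose I want to match and replace. That is, I want to match Mary B. Miller and replace it with Jane M. Goldstein, so that the resulting string is str = "Jack A. Smith\nJane M. Goldstein". How do I do that?"

As far as I know, re.sub() can be used for search and replace.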
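A minimal sketch of what that could look like for the question's example follows. The variable names (`text`, `replaced`, `reordered`) are my own, and the second half, which reuses the named groups in the replacement template, goes beyond what was asked and is included only to show how the group names carry over to re.sub().

```python
import re

text = "Jack A. Smith\nMary B. Miller"

# Literal replacement. re.escape() keeps the "." in "B." from matching any character.
replaced = re.sub(re.escape("Mary B. Miller"), "Jane M. Goldstein", text)
print(replaced)

# Named groups can be referenced in the replacement template with \g<name>.
pattern = re.compile(
    r'^(?P<Given>\w+) (?P<Middle>\w\.) (?P<Family>\w+)$', re.MULTILINE)
reordered = pattern.sub(r'\g<Family>, \g<Given> \g<Middle>', text)
print(reordered)
```

For a purely literal replacement like the first one, text.replace("Mary B. Miller", "Jane M. Goldstein") would do the same job without a regular expression.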

I think I would do it like this:

>>> import re
>>> txt = '''\
... Jack A. Smith
... Mary B. Miller
... Jordan Brewster
... Kathy Beth Turner'''
>>> [m.groups() for m in re.finditer(r'^(\w+)\s+(\w\.|\w*)\s*(\b\w+\b)$', txt, re.M)]
[('Jack', 'A.', 'Smith'), ('Mary', 'B.', 'Miller'), ('Jordan', '', 'Brewster'), ('Kathy', 'Beth', 'Turner')]

Here is how it works:

^(\w+)\s+(\w\.|\w*)\s*(\b\w+\b)$

This lets you capture names with an optional middle name or middle initial.
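One way to read the pattern piece by piece is to rewrite it with re.VERBOSE, which allows whitespace and comments inside the regular expression. The sketch below is my own annotated version, not the answerer's code: it is meant to behave the same as the one-line pattern, but it additionally borrows the group names Given, Middle and Family from the question, and `name_pattern` and `rows` are made-up names.

```python
import re

name_pattern = re.compile(r'''
    ^(?P<Given>\w+)         # given name at the start of a line
    \s+                     # at least one whitespace character
    (?P<Middle>\w\.|\w*)    # an initial with a dot, a full middle name, or nothing
    \s*                     # optional whitespace (empty when there is no middle name)
    (?P<Family>\b\w+\b)     # family name, running up to the end of the line
    $
''', re.MULTILINE | re.VERBOSE)

txt = '''\
Jack A. Smith
Mary B. Miller
Jordan Brewster
Kathy Beth Turner'''

# One dictionary per line, keyed by group name.
rows = [m.groupdict() for m in name_pattern.finditer(txt)]
for row in rows:
    print(row)
```

One caveat with this pattern: \s also matches a newline, so two consecutive one-word lines would be read as a given name followed by a family name.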

Thank you very much, very helpful. Would you mind commenting on the other questions, (2) and (3): accessing the names with the Given, Middle and Family tags, and how to match and replace? If not, no worries, and thanks for (1)!

Don't use str as a variable name. You will shadow the built-in.

This is probably the best regex demo I have ever seen. I love their graphical representation of the regex!