Python regex: multiple expressions


I have the following structure:

<ins rev="REV-NEU" editindex="0">
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">eins</insacc>
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">zwei</insacc>
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">drei</insacc>
<insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">vier</insacc>
</ins> 
<del rev="REV-NEU" editindex="1">eins</del> 
<insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">fünf</insacc>



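With a regular expression, I want to match the ins tag together with the insacc tags inside it (there can be anywhere from 1 to 20 of them).

I tried the following regex, but it only captures the last insacc: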

<ins rev="[^<]+" editindex="[^<]+">(<(insacc|deldec) rev="[^<]+">([^<]+)</(insacc|deldec)>)+</ins>

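You should use lxml for this:
from lxml import etree

# xml_string must hold a document with a single root element
xml = etree.fromstring(xml_string)
# select every <ins> element that has at least one <insacc> child
ins_tags = xml.xpath('//ins[./insacc]')
for ins_tag in ins_tags:
    # do work, e.g. collect the text of each nested <insacc>
    print([insacc.text for insacc in ins_tag.findall('insacc')])

Isn't that simple?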
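If lxml is not installed, the same "ins that has an insacc child" filter can be approximated with the standard library. This is only a sketch: the two-element sample document and the shortened rev values are made up for illustration, and ElementTree supports just a subset of XPath, so the predicate is written as `[insacc]` rather than `[./insacc]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <ins> with <insacc> children, one without.
doc = """<data>
<ins rev="REV-NEU" editindex="0"><insacc rev="a">eins</insacc><insacc rev="a">zwei</insacc></ins>
<ins rev="REV-NEU" editindex="1">plain text only</ins>
</data>"""

root = ET.fromstring(doc)

# './/ins[insacc]' selects <ins> elements that have at least one <insacc> child.
ins_with_children = root.findall(".//ins[insacc]")

for ins in ins_with_children:
    print(ins.get("editindex"))
```

Only the first ins is selected, because the second one has no insacc child.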
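Definitely use lxml or Beautiful Soup. A regular expression cannot really do what you want, because the number of capture groups in a pattern is fixed: a group that repeats keeps only its last match.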
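To see the fixed-group-count behaviour concretely, here is a small sketch with Python's re module. The one-line sample string and both patterns are simplified stand-ins for the question's data, not the asker's exact regex. It also shows a two-step workaround (match the outer tag, then run findall on its body), which only holds up on input as regular as this.

```python
import re

# Simplified, whitespace-free stand-in for the structure in the question.
sample = ('<ins rev="r" editindex="0">'
          '<insacc rev="a">eins</insacc>'
          '<insacc rev="a">zwei</insacc>'
          '</ins>')

# A capturing group inside a repeated group is overwritten on every iteration.
repeated = re.compile(r'<ins [^>]*>(?:<insacc [^>]*>([^<]+)</insacc>)+</ins>')
last_only = repeated.search(sample).group(1)
print(last_only)  # zwei

# Two-step workaround: capture the whole body once, then scan it separately.
outer = re.compile(r'<ins [^>]*>((?:<insacc [^>]*>[^<]+</insacc>)+)</ins>')
body = outer.search(sample).group(1)
texts = re.findall(r'<insacc [^>]*>([^<]+)</insacc>', body)
print(texts)  # ['eins', 'zwei']
```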
I don't expect you can do this reliably or easily with a regex. An XML parser handles it directly:
import xml.etree.ElementTree as et

# the fragments are wrapped in a <data> root so that the string is well-formed XML
xml = '''\
<data>
<ins rev="REV-NEU" editindex="0">
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">eins</insacc>
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">zwei</insacc>
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">drei</insacc>
<insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">vier</insacc>
</ins> 
<del rev="REV-NEU" editindex="1">eins</del> 
<insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">fünf</insacc>
</data>'''

# walk every element in document order
for child in et.fromstring(xml).iter():
    print(child.tag, child.attrib, child.text)

If you only need /ins/insacc, use an XPath expression:
for child in et.fromstring(xml).findall('./ins/insacc'):
    print(child.tag, child.attrib, child.text)

The first loop prints:
data {} 

ins {'rev': 'REV-NEU', 'editindex': '0'} 

insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} eins
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} zwei
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} drei
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} vier
del {'rev': 'REV-NEU', 'editindex': '1'} eins
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} fünf

The XPath loop prints:
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} eins
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} zwei
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} drei
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} vier

If you want every insacc anywhere under the root:
for child in et.fromstring(xml).iter():
    if child.tag == 'insacc':
        print(child.tag, child.attrib, child.text)

insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} eins
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} zwei
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} drei
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} vier
insacc {'rev': 'c3ce7877-42bf-4c41-b3c0-fd225ccaf512'} fünf
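Since the question asks for each ins together with the insacc entries inside it, one possible follow-up is to group the texts per parent. The sketch below assumes the fragments are wrapped in a made-up data root and keys the result by the editindex attribute; the names fragment and grouped are illustrative only.

```python
import xml.etree.ElementTree as ET

# The fragments from the question; they have no common root of their own.
fragment = '''\
<ins rev="REV-NEU" editindex="0">
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">eins</insacc>
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">zwei</insacc>
    <insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">drei</insacc>
<insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">vier</insacc>
</ins>
<del rev="REV-NEU" editindex="1">eins</del>
<insacc rev="c3ce7877-42bf-4c41-b3c0-fd225ccaf512">fünf</insacc>
'''

# Wrap the fragments in an assumed <data> root so the parser accepts them.
root = ET.fromstring('<data>' + fragment + '</data>')

# Map each top-level <ins> (by editindex) to the texts of its direct <insacc> children.
grouped = {
    ins.get('editindex'): [child.text for child in ins.findall('insacc')]
    for ins in root.findall('ins')
}
print(grouped)  # {'0': ['eins', 'zwei', 'drei', 'vier']}
```

The top-level insacc containing fünf is left out on purpose, because it is not nested inside an ins.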

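Why not use an XML parser from the standard library, such as xml.etree.ElementTree? Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.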
Muuuch cleaner than any regex I could come up with.