Python regex pattern to extract tags and their contents

python regex

Consider this:

input = """Yesterday<person>Peter</person>drove to<location>New York</location>"""

The following works, but I don't want to hard-code the tag names, since they can change:

import re

# One pattern per tag name: works, but every tag has to be spelled out.
print(re.findall("<person>(.*?)</person>", input))
print(re.findall("<location>(.*?)</location>", input))
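One way to avoid hard-coding the tag names while staying with `re` is to capture the tag name and refer back to it in the closing tag. This is only a minimal sketch: it assumes simple, non-nested tags without attributes, and the variable names `text` and `pairs` are made up for illustration.

```python
import re

text = "Yesterday<person>Peter</person>drove to<location>New York</location>"

# Group 1 captures the tag name; \1 requires the closing tag to use the same name.
# Group 2 captures the content lazily, so it stops at the first matching close tag.
pairs = re.findall(r"<(\w+)>(.*?)</\1>", text)

for tag, content in pairs:
    print(tag, content)
```

Each match is a `(tag, content)` tuple, so new tag names are picked up without changing the pattern. It will not cope with nested or malformed markup.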


Use a tool designed for the job. I happen to like lxml, but there are others.

>>> minput = """Yesterday<person>Peter Smith</person>drove to<location>New York</location>"""
>>> from lxml import html
>>> tree = html.fromstring(minput)
>>> for e in tree.iter():
...     print(e, e.tag, e.text_content())
...     if e.tag == 'person':          # tag is an attribute, not a method; getting the last name per comment
...         last = e.text_content().split()[-1]
...         print(last)
...


<Element p at 0x3118ca8> p YesterdayPeter Smithdrove toNew York
<Element person at 0x3118b48> person Peter Smith
Smith                                            # here is the last name
<Element location at 0x3118ba0> location New York


If you are new to Python, you may want to visit this site to get installers for a number of packages, including lxml.

Avoid parsing HTML with regular expressions; use an HTML parser instead.

Here's an example using BeautifulSoup:

Note how simple the code is.

Hope that helps.

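You are getting dangerously close to

@DevEx please see the edit in the answer.

Thanks @PyNEwbie. In the case of "Peter Smith", how can I use text_content() to extract only "Smith"?

You can't, but once you have the string you can split it.

@PyNEwbie, is there a similar method I can use to get the words that are not inside tags: "Yesterday" and "drove to"? To be exact, I want to do something like: Yesterday person Smith drove to New York.

Thanks @PyNEwbie for the help with the lxml parse tree.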
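For the question in the comments about getting the words outside the tags, here is a rough sketch. It uses the standard-library `html.parser` rather than lxml so that it runs without extra installs. The class `SegmentCollector` and the helper `split_segments` are hypothetical names made up for this example, not part of any library.

```python
from html.parser import HTMLParser


class SegmentCollector(HTMLParser):
    """Collect (tag, text) pairs; tag is None for text outside any tag."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of the tags that are currently open
        self.segments = []  # (tag name or None, text) in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        current = self.stack[-1] if self.stack else None
        self.segments.append((current, data))


def split_segments(markup):
    parser = SegmentCollector()
    parser.feed(markup)
    parser.close()
    return parser.segments


segments = split_segments(
    "Yesterday<person>Peter Smith</person>drove to<location>New York</location>"
)
# Text that is not inside any tag.
outside = [text for tag, text in segments if tag is None]
# Last name only, by splitting the person's text as suggested above.
last_names = [text.split()[-1] for tag, text in segments if tag == "person"]
print(segments, outside, last_names)
```

With lxml itself, the same information should be available through each element's `text` and `tail` attributes, which hold the text before the first child and the text following an element, respectively.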

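# BeautifulSoup example
from bs4 import BeautifulSoup

data = "Yesterday<person>Peter</person>drove to<location>New York</location>"
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(data, "html.parser")

print('person: %s' % soup.person.text)
print('location: %s' % soup.location.text)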

person: Peter
location: New York