Python BeautifulSoup: how to find all 'about' attributes in an HTML string


In a text file, the items all share the same structure, and I want to parse them with BeautifulSoup.

Excerpt:

data = """<text id="1" sig="prenatfra-camppres-2017-part01-viewEvent-1&docRefId-0&docName-news%C2%B720170425%C2%B7LC%C2%B7assignment_862852&docIndex-3_1" title="Éditorial élection présidentielle" author="NULL" year="2017" date="25/04/2017" section="NULL" sourcename="La Croix" sourcesig="LC" polarity="Positif" about="Le Pen|Macron">
<p type="title">Éditorial élection présidentielle</p>
</text>"""

soup = BeautifulSoup(data, 'html.parser') #

Maybe I'm using the wrong parser? Thanks. Best regards,

Théo

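If you check the documentation carefully, find_all looks for tags with the specified name. So in this case you should look for the 'text' tags and then retrieve the 'about' attribute from each of them.

A working example is shown below: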

from bs4 import BeautifulSoup

data = """<text id="1" sig="prenatfra-camppres-2017-part01-viewEvent-1&docRefId-0&docName-news%C2%B720170425%C2%B7LC%C2%B7assignment_862852&docIndex-3_1" title="Éditorial élection présidentielle" author="NULL" year="2017" date="25/04/2017" section="NULL" sourcename="La Croix" sourcesig="LC" polarity="Positif" about="Le Pen|Macron">
<p type="title">Éditorial élection présidentielle</p>
</text>"""

soup = BeautifulSoup(data, 'html.parser')

# to get the 'about' attribute from the first text element
print(soup.find_all('text')[0]['about'])

# to get the 'about' attributes from all the text elements, as a list
print([text['about'] for text in soup.find_all('text')])


Le Pen|Macron
['Le Pen|Macron']
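
If some 'text' items in the file lack an 'about' attribute, indexing with text['about'] raises a KeyError. The following is a minimal sketch of one way to handle that, and to split the pipe-separated value into individual names. The two-item sample string and the variable names (sample, with_about, about_lists, all_values) are made up for illustration and are not taken from the original file.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two <text> items, only the first carries an 'about' attribute
sample = """<text id="1" about="Le Pen|Macron"><p type="title">A</p></text>
<text id="2"><p type="title">B</p></text>"""

soup = BeautifulSoup(sample, 'html.parser')

# about=True keeps only the <text> elements that have the attribute, whatever its value
with_about = soup.find_all('text', about=True)

# Split each pipe-separated value into a list of names
about_lists = [tag['about'].split('|') for tag in with_about]
print(about_lists)  # [['Le Pen', 'Macron']]

# .get() returns None instead of raising KeyError when the attribute is absent
all_values = [tag.get('about') for tag in soup.find_all('text')]
print(all_values)  # ['Le Pen|Macron', None]
```

Filtering with about=True is convenient when items without the attribute should simply be skipped, while .get() keeps one entry per item so the results stay aligned with the original order.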