Python BeautifulSoup: how to find all "about" attributes in an HTML string
I have a text file in which every item has the same structure, and I want to parse it with BeautifulSoup. Excerpt:
data = """<text id="1" sig="prenatfra-camppres-2017-part01-viewEvent-1&docRefId-0&docName-news%C2%B720170425%C2%B7LC%C2%B7assignment_862852&docIndex-3_1" title="Éditorial élection présidentielle" author="NULL" year="2017" date="25/04/2017" section="NULL" sourcename="La Croix" sourcesig="LC" polarity="Positif" about="Le Pen|Macron">
<p type="title">Éditorial élection présidentielle</p>
</text>"""
soup = BeautifulSoup(data, 'html.parser') #
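Maybe I am using the wrong parser?
Thanks.
Best regards,
Théo

If you check the documentation carefully, find_all() looks for tags with the specified name. Here "about" is an attribute, not a tag, so you should look for the "text" tags and then retrieve the "about" attribute from each of them. The parser is not the problem; html.parser handles this input.
A working example is shown below: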
from bs4 import BeautifulSoup
data = """<text id="1" sig="prenatfra-camppres-2017-part01-viewEvent-1&docRefId-0&docName-news%C2%B720170425%C2%B7LC%C2%B7assignment_862852&docIndex-3_1" title="Éditorial élection présidentielle" author="NULL" year="2017" date="25/04/2017" section="NULL" sourcename="La Croix" sourcesig="LC" polarity="Positif" about="Le Pen|Macron">
<p type="title">Éditorial élection présidentielle</p>
</text>"""
soup = BeautifulSoup(data, 'html.parser')
# to get the 'about' attribute from the first text element
print(soup.find_all('text')[0]['about'])
# to get the 'about' attributes from all the text elements, as a list
print([text['about'] for text in soup.find_all('text')])
Le Pen|Macron
['Le Pen|Macron']
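
If some "text" elements might lack the "about" attribute, indexing with text['about'] raises a KeyError. The sketch below is one possible way to handle that, not part of the original answer: it filters on the presence of the attribute and splits the pipe-separated value into a list. The shortened sample data and the helper name about_values are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample: the second item has no 'about' attribute.
data = """<text id="1" about="Le Pen|Macron"><p type="title">A</p></text>
<text id="2"><p type="title">B</p></text>
<text id="3" about="Fillon"><p type="title">C</p></text>"""

soup = BeautifulSoup(data, "html.parser")


def about_values(soup):
    """Map each text element's id to the list of names in its 'about' attribute."""
    result = {}
    # attrs={"about": True} keeps only tags that actually carry the attribute
    for tag in soup.find_all("text", attrs={"about": True}):
        result[tag["id"]] = tag["about"].split("|")
    return result


abouts = about_values(soup)
print(abouts)
```

If you would rather keep every element and get None for the missing ones, tag.get('about') can be used in place of the attrs filter.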