Python BeautifulSoup: how to find all "about" attributes in an HTML string

I have a text file in which every item has the same structure, and I would like to parse it with BeautifulSoup.

Excerpt:

data = """<text id="1" sig="prenatfra-camppres-2017-part01-viewEvent-1&docRefId-0&docName-news%C2%B720170425%C2%B7LC%C2%B7assignment_862852&docIndex-3_1" title="Éditorial élection présidentielle" author="NULL" year="2017" date="25/04/2017" section="NULL" sourcename="La Croix" sourcesig="LC" polarity="Positif" about="Le Pen|Macron">
<p type="title">Éditorial élection présidentielle</p>
</text>"""

soup = BeautifulSoup(data, 'html.parser')

Maybe I am using the wrong parser? Thanks. Best regards,
Théo

If you check the documentation carefully, find_all() looks for tags with the specified name.

So in this case, you should look for the text tags and then retrieve the about attribute from each of them.

A working example:

from bs4 import BeautifulSoup

data = """<text id="1" sig="prenatfra-camppres-2017-part01-viewEvent-1&docRefId-0&docName-news%C2%B720170425%C2%B7LC%C2%B7assignment_862852&docIndex-3_1" title="Éditorial élection présidentielle" author="NULL" year="2017" date="25/04/2017" section="NULL" sourcename="La Croix" sourcesig="LC" polarity="Positif" about="Le Pen|Macron">
<p type="title">Éditorial élection présidentielle</p>
</text>"""

soup = BeautifulSoup(data, 'html.parser')

# to get the 'about' attribute from the first text element
print(soup.find_all('text')[0]['about'])

# to get the 'about' attributes from all the text elements, as a list
print([text['about'] for text in soup.find_all('text')])

Output:

Le Pen|Macron
['Le Pen|Macron']
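
If some text elements in the file lack an about attribute, indexing with text['about'] raises a KeyError. The sketch below is one possible way to guard against that: it passes attrs={"about": True} to find_all() so that only tags carrying the attribute are returned. The shortened sample data and the idea that "|" separates several names are assumptions made for illustration; they are not stated in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample: the second <text> item has no `about` attribute.
data = """<text id="1" about="Le Pen|Macron"><p type="title">A</p></text>
<text id="2"><p type="title">B</p></text>
<text id="3" about="Macron"><p type="title">C</p></text>"""

soup = BeautifulSoup(data, "html.parser")

# attrs={"about": True} keeps only the tags that actually carry the attribute,
# so tag["about"] below cannot raise KeyError.
tagged = soup.find_all("text", attrs={"about": True})

# Assumption: "|" separates several names inside one attribute value.
about_by_id = {tag["id"]: tag["about"].split("|") for tag in tagged}

print(about_by_id)  # {'1': ['Le Pen', 'Macron'], '3': ['Macron']}
```

An alternative with the same effect is to call find_all('text') and read each value with tag.get('about'), which returns None when the attribute is missing.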