Python: BeautifulSoup not returning tag content
I'm trying to extract the following string:
<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>
This yields:
<dc:creator></dc:creator>
How do I get the name out of the tag?
This worked for me with Python 3. Specify the parser as `xml`:
import bs4 as bs

content = """
<collection>
<item><dc:creator><![CDATA[Chris M]]></dc:creator></item>
<item><dc:creator><![CDATA[Harris A]]></dc:creator></item>
</collection>
"""

# The "xml" parser (backed by lxml, which must be installed) keeps the CDATA text
soup = bs.BeautifulSoup(content, "xml")
items = soup.find_all("item")
for i in items:
    # Look the tag up by its local name, without the "dc:" prefix
    author = i.find("creator")
    print(author.string)
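If installing lxml is not an option, the standard library can do the same extraction. The sketch below swaps in `xml.etree.ElementTree` instead of BeautifulSoup. It assumes the feed declares the Dublin Core namespace (`http://purl.org/dc/elements/1.1/`) for the `dc` prefix, as RSS feeds that use `<dc:creator>` normally do on the `<rss>` element. Without that declaration, ElementTree rejects the document with an "unbound prefix" error. The feed text and the `creators` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed binds the "dc" prefix to the Dublin Core namespace,
# as a real RSS feed would on its <rss> root element.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample feed built from the items in the question.
FEED = f"""<rss xmlns:dc="{DC_NS}">
  <channel>
    <item>
      <dc:creator><![CDATA[Chris M]]></dc:creator>
      <pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
    </item>
    <item>
      <dc:creator><![CDATA[Harris A]]></dc:creator>
    </item>
  </channel>
</rss>"""


def creators(xml_text):
    """Return the text of every <dc:creator> under an <item>.

    The parser delivers CDATA content as ordinary text, so .text
    already holds the unwrapped name.
    """
    root = ET.fromstring(xml_text)
    # The prefix map lets the path "dc:creator" resolve to the full namespace URI.
    ns = {"dc": DC_NS}
    return [el.text for el in root.iterfind(".//item/dc:creator", ns)]


print(creators(FEED))  # ['Chris M', 'Harris A']
```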
BeautifulSoup represents CDATA sections as `CData`, a subclass of `NavigableString`, so you can check for instances of `CData`:
>>> from bs4 import BeautifulSoup, CData
>>> text = """<item>
... <dc:creator><![CDATA[Chris M]]></dc:creator>
... <pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
... </item>"""
>>> soup = BeautifulSoup(text, "html.parser")
>>> for item in soup.find_all(string=True):
...     if isinstance(item, CData):
...         print(item)
...
Chris M
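The same idea, treating CDATA as its own node type rather than as plain text, also exists in the standard library's `xml.dom.minidom`, which keeps CDATA sections as `CDATA_SECTION_NODE` nodes. This is a minimal sketch, not BeautifulSoup's mechanism. It again assumes the `dc` prefix is bound to the Dublin Core namespace, and `cdata_sections` is a hypothetical helper.

```python
from xml.dom import minidom

# Assumption: the "dc" prefix is declared, as it would be in a real RSS feed;
# minidom's namespace-aware parser rejects undeclared prefixes.
DOC = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>"""


def cdata_sections(xml_text):
    """Collect the contents of every CDATA section, in document order."""
    found = []

    def walk(node):
        for child in node.childNodes:
            # CDATA is a distinct node type here, separate from ordinary text nodes.
            if child.nodeType == child.CDATA_SECTION_NODE:
                found.append(child.data)
            walk(child)

    walk(minidom.parseString(xml_text))
    return found


print(cdata_sections(DOC))  # ['Chris M']
```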
Did you try `creator` instead of `dc:creator`? – @codekaizer Yes, it doesn't return anything either.
Output of the `xml` parser example:
Chris M
Harris A