Python: BeautifulSoup does not return the tag's contents


I'm trying to extract the creator name from the following string:

<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>
What I get back is an empty tag:

<dc:creator></dc:creator>


How do I get the name out of the tag?

This works for me with Python 3: specify the parser as xml (this parser requires lxml to be installed).

import bs4 as bs

content = """
<collection>
    <item><dc:creator><![CDATA[Chris M]]></dc:creator></item>
    <item><dc:creator><![CDATA[Harris A]]></dc:creator></item>
</collection>
"""

# 'xml' selects lxml's XML parser, which keeps the CDATA text
soup = bs.BeautifulSoup(content, 'xml')

items = soup.find_all("item")
for i in items:
    # look the tag up as 'creator', without the dc: prefix
    author = i.find('creator')
    print(author.string)
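
For comparison, the same extraction can be sketched with the standard library's xml.etree.ElementTree, which returns CDATA content as ordinary element text. This is only a sketch, and it rests on one assumption that is not in the question: the feed's root element declares the dc prefix (shown here with the Dublin Core namespace URI). The FEED string and the creators helper are illustrative names, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Assumption: the real feed declares the dc prefix on its root element.
# The snippet in the question omits that declaration, and ElementTree
# would reject an undeclared prefix.
FEED = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<item>
<dc:creator><![CDATA[Chris M]]></dc:creator>
<pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
</item>
</channel>
</rss>"""

# Map the prefix used in the search path to its namespace URI.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def creators(xml_text):
    """Return the text of every dc:creator element found inside an item."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//item/dc:creator", NS)]


print(creators(FEED))
```

Because the namespace is declared, the lookup uses the prefixed name dc:creator rather than the bare creator used with BeautifulSoup.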

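BeautifulSoup represents CDATA sections with its own class, CData (a subclass of NavigableString), so you can check each string for being an instance of CData: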

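>>> from bs4 import BeautifulSoup, CData
>>> text = """<item>
... <dc:creator><![CDATA[Chris M]]></dc:creator>
... <pubDate>Tue, 06 Jun 2017 07:38:23 +0000</pubDate>
... </item>"""
>>> # Assumption: html.parser is named explicitly here; the original call
>>> # passed no parser, and parsers differ in how they treat CDATA.
>>> soup = BeautifulSoup(text, "html.parser")
>>> for item in soup.find_all(string=True):
...     if isinstance(item, CData):
...         print(item)
...
Chris M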

Have you tried creator instead of dc:creator?
@codekaizer Yes, it doesn't return anything.
Output of the xml parser example above:
Chris M
Harris A