Python: Why does Beautiful Soup add an extra XML declaration to the document, and how do I remove it?


I am trying to parse a simple XML document that has a header (an XML declaration). The code is:

str(BeautifulSoup("""
<?xml version="1.0" encoding="UTF-8"?>
<data/>
""", features='xml'))

When you pass xml to the features parameter, lxml builds the XML tree itself, so you do not need to add the header yourself:

>>> str(BeautifulSoup("""
... <data/>
... """, features='xml'))
'<?xml version="1.0" encoding="utf-8"?>\n<data/>'

>>>

The string returned for my input contains two XML declarations: the one from my document and an extra one added by Beautiful Soup.

Is this a bug, or am I doing something wrong?

The short answer: yes, you are doing it wrong.

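How? You get two XML declarations because you passed in the features argument. Beautiful Soup uses it in its constructor to look up a tree builder, and that builder determines self.is_xml.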

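But that is not the whole story. self.is_xml is used in decode, which returns a string or Unicode representation of the document. When self.is_xml is truthy, the returned string starts with an XML declaration that Beautiful Soup generates itself.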

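In my application the XML already has a header. Is there an efficient way to remove it automatically? Or to tell Beautiful Soup to ignore it?

I don't know either. I would have to search.

To remove the XML header from the string (the last line above), use something like: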

str(soup).split("\n")[-1]
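The one-liner above keeps only the last line of the output, so it would drop content if the serialized document spans several lines. A slightly more defensive sketch, using only the standard library, strips just a leading declaration. The helper name strip_xml_declaration and the regular expression are assumptions made for illustration; they are not part of Beautiful Soup.

```python
import re

# Matches one XML declaration at the very start of the text,
# together with any whitespace around it.
_XML_DECL = re.compile(r'^\s*<\?xml\s[^>]*\?>\s*')


def strip_xml_declaration(text):
    """Return text without a leading <?xml ...?> declaration (hypothetical helper)."""
    return _XML_DECL.sub('', text, count=1)


serialized = '<?xml version="1.0" encoding="utf-8"?>\n<data/>'
print(strip_xml_declaration(serialized))  # <data/>
```

The same helper could be applied to the input before it is handed to Beautiful Soup, or to the result of str(soup) afterwards. Text without a declaration is returned unchanged.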

beautifulsoup4==4.4.1
lxml==3.4.3

# Excerpt from BeautifulSoup.__init__ in bs4
if builder is None:
    if isinstance(features, basestring):
        features = [features]
    if features is None or len(features) == 0:
        features = self.DEFAULT_BUILDER_FEATURES
    # Look up a tree builder class that supports the requested features
    builder_class = builder_registry.lookup(*features)
    if builder_class is None:
        raise FeatureNotFound(
            "Couldn't find a tree builder with the features you "
            "requested: %s. Do you need to install a parser library?"
            % ",".join(features))
    builder = builder_class()
self.builder = builder
self.is_xml = builder.is_xml
self.builder.soup = self

# Excerpt from BeautifulSoup.decode in bs4
if self.is_xml:
    # Print the XML declaration
    encoding_part = ''
    if eventual_encoding != None:
        encoding_part = ' encoding="%s"' % eventual_encoding
    prefix = u'<?xml version="1.0"%s?>\n' % encoding_part
    ...
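To make the effect of the excerpt above easier to see, here is a standalone sketch of the same prefix logic. The function name xml_declaration_prefix is hypothetical, and the "utf-8" default is an assumption about the library's default output encoding. This is not the library's code, only an imitation of the quoted lines.

```python
def xml_declaration_prefix(eventual_encoding="utf-8", is_xml=True):
    """Imitate the prefix logic from the excerpt (hypothetical standalone helper)."""
    if not is_xml:
        # Non-XML documents get no declaration at all.
        return ''
    encoding_part = ''
    if eventual_encoding is not None:
        encoding_part = ' encoding="%s"' % eventual_encoding
    return '<?xml version="1.0"%s?>\n' % encoding_part


print(repr(xml_declaration_prefix()))      # '<?xml version="1.0" encoding="utf-8"?>\n'
print(repr(xml_declaration_prefix(None)))  # '<?xml version="1.0"?>\n'
```

Because the prefix is built from the output encoding rather than copied from the input, this would also explain why the serialized document shows encoding="utf-8" in lower case even when the input said UTF-8.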

>>> from bs4 import BeautifulSoup
>>> doc = '''<?xml version="1.0" encoding="UTF-8"?>
... <data/>'''
>>> soup = BeautifulSoup(doc, 'xml')
>>> str(soup)
'<?xml version="1.0" encoding="utf-8"?>\n<data/>'