Python BeautifulSoup parser adds unnecessary closing HTML tags

Say you have HTML like this:

<head>
  <meta charset="UTF-8">
  <meta name="description" content="Free Web tutorials">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <meta name="author" content="John Doe">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
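If you parse it in Python with BeautifulSoup and print it with prettify(), using a script like this:

from bs4 import BeautifulSoup as bs
import urllib3

# Placeholder: the URL of the HTML page to fetch
URL = 'html file'

http = urllib3.PoolManager()

# Fetch the page and parse the raw bytes with Python's built-in HTML parser
page = http.request('GET', URL)
soup = bs(page.data, 'html.parser')

print(soup.prettify())

Output:
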
<html>
<head>
  <meta charset="UTF-8">
    <meta name="description" content="Free Web tutorials">
        <meta name="keywords" content="HTML,CSS,XML,JavaScript">
            <meta name="author" content="John Doe">
                <meta name="viewport" content="width=device-width, initial-scale=1.0">
                </meta>
             </meta>
         </meta>
     </meta>
  </meta>
</head>

But if the meta tags are written in self-closing form, like

<meta name="description" content="Free Web tutorials" />

they are output as-is, and no closing tags are added.
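
To check whether a given setup is affected, the comparison above can be sketched without any network access. This is a minimal sketch, not the asker's script: the markup is inlined as a string, and `nested_meta_count` is a hypothetical helper that counts `<meta>` tags whose parent is another `<meta>`. A result above zero for the unclosed form would indicate the nesting shown in the output; the result may differ between Beautiful Soup versions.

```python
from bs4 import BeautifulSoup

# The markup from the question, with unclosed <meta> tags.
UNCLOSED = """<head>
  <meta charset="UTF-8">
  <meta name="description" content="Free Web tutorials">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <meta name="author" content="John Doe">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>"""

# The same tags rewritten in self-closing form (<meta ... />).
SELF_CLOSED = UNCLOSED.replace('">', '" />')


def nested_meta_count(markup, parser="html.parser"):
    """Hypothetical helper: count <meta> tags nested inside another <meta>."""
    soup = BeautifulSoup(markup, parser)
    return sum(
        1
        for tag in soup.find_all("meta")
        if tag.parent is not None and tag.parent.name == "meta"
    )


print("unclosed:   ", nested_meta_count(UNCLOSED))
print("self-closed:", nested_meta_count(SELF_CLOSED))
```
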


So how do you stop BeautifulSoup from adding the unnecessary closing tags?

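To fix this, you only need to switch from the html.parser parser to the lxml parser. Note that lxml is a separate package and has to be installed (for example with pip install lxml).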

Your Python script then becomes:

from bs4 import BeautifulSoup as bs
import urllib3

URL = 'html file'

http = urllib3.PoolManager()

page = http.request('GET', URL)
soup = bs(page.data, 'lxml')

print(soup.prettify())
In other words, the only change is from
soup = bs(page.data, 'html.parser')
to
soup = bs(page.data, 'lxml')
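
Before switching, it can help to see what each parser does with the same markup. The following is a rough sketch under stated assumptions: the shortened `HTML` string and the helper `prettify_with` are illustrative and not part of the original answer. Parsers that are not installed are reported rather than raising, using Beautiful Soup's `FeatureNotFound` exception.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Shortened, illustrative version of the markup from the question.
HTML = '<head><meta charset="UTF-8"><meta name="author" content="John Doe"></head>'


def prettify_with(markup, parser):
    """Hypothetical helper: prettified output, or None if the parser is missing."""
    try:
        return BeautifulSoup(markup, parser).prettify()
    except FeatureNotFound:
        return None


# html.parser ships with Python; lxml and html5lib are optional packages.
results = {
    name: prettify_with(HTML, name)
    for name in ("html.parser", "lxml", "html5lib")
}

for name, output in results.items():
    print(f"--- {name} ---")
    print(output if output is not None else "(not installed)")
```

Different parsers can also wrap a fragment in extra elements such as html or body, so it is worth comparing the printed output before relying on one of them.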