Python: Why does BeautifulSoup miss <p> tags?


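I am using BeautifulSoup, and the findAll method is missing the <p> tags. When I run the code, it returns an empty list. However, if I inspect the page, I can clearly see them, as shown below.

I picked a website at random: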

import requests
from bs4 import BeautifulSoup
#An example web site
url = 'https://www.kite.com/python/answers/how-to-extract-text-from-an-html-file-in-python'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

print(soup.findAll("p"))

Output:

(env) pinux@main:~/dev$ python trial.py
[]

I inspected the page with the browser:

The text is clearly there. Why can't BeautifulSoup pick it up? Can someone explain what is going on?

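Some parts of this web page appear to be rendered with JavaScript. requests only downloads the raw HTML and never runs scripts, so those <p> tags do not exist in what BeautifulSoup parses. You can try selenium instead: a selenium WebDriver loads the page in a real browser, and browser.get() returns once the page has loaded, so page_source reflects the rendered DOM. Content that scripts inject after the load event may still need an explicit wait.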

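import bs4
from selenium import webdriver

# Start a real browser so the page's JavaScript actually runs
browser = webdriver.Firefox()
browser.get("https://url-to-webpage.com")
# Parse the rendered DOM rather than the raw HTTP response
soup = bs4.BeautifulSoup(browser.page_source, features="html.parser")
browser.quit()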
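
Loading the page is sometimes not enough: content injected by scripts after the load event may still be missing when page_source is first read. Selenium ships its own explicit-wait helper (WebDriverWait) for this. The hand-rolled loop below is only a minimal sketch of the same idea with no Selenium dependency, so the names poll_page_source, get_source and is_ready are hypothetical, not part of any library.

```python
import time
from typing import Callable


def poll_page_source(get_source: Callable[[], str],
                     is_ready: Callable[[str], bool],
                     timeout: float = 10.0,
                     interval: float = 0.5) -> str:
    """Call get_source() repeatedly until is_ready(source) returns True.

    Returns the first source that satisfies is_ready.
    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = get_source()
        if is_ready(source):
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError("page content did not appear in time")
        time.sleep(interval)


# Possible usage with the browser object from the answer above
# (assumes bs4 and selenium are installed and the browser is still open):
#
# html = poll_page_source(
#     lambda: browser.page_source,
#     lambda s: bool(bs4.BeautifulSoup(s, "html.parser").find_all("p")),
# )
# soup = bs4.BeautifulSoup(html, "html.parser")
```

Passing the source getter and the readiness check as callables keeps the loop independent of the browser, which also makes it easy to try out with canned HTML strings.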

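Probably because they are generated by JavaScript. Use View Source to check whether they are in the HTML or added dynamically by JavaScript.

Looking at the source, it is definitely rendered by JS. If the website has no SSR (server-side rendering), bs4 alone will not work. You need a tool such as Cypress or Selenium, which can run the JS and then capture or navigate the resulting source.

It feels like going back to square one, but let's go get this JavaScript-rendered text. I was confused. Thanks again.

Thank you very much, I am using Selenium now.

I was so confused! Now I get it.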
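
The "view source" check suggested in the comments can also be scripted. The sketch below counts the tags present in a raw HTML string using only the standard library; TagCounter and count_tags are hypothetical helper names introduced for illustration. A raw response with zero <p> tags but several <script> tags is a strong hint, though not proof, that the text is rendered client-side.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags in static HTML; JavaScript is never executed."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        self.counts[tag] += 1


def count_tags(html: str) -> Counter:
    """Return a Counter mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Possible usage against the page from the question
# (assumes requests is installed and url is defined as above):
#
# import requests
# counts = count_tags(requests.get(url).text)
# print(counts["p"], counts["script"])
```

Markup that appears only inside a script body as a string is treated as script text, not as elements, so it is not counted. This matches what BeautifulSoup sees when it is handed the raw response.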