Python: if requests can't

python web-scraping

I've tried Selenium before and now want to try bs4. I ran the following code, but got None as the output:

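import bs4
import requests

# requests downloads only the raw page source; no JavaScript is executed
res_pewdiepie = requests.get(
    'https://www.youtube.com/user/PewDiePie')
soup = bs4.BeautifulSoup(res_pewdiepie.content, "lxml")
# find() returns None when no element with this id exists in the parsed source
subs = soup.find(id="sub-count")
print(subs)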

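After some research, I found that requests doesn't load dynamic content such as the subscriber count on YouTube or Socialblade. Is there a way to get this information with bs4, or do I have to switch back to something like Selenium?

Thanks in advance.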

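BeautifulSoup can only parse the text you give it, which in this case is the page source. If the information isn't in that source, there is nothing it can do. So I think you will have to switch back to something that supports JavaScript.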
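To illustrate the point, here is a minimal sketch using a made-up HTML snippet (an assumption for demonstration, not YouTube's actual markup). The `sub-count` element would only exist after the script runs in a browser, so searching the raw source finds nothing.

```python
import bs4

# Hypothetical page source: the element is created by JavaScript at load time,
# so it is absent from the HTML that requests would download.
raw_html = """
<html><body>
  <div id="channel-header"></div>
  <script>
    var el = document.createElement("span");
    el.id = "sub-count";
    document.getElementById("channel-header").appendChild(el);
  </script>
</body></html>
"""

# html.parser is the stdlib-backed parser; "lxml" would behave the same here.
soup = bs4.BeautifulSoup(raw_html, "html.parser")
static_el = soup.find(id="channel-header")  # present in the source
dynamic_el = soup.find(id="sub-count")      # never created: no JS engine ran
print(static_el is not None, dynamic_el)
```

The script text mentions `sub-count`, but BeautifulSoup treats it as plain text inside the `<script>` tag, not as an element it can find.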

Some options:

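I use Splash for this kind of thing. You can run it in a Docker container, and you can adjust how long it waits for rendering on a per-request basis. If you're doing any serious crawling, there is also a Scrapy plugin. Below is a snippet from one of my crawlers, which uses a Splash instance running locally in Docker. Good luck.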

import json
import requests

target_url = "https://somewhere.example.com/"
splash_url = "http://localhost:8050/render.json"
# html=1 returns the rendered HTML, har=0 skips the HAR log,
# wait=10 gives the page's JavaScript ten seconds to run
body = json.dumps({"url": target_url, "har": 0, "html": 1, "wait": 10})
headers = {"Content-Type": "application/json"}

response = requests.post(splash_url, data=body, headers=headers)
result = json.loads(response.text)
html = result["html"]
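Once Splash returns the rendered HTML, it can be handed to BeautifulSoup just like in the question. The following is only a sketch under assumptions: it presumes a Splash instance listening on localhost:8050 as in the snippet above, `render_html` and `find_by_id` are hypothetical helper names, and whether an element with id `sub-count` exists in the rendered page depends on the site's current markup.

```python
import bs4
import requests


def render_html(target_url, splash_url="http://localhost:8050/render.json", wait=10):
    """Ask a Splash instance (assumed to be running locally) for rendered HTML."""
    payload = {"url": target_url, "html": 1, "har": 0, "wait": wait}
    # json= serializes the payload and sets the Content-Type header
    response = requests.post(splash_url, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["html"]


def find_by_id(html, element_id):
    """Return the stripped text of the element with this id, or None if absent."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    return element.get_text(strip=True) if element is not None else None
```

With Splash running, `find_by_id(render_html(target_url), "sub-count")` would return the element's text if the rendered page contains it, and None otherwise.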
