Python: downloading multiple PDF files from a webpage
So I'm trying to download some ebooks I purchased through Humble Bundle. I'm using beautifulsoup and requests to try to parse the HTML and grab the href links for the PDFs:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.humblebundle.com/downloads?key=fkuzzq6R8MA8ydEw")
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("div", {"class": "js-all-downloads-holder"})
print(links)
<div class="flexbtn active noicon js-start-download">
<div class="right"></div>
<span class="label">PDF</span>
<a class="a" download="" href="https://dl.humble.com/makea2drpginaweekend.pdf?gamekey=fkuzzq6R8MA8ydEw&ttl=1521117317&t=b714bb732413a1f0532ec6aa72b282f9">
PDF
</a>
</div>
python3 pdf_downloader.py
[]
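The selector logic itself seems fine: run against the static markup shown above (pasted in here as a string), BeautifulSoup does find the link, which suggests the real problem is that the fetched page doesn't contain this markup at all. A quick standalone check:

```python
from bs4 import BeautifulSoup

# The download-button markup copied from the page above.
snippet = '''
<div class="flexbtn active noicon js-start-download">
<div class="right"></div>
<span class="label">PDF</span>
<a class="a" download="" href="https://dl.humble.com/makea2drpginaweekend.pdf?gamekey=fkuzzq6R8MA8ydEw&ttl=1521117317&t=b714bb732413a1f0532ec6aa72b282f9">
PDF
</a>
</div>'''

soup = BeautifulSoup(snippet, "html.parser")
# download=True matches any <a> that has a download attribute, whatever its value.
link = soup.find("a", download=True)
print(link["href"])  # the direct PDF URL
```

So if `find_all` returns `[]` on the live response, the response body is most likely a login page or JS-rendered shell rather than the downloads page.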
I'd include an Imgur link showing the site and its HTML layout, since I don't believe you can reach the HTML page without being prompted to log in (which is probably part of why I'm running into this problem in the first place).
Sorry for the long post; I've been up all night working on this. At this point it would be easier to just click the download button 20+ times, but that's not how you learn.

You need to use requests to log in to the page first and then scrape it. The response you are getting is what any user would get without logging in.

Thanks, I think I'll give selenium a try for this project. It might be easier.
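A minimal sketch of the logged-in approach suggested above, assuming a plain form login. The `/processlogin` endpoint and the field names are guesses, not Humble Bundle's actual API; the real site may also involve CSRF tokens or a captcha, which this sketch ignores. Inspect the real login form before using it:

```python
import requests
from bs4 import BeautifulSoup

DOWNLOADS_URL = "https://www.humblebundle.com/downloads?key=fkuzzq6R8MA8ydEw"

def extract_pdf_links(html):
    """Collect the href of every <a> tag that points at a .pdf file."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)
            if ".pdf" in a["href"]]

def fetch_pdfs(username, password):
    # A Session keeps the login cookies across requests.
    with requests.Session() as s:
        # Hypothetical endpoint and field names; inspect the real form first.
        s.post("https://www.humblebundle.com/processlogin",
               data={"username": username, "password": password})
        page = s.get(DOWNLOADS_URL)
        for url in extract_pdf_links(page.text):
            filename = url.split("/")[-1].split("?")[0]
            with open(filename, "wb") as f:
                f.write(s.get(url).content)
```

The key point is reusing the same `Session` for both the login POST and the downloads GET, so the authenticated cookies travel with every request.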