
How to download multiple files and images from a website using Python


I'm trying to download multiple files from a website and save them to a folder. I'm after highway data, and the site has a list of links to PDFs. I want to write code that pulls down the many PDFs found on the site, perhaps a loop that walks the page and saves each file to a local folder on my desktop. Does anyone know how I could do that?

Since your goal is to batch-download PDF files, the easiest way is not to write a script but to use commercial software. Internet Download Manager can do what you need in just two steps:

1. Copy all the text in the web browser, including the links.
2. Choose Tasks > Add Batch Download From Clipboard.
This is a problem that calls for a coding solution. I can point you to the tools for accomplishing it, but not a complete code solution.

requests library: for communicating with the HTTP server / website

BeautifulSoup: an HTML parser, for parsing the website's source code

For example:

>>> import requests
>>> from bs4 import BeautifulSoup as BS
>>> 
>>> response = requests.get('http://news.ycombinator.com')
>>> response.status_code # 200 == OK
200
>>> 
>>> soup = BS(response.text, 'html.parser') # Create an HTML parsing object
>>>
>>> soup.title # Here's the browser title tag
<title>Hacker News</title>
>>>
>>> soup.title.text # The contents of the tag
'Hacker News'
>>> 
>>> # Here are some article posts
... 
>>> post_containers = soup.find_all('tr', attrs={'class':'athing'})
>>> 
>>> print('There are %d article posts.' % len(post_containers))
There are 30 article posts.
>>> 
>>> 
>>> # The article name is the 3rd and last object in a post_container
... 
>>> for container in post_containers:
...     title = container.contents[-1] # The last tag
...     title.a.text # Grab the `a` tag inside our title tag, print the text
... 
'Show HN: “Who is hiring?” Map'
'‘Flash Boys’ Programmer in Goldman Case Prevails Second Time'
'Forthcoming OpenSSL releases'
'Show HN: YouTube Filesystem – YTFS'
'Google launches Uber rival RideWith'
'Finish your stuff'
'The Plan to Feed the World by Hacking Photosynthesis'
'New electric engine improves safety of light aircraft'
'Hacking Team hacked, attackers claim 400GB in dumped data'
'Show HN: Proof of concept – Realtime single page apps'
'Berkeley CS 61AS – Structure and Interpretation of Computer Programs, Self-Paced'
'An evaluation of Erlang global process registries: meet Syn'
'Show HN: Nearby Buzz –\xa0Take control of your online reviews'
"The Grateful Dead's Wall of Sound"
'The Effects of Intermittent Fasting on Human and Animal Health'
'JsCoq'
'Taking stock of startup innovation in the Netherlands'
'Hangout: Becoming a freelance developer'
'Panning for Pangrams: The Search for the New Quick Brown Fox'
'Show HN: MUI – Lightweight CSS Framework for Material Design'
"Intel's 10nm 'Cannonlake' delayed, replaced by 14nm 'Kaby Lake'"
'VP of Logistics – EasyPost (YC S13) Hiring'
'Colorado’s Effort Against Teenage Pregnancies Is a Startling Success'
'Lexical Scanning in Go (2011)'
'Avoiding traps in software development with systems thinking'
"Apache Cordova: after 10 months, I won't using it anymore"
'An exercise in profiling a Go program'
"The Science of Pixar's ‘Inside Out’"
'Ask HN: What tech blogs, podcasts do you follow outside of HN?'
'NASA’s New Horizons Plans July 7 Return to Normal Science Operations'
>>> 

A Python solution is to use urllib to download the PDFs.

To get the list of PDFs to download, use the xml module.
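Since most real pages are not well-formed XML, a standard-library sketch of the same idea can use html.parser to collect the links and urllib.request to fetch them. The class name, function names, and folder argument below are illustrative assumptions, not from the original answer:

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve

class PdfLinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points at a .pdf file."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href', '')
            if href.lower().endswith('.pdf'):
                self.links.append(href)

def pdf_links(html, base_url):
    """Return absolute URLs for all PDF links found in `html`."""
    collector = PdfLinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.links]

def download_all(html, base_url, dest_dir):
    """Download every PDF linked from `html` into `dest_dir`."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in pdf_links(html, base_url):
        filename = os.path.join(dest_dir, url.rsplit('/', 1)[-1])
        urlretrieve(url, filename)  # fetches the file over HTTP
```

`pdf_links` is a pure function, so it can be tested without touching the network; only `download_all` performs the actual HTTP fetches.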


The question asks for code, not software like IDM. – meelo

Is there a free internet download manager, one without a trial version?
import urllib.request
import xml.etree.ElementTree as ET

website = urllib.request.urlopen('http://www.wsdot.wa.gov/mapsdata/tools/InterchangeViewer/SR5.htm').read()
root = ET.fromstring(website)  # only succeeds if the page is well-formed XML/XHTML
for table in root.findall('.//table'):
    for a in table.iter('a'):  # findall('a') on a plain list was a bug; iterate each table's <a> tags
        download(a.get('href'))  # download() is assumed to be defined elsewhere
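Putting the requests + BeautifulSoup pieces together, here is a sketch of a complete script for the asker's goal. The page URL comes from the question; the destination folder, function names, and the .pdf-suffix filter are my assumptions:

```python
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

PAGE_URL = 'http://www.wsdot.wa.gov/mapsdata/tools/InterchangeViewer/SR5.htm'
DEST_DIR = os.path.join(os.path.expanduser('~'), 'Desktop', 'pdfs')  # local folder

def find_pdf_urls(html, base_url):
    """Parse `html` and return absolute URLs of every link ending in .pdf."""
    soup = BeautifulSoup(html, 'html.parser')
    return [urljoin(base_url, a['href'])
            for a in soup.find_all('a', href=True)
            if a['href'].lower().endswith('.pdf')]

def main():
    os.makedirs(DEST_DIR, exist_ok=True)
    response = requests.get(PAGE_URL)
    response.raise_for_status()
    for url in find_pdf_urls(response.text, PAGE_URL):
        name = url.rsplit('/', 1)[-1]
        pdf = requests.get(url)
        pdf.raise_for_status()
        with open(os.path.join(DEST_DIR, name), 'wb') as f:
            f.write(pdf.content)  # write bytes so the PDF isn't corrupted

if __name__ == '__main__':
    main()
```

Keeping the link extraction in its own function makes the network-free part easy to test; the loop in `main` is just "fetch, then write bytes" for each discovered URL.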