Python: opening a list of URLs with urlopen

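I have a Python script that fetches a web page and mirrors it. It works for one specific page, but I can't get it to work for multiple pages. I assumed I could put several URLs into a list and pass that to the function, but I get the following error: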

Traceback (most recent call last):
  File "autowget.py", line 46, in <module>
    getUrl()
  File "autowget.py", line 43, in getUrl
    response = urllib.request.urlopen(url)
  File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.2/urllib/request.py", line 361, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'

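I've exhausted Google looking for how to open a list with urlopen(). I did find one method that works: it takes a .txt document and goes through it line by line, feeding each line in as a URL. But I'm writing this in Python 3, and for whatever reason TwillCommand loop won't import. Besides, that approach is unwieldy and requires (supposedly) unnecessary work.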
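The file-based approach does not need an extra library in Python 3: iterating over an open text file yields one line at a time. A minimal sketch, assuming one URL per line in a file such as urls.txt (the helper name read_url_list and the rule that lines starting with '#' are skipped are hypothetical choices, not part of the question):

```python
def read_url_list(path):
    """Return the URLs listed in a text file, one per line.

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    urls = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()  # drop the trailing newline and surrounding spaces
            if line and not line.startswith("#"):
                urls.append(line)
    return urls
```

The resulting list can then be looped over, passing one URL per urlopen() call.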

Anyway, any help would be greatly appreciated.

urlopen() doesn't accept a tuple:

urllib.request.urlopen(url[, data][, timeout])
Open the URL url, which can be either a string or a Request object.
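Because urlopen() takes one string or one Request object, a list has to be opened one element at a time. A sketch that wraps each URL in its own Request (the function name open_each, the 10-second timeout and the User-Agent value are illustrative assumptions):

```python
import urllib.request


def open_each(urls, timeout=10):
    """Fetch every URL in a list and return {url: body_bytes}.

    urlopen() accepts a single str or a single Request, never a list or
    tuple, so each element gets its own Request and its own urlopen() call.
    """
    pages = {}
    for url in urls:
        # Hypothetical header value; some sites reject Python's default User-Agent.
        req = urllib.request.Request(url, headers={"User-Agent": "autowget-sketch"})
        with urllib.request.urlopen(req, timeout=timeout) as response:
            pages[url] = response.read()
    return pages
```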

You're calling it wrong. It should be:

getUrl(url[0],url[1],url[2])

Then, inside the function, iterate over all the URLs with a loop such as for u in url.
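A sketch of that suggestion for Python 3, where get_url is a hypothetical stand-in for the asker's getUrl:

```python
import urllib.request


def get_url(*urls):
    """Fetch each URL passed as a separate positional argument.

    Inside the function *urls is a tuple, so it is looped over; passing the
    tuple itself to urlopen() is what produces the AttributeError.
    """
    results = []
    for u in urls:
        with urllib.request.urlopen(u) as response:  # u is a single string
            results.append(response.read())
    return results
```

The * operator also works at the call site: get_url(*url_list) unpacks a list of any length into separate arguments, so there is no need to write url_list[0], url_list[1], ... by hand.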

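There are a few errors in your code:

You define getUrl with a variable-length argument list (that is the tuple in the error).
You treat getUrl's parameter as a single value (it should be a list you loop over).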
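The first point can be reproduced without network access, because urlopen() fails while preparing the request, before it connects anywhere. A minimal sketch (get_url_broken is a hypothetical name):

```python
import urllib.request


def get_url_broken(*url):
    # *url packs every positional argument into ONE tuple, and that whole
    # tuple is handed to urlopen(), which only accepts a str or a Request.
    return urllib.request.urlopen(url)


try:
    get_url_broken("https://www.example.org/", "https://www.foo.com/")
except AttributeError as exc:
    # The message starts with: 'tuple' object has no attribute 'timeout'
    print(exc)
```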

You could try this code:

import shutil
import urllib.request  # Python 3 replacement for Python 2's urllib2

urls = ['https://www.example.org/', 'https://www.foo.com/', 'http://bar.com']

def getUrl(urls):
    for url in urls:
        # Build a file name from the URL string: strip the scheme, turn '.' and '/' into '_'
        file_name = url.replace('https://', '').replace('http://', '').replace('.', '_').replace('/', '_')
        response = urllib.request.urlopen(url)  # one URL (a str) per call
        with open(file_name, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)

getUrl(urls)
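Chained .replace() calls only strip the schemes they name explicitly. A sketch using urllib.parse.urlsplit, which separates the scheme from the host and path (file_name_for is a hypothetical helper; unlike the code above it also drops a trailing '/' instead of turning it into '_'):

```python
from urllib.parse import urlsplit


def file_name_for(url):
    """Build a flat file name from a URL, whatever its scheme."""
    parts = urlsplit(url)
    name = (parts.netloc + parts.path).rstrip("/")
    # Same substitutions as the answer: '.' and '/' become '_'
    return name.replace(".", "_").replace("/", "_")


print(file_name_for("https://www.example.org/"))  # www_example_org
print(file_name_for("http://bar.com"))            # bar_com
```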

You should iterate over the URLs with a loop:

I'm assuming you want to save the contents to separate files, so I create a unique file name here, but you can obviously generate the names however you like.

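So I have to create multiple variables? Unfortunate. Thanks, I appreciate it. But isn't there an easier way to pass a list, using a for loop instead of var[0], var[1], and so on?

No, you should use a for loop.

Why don't you simply iterate over your list of URLs with a for loop?

That only occurred to me while replying to Sheng's comment! It returns that particular part as a string, right? Thanks. Earlier in my code I set the file name to '/path/to/directory' plus 'domain', where 'domain' is the part between http://www and .com. The script includes FTP transfers and a push (via Git to GitHub Pages), so I need a fixed file path, otherwise the Git part won't work. Anyway, thanks again! Thank you!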
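The naming scheme described in that comment, a fixed directory plus the domain, could look roughly like this. A sketch only: domain_path is hypothetical, and str.removeprefix/str.removesuffix require Python 3.9 or later.

```python
import os
from urllib.parse import urlsplit


def domain_path(base_dir, url):
    """Return base_dir joined with the URL's host, minus 'www.' and '.com'.

    Keeping the directory fixed means later steps (FTP, Git push) can rely
    on the same location every run.
    """
    host = urlsplit(url).hostname or ""
    host = host.removeprefix("www.").removesuffix(".com")  # Python 3.9+
    return os.path.join(base_dir, host)
```

On a POSIX system, domain_path('/path/to/directory', 'https://www.foo.com/') gives '/path/to/directory/foo'.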


import shutil
import urllib.request


urls = ['https://www.example.org/', 'https://www.foo.com/']

def fetch_urls(urls):
    # enumerate() yields (index, url) pairs, so every page gets its own file name
    for i, url in enumerate(urls):
        file_name = "page-%s.html" % i
        # Open a single URL (a plain string) per call
        response = urllib.request.urlopen(url)
        # Stream the response body straight into the output file
        with open(file_name, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)

fetch_urls(urls)
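The loop above stops at the first URL that raises an error. A sketch that skips failing URLs and closes each response with a with block (the name fetch_urls_safely, the timeout and the returned list of written file names are additions, not part of the answer):

```python
import shutil
import urllib.request


def fetch_urls_safely(urls, timeout=10):
    """Save each URL to page-<n>.html, skipping URLs that fail.

    Returns the file names that were written successfully.
    """
    written = []
    for i, url in enumerate(urls):
        file_name = "page-%s.html" % i
        try:
            # 'with' closes the response even if copying fails
            with urllib.request.urlopen(url, timeout=timeout) as response:
                with open(file_name, "wb") as out_file:
                    shutil.copyfileobj(response, out_file)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            # A failure during the copy can leave a partial file behind.
            print("skipping %s: %s" % (url, exc))
            continue
        written.append(file_name)
    return written
```

Because urlopen() is evaluated before open(), a URL that cannot be opened never creates an empty output file.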