Using wget via Python

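How can I use wget from Python to download files (videos) and save them locally? There will be a bunch of files, so how do I know when one file has finished downloading, so that the next one can start automatically?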

Thanks.

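Don't do this. Use … or … instead.

If you use … to spawn a process for wget, the call blocks until wget finishes the download (or exits with an error). So just call os.system('wget blah') in a loop until all the files have been downloaded.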
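Here is a minimal Python 3 sketch of that loop. It swaps os.system for subprocess.run, which also waits for the child process to exit. Because subprocess.run takes the command as an argument list, URLs containing characters like & or ? need no shell quoting. The sketch assumes wget is on the PATH (if it isn't, subprocess.run raises FileNotFoundError). The helper names wget_command and download_one_by_one are hypothetical.

```python
import subprocess


def wget_command(url, dest_dir="."):
    # -P (--directory-prefix) tells wget which directory to save into.
    # The URL is its own argv item, so no shell is involved and no quoting is needed.
    return ["wget", "-P", dest_dir, url]


def download_one_by_one(urls, dest_dir="."):
    """Run wget once per URL; each subprocess.run call waits for wget to exit."""
    results = []
    for url in urls:
        completed = subprocess.run(wget_command(url, dest_dir))
        # wget exits with status 0 on success and non-zero on failure.
        results.append((url, completed.returncode))
    return results
```

Each call returns only after wget exits, so the next download starts only once the previous one has finished. The return code tells you whether that download succeeded.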

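Alternatively, you can use … or …. You'll have to write more code, but you'll get better performance, because you can reuse a single HTTP connection to download many files instead of opening a new connection for each one.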
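To illustrate connection reuse with only the standard library, here is a sketch using Python 3's http.client that sends several GET requests over one HTTPConnection. It assumes all files live on the same plain-HTTP host and that the paths carry no query strings. The function name download_over_one_connection is hypothetical. Each response has to be read to the end before the next request goes out on the same connection.

```python
import http.client
import os


def download_over_one_connection(host, paths, dest_dir=".", port=80, chunk_size=64 * 1024):
    """Fetch several paths from one host over a single keep-alive HTTP connection.

    Returns the local file paths written, in request order.
    """
    saved = []
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            if resp.status != 200:
                # Stop on the first failure; the finally block closes the connection.
                raise RuntimeError(f"GET {path} failed: {resp.status} {resp.reason}")
            target = os.path.join(dest_dir, os.path.basename(path) or "index.html")
            with open(target, "wb") as fh:
                # Stream to disk in chunks rather than holding a whole video in memory.
                # Reading until b"" consumes the full body, so the connection can be reused.
                while chunk := resp.read(chunk_size):
                    fh.write(chunk)
            saved.append(target)
    finally:
        conn.close()
    return saved
```

For HTTPS you would use http.client.HTTPSConnection instead. The request loop stays the same.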

There's no reason to use os.system. Avoid writing shell scripts in Python; use urllib.urlretrieve or a similar tool instead.
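In Python 3 this function is urllib.request.urlretrieve. It returns only after the file has been written. It also accepts an optional reporthook callback, called with (block number, block size, total size), which is one way to watch progress and completion per file. Below is a minimal sketch; fetch_with_progress is a hypothetical name.

```python
import urllib.request


def fetch_with_progress(url, filename):
    """Download url to filename, printing progress; returns once the file is complete."""

    def report(block_num, block_size, total_size):
        # urlretrieve calls this once before the first block and again after each block.
        done = block_num * block_size
        if total_size > 0:
            done = min(done, total_size)
            print(f"\r{filename}: {done} of {total_size} bytes", end="", flush=True)
        else:
            # total_size is -1 when there is no Content-Length; the count is approximate.
            print(f"\r{filename}: about {done} bytes", end="", flush=True)

    path, _headers = urllib.request.urlretrieve(url, filename, reporthook=report)
    print(f"\n{filename}: finished")
    return path
```

A call such as fetch_with_progress("http://example.com/video.mp4", "video.mp4") (placeholder URL) prints a running byte count and then "finished" once the file is on disk.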

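Edit: to answer the second part of your question, you can set up a thread pool using the standard library's Queue class. Downloading spends most of its time waiting on the network, so the GIL shouldn't be a problem. Build a list of the URLs you want to download and feed them to the work queue, which hands each request to the worker threads.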

I was waiting for a database update to finish, so I threw this together quickly:


#!/usr/bin/python
# Python 2: read URLs from stdin (one per line) and download them
# with a pool of worker threads that share a single Queue.

import sys
import threading
import urllib
from Queue import Queue
import logging

class Downloader(threading.Thread):
    def __init__(self, queue):
        super(Downloader, self).__init__()
        self.queue = queue

    def run(self):
        while True:
            # use the queue handed to this thread, not the module-level global
            download_url, save_as = self.queue.get()
            # sentinel: (None, None) tells the worker to exit
            if not download_url:
                return
            try:
                urllib.urlretrieve(download_url, filename=save_as)
            except Exception, e:
                logging.warning("error downloading %s: %s" % (download_url, e))

if __name__ == '__main__':
    queue = Queue()
    threads = []
    for i in xrange(5):
        threads.append(Downloader(queue))
        threads[-1].start()

    for line in sys.stdin:
        url = line.strip()
        if not url:
            continue  # skip blank lines; an empty URL would act as a sentinel
        filename = url.split('/')[-1]
        print "Download %s as %s" % (url, filename)
        queue.put((url, filename))

    # if we get here, stdin has hit EOF (^D)
    print "Finishing current downloads"
    for i in xrange(5):
        queue.put((None, None))
    # wait for every worker to finish its queued downloads and exit
    for t in threads:
        t.join()
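For Python 3, the same worker-pool idea can be sketched with concurrent.futures.ThreadPoolExecutor in place of hand-rolled threads and sentinels. This is a swapped-in technique, not what the answer above uses. as_completed yields each future as soon as its download finishes, which also answers how to tell that a file is done. The helper names are hypothetical, and two URLs with the same basename would overwrite each other.

```python
import concurrent.futures
import os
import urllib.request
from urllib.parse import urlsplit


def download(url, dest_dir):
    """Save one URL into dest_dir, named after the last path component."""
    name = os.path.basename(urlsplit(url).path) or "index.html"
    target = os.path.join(dest_dir, name)
    urllib.request.urlretrieve(url, target)
    return target


def download_all(urls, dest_dir=".", workers=5):
    """Download urls on a thread pool; map each URL to its saved path or its exception."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download, url, dest_dir): url for url in urls}
        # as_completed yields each future as soon as its download finishes.
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
                print(f"done: {url}")
            except Exception as exc:  # one failed download should not stop the rest
                results[url] = exc
                print(f"failed: {url}: {exc}")
    return results
```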

There's no reason to use Python. Avoid writing shell scripts in Python; use bash or something similar.

Install wget via PyPI, then run it as documented:

python -m wget <url>


Short answer (simplified). To fetch one file:

 import urllib
 urllib.urlretrieve("http://google.com/index.html", filename="local/index.html")

You can figure out how to loop that if you need to.
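One way to write that loop in Python 3, where the function is urllib.request.urlretrieve. Each call returns only once the file is fully written, so the loop naturally starts the next download after the previous one finishes. This sketch uses a hypothetical helper name and records failures instead of stopping.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def download_sequentially(urls, dest_dir="."):
    """Fetch urls one at a time; returns (saved paths, [(url, error), ...])."""
    saved, failed = [], []
    for url in urls:
        name = os.path.basename(urlsplit(url).path) or "index.html"
        target = os.path.join(dest_dir, name)
        try:
            # Blocks until this file is completely written, or raises on error.
            urllib.request.urlretrieve(url, target)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            failed.append((url, exc))
            continue
        saved.append(target)
    return saved, failed
```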

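How would you do it? First, search for all the earlier questions that are exactly the same as yours: …. Second, read this specific question: …

This answer needs expanding. Why shouldn't you use wget?

Because it starts a whole new process just to do something Python can do by itself, and because it breaks portability.

Write

wget -rl1 -I /stuff/I/want/ http://url/

using any of these libs?

wget works through the VPN client, but for https urllib gives this error: urlopen error Tunnel connection failed: 407 Proxy Authentication Required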
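One possible fix for the 407 error, assuming the proxy uses HTTP Basic authentication, is to put the credentials in the proxy URL passed to urllib.request.ProxyHandler. urllib then sends them as a Proxy-Authorization header, including on the CONNECT request it uses to tunnel HTTPS. Proxies that require NTLM or Kerberos, which are common on corporate networks, are not handled by urllib out of the box. The host, port, and credentials below are placeholders, and build_proxy_opener is a hypothetical name.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_opener(proxy_host, proxy_port, user, password):
    """Return an opener that routes http and https through a Basic-auth proxy."""
    # Percent-encode the credentials: ProxyHandler unquotes them again,
    # so characters such as '@' or ':' in a password survive the round trip.
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{creds}@{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Example with placeholder values:
# opener = build_proxy_opener("proxy.example.com", 3128, "alice", "secret")
# urllib.request.install_opener(opener)  # urlopen and urlretrieve now use the proxy
```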

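There is a bug in

download_url, save_as = queue.get()

It should be

download_url, save_as = self.queue.get()

Writing shell scripts in Python is fine. If you want to get something done quickly but hate bash syntax, just use Python. If you're working on a bigger project, then yes, try to avoid these external calls. Python is a great scripting language.

For anyone else who is confused: the linked library doesn't use wget; it uses urllib. It currently doesn't support any wget-like features.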
