Python: Don't wait for files to download with Scrapy

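I have an item pipeline that takes a URL from the item and downloads it. The problem is that I have another pipeline in which I manually check the file and add some information about it, and I really need to do that before the file is downloaded.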

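import os
import urllib2  # Python 2 module; on Python 3 use urllib.request instead

from scrapy.exceptions import DropItem

# VIDEOS_DIR is assumed to be defined elsewhere in the project.

class VideoCommentPipeline(object):

    def process_item(self, item, spider):
        # Open the video in VLC in the background so it can be reviewed by hand.
        os.system("vlc -vvv %s > /dev/null 2>&1 &" % item['file'])
        item['comment'] = raw_input('Your comment:')
        return item

class VideoDownloadPipeline(object):

    def process_item(self, item, spider):
        video_basename = item['file'].split('/')[-1]
        new_filename = os.path.join(VIDEOS_DIR, video_basename)
        downloaded = False
        # Try up to five times; each attempt blocks until the download finishes.
        for i in range(5):
            try:
                video = urllib2.urlopen(item['file']).read()
                downloaded = True
                break
            except Exception:
                continue
        if not downloaded:
            raise DropItem("Couldn't download file from %s" % item['file'])
        with open(new_filename, 'wb') as f:
            f.write(video)
        item['file'] = video_basename
        return item
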
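But now I always have to wait before I can review the next item, because the file from the previous item has not finished downloading. I would rather review all the items first and then let everything download. How can I do this?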

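Scrapy provides a media pipeline (MediaPipeline) that you can use here. It is not well documented, but it exists and works, at least in the latest Scrapy version. To learn how it works you need to read the code, which is quite intuitive in my opinion. You can look at the MediaPipeline interface to see how media pipelines work.

To review each video before it is downloaded, you can write something like this (you will need to match it to your item's field names):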


import os

from scrapy.http import Request
# On Scrapy 1.0 and later this module lives at scrapy.pipelines.media.
from scrapy.contrib.pipeline.media import MediaPipeline

class VideoPipeline(MediaPipeline):
    VIDEOS_DIR = "/stack/scrapy/video/video/store"

    def get_media_requests(self, item, info):
        """
        Review the video and, if you like it, request its download.
        """
        os.system("vlc -vvv %s > /dev/null 2>&1 &" % item['video_url'][0])
        your_opinion = raw_input("how does it look?")
        item["comment"] = your_opinion
        if your_opinion == "hot":
            # Hand the download to Scrapy instead of fetching it inline.
            return Request(item["video_url"][0], meta={"item": item})

    def media_downloaded(self, response, request, info):
        """
        The downloaded file is available as response.body; save it.
        """
        item = response.meta.get("item")
        video_basename = item['title'][0]
        new_filename = os.path.join(self.VIDEOS_DIR, video_basename)
        with open(new_filename, 'wb') as f:
            f.write(response.body)

Not sure why, but now I get:
(failed 2 times): [,]
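
A pipeline like VideoPipeline only runs once it is enabled in the project's settings. The fragment below is a minimal sketch of that step, assuming a recent Scrapy version where ITEM_PIPELINES is a dict that maps a class path to an order value (customarily in the 0-1000 range, lower values run first). The dotted path `myproject.pipelines.VideoPipeline` is a hypothetical placeholder and not taken from the question; replace it with wherever the class actually lives.

```python
# settings.py (sketch)
# Since VideoPipeline does both the manual review and the download, it would
# take the place of the separate VideoCommentPipeline and VideoDownloadPipeline.
ITEM_PIPELINES = {
    # Hypothetical module path; adjust to your own project layout.
    "myproject.pipelines.VideoPipeline": 300,
}
```

If other pipelines need to see the comment field, giving them a higher order value than VideoPipeline should make them run after it.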