Python Scrapy media pipeline, files not downloading
I am a beginner. I am trying to download files using the media pipeline, but when I run the spider, no files are stored in the folder.

Spider:
import scrapy
from scrapy import Request
from pagalworld.items import PagalworldItem


class JobsSpider(scrapy.Spider):
    name = "songs"
    allowed_domains = ["pagalworld.me"]
    start_urls = ['https://pagalworld.me/category/11598/Latest%20Bollywood%20Hindi%20Mp3%20Songs%20-%202017.html']

    def parse(self, response):
        urls = response.xpath('//div[@class="pageLinkList"]/ul/li/a/@href').extract()
        for link in urls:
            yield Request(link, callback=self.parse_page)

    def parse_page(self, response):
        songName = response.xpath('//li/b/a/@href').extract()
        for song in songName:
            yield Request(song, callback=self.parsing_link)

    def parsing_link(self, response):
        item = PagalworldItem()
        item['file_urls'] = response.xpath('//div[@class="menu_row"]/a[@class="touch"]/@href').extract()
        yield {"download_link": item['file_urls']}
Items file:
import scrapy


class PagalworldItem(scrapy.Item):
    file_urls = scrapy.Field()
Settings file:
BOT_NAME = 'pagalworld'
SPIDER_MODULES = ['pagalworld.spiders']
NEWSPIDER_MODULE = 'pagalworld.spiders'
ROBOTSTXT_OBEY = True
CONCURRENT_REQUESTS = 5
DOWNLOAD_DELAY = 3
ITEM_PIPELINES = {
'scrapy.pipelines.files.FilesPipeline': 1
}
FILES_STORE = '/tmp/media/'
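For reference, the default FilesPipeline stores each download under FILES_STORE in a full/ subdirectory, with a filename derived from the SHA-1 hash of the file URL, so /tmp/media/full/ is the place to look for results. A rough sketch of that naming scheme (an approximation for illustration, not Scrapy's actual code):

```python
import hashlib
import os
from urllib.parse import urlparse

def default_file_path(url: str) -> str:
    # Approximates FilesPipeline's default naming: full/<sha1(url)><ext>
    media_guid = hashlib.sha1(url.encode('utf-8')).hexdigest()
    media_ext = os.path.splitext(urlparse(url).path)[1]
    return 'full/' + media_guid + media_ext

print(default_file_path('https://pagalworld.me/files/song.mp3'))
```

If that directory stays empty after a run, the pipeline never received any downloadable items, which points at how the items are yielded rather than at the storage settings.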
The output looks like this:
You are yielding:
yield {"download_link": ['http://someurl.com']}
For Scrapy's media/files pipeline to work, you need to yield an item that contains a file_urls field. So try this instead:
def parsing_link(self, response):
    item = PagalworldItem()
    item['file_urls'] = response.xpath('//div[@class="menu_row"]/a[@class="touch"]/@href').extract()
    yield item
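Another common reason nothing gets downloaded: hrefs extracted from the page are often relative, and a request built from a relative URL typically fails with a "missing scheme" error. Resolving each href against the page URL before yielding (in a spider, response.urljoin(href)) avoids this; a standalone sketch with placeholder URLs:

```python
from urllib.parse import urljoin

# Placeholder page URL and hrefs, for illustration only
base = 'https://pagalworld.me/category/11598/index.html'
hrefs = ['/files/download/417411.html', 'song.html']

# Resolve each href against the page URL, as response.urljoin would
absolute = [urljoin(base, href) for href in hrefs]
print(absolute)
```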
You haven't written any code to download/save the files. Take a look at this and think it through. Hope this helps.

I tried a crawl spider for the parsing earlier but it didn't work. Can you take a look?