Python Scrapy spider works in the shell but not in code

python scrapy web-crawler

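Hello, I'm trying to build a simple crawler with Scrapy.

The code works fine in the Scrapy shell, but when I run it from the console, nothing gets written to the JSON file.

I run it from the project's top-level directory like this: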

scrapy crawl filemare -o filemare.json


import scrapy


class FilemareSpider(scrapy.Spider):
    name = "filemare"
    allowed_domains = ['https://filemare.com/']
    start_urls = ["https://filemare.com/en-us/search/firmware%20download/632913359"]

    def parse(self, response):
        items = response.xpath('//div[@class="f"]/text()').extract()
        #items = response.css('div.f::text').extract()

        for url in items:
            print(url)
            yield url

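The parse method must return a dict, a Scrapy Item, or a Request object (see the Scrapy documentation). In your case, you are yielding a plain string. If you run the spider, you will see an error about this in the output.

Change the relevant part of your code as follows: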

...
def parse(self, response):
    items = response.xpath('//div[@class="f"]/text()').extract()

    for url in items:
        print(url)
        yield {'url': url}
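
As a small illustration of the same idea, the sketch below wraps each extracted string in a dict before it is yielded. The helper name as_items is hypothetical (it is not part of Scrapy), and stripping whitespace and skipping empty strings are extra assumptions of this sketch that the original code did not make.

```python
def as_items(texts):
    """Turn extracted strings into dicts, which Scrapy accepts as items."""
    for text in texts:
        cleaned = text.strip()  # assumption: surrounding whitespace is not wanted
        if cleaned:             # assumption: empty strings are skipped
            yield {"url": cleaned}


# Hypothetical use inside the spider's parse method:
#     yield from as_items(response.xpath('//div[@class="f"]/text()').extract())
```

With dicts coming out of parse, the scrapy crawl filemare -o filemare.json command from the question should have items to export, provided the XPath matches something on the page.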

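Thanks, that was helpful. My crawler was also being blocked because of the robots.txt file, though, so I had to change the settings file to ROBOTSTXT_OBEY = False.

Really? When I tried your code myself, the crawl went ahead fine; the yield was the only problem.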
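
For anyone who runs into the robots.txt issue mentioned in the comment above, the setting in question would look roughly like this. It is a sketch that assumes a project created with scrapy startproject, where settings.py typically has this option set to True. Whether you need to change it depends on the site's robots.txt, and as the reply notes, it may not be necessary at all.

```python
# settings.py (the project's settings module)

# When True, Scrapy fetches robots.txt first and skips URLs it disallows.
# Setting it to False turns that check off for the whole project.
ROBOTSTXT_OBEY = False
```

The same key can also be set for a single spider through its custom_settings class attribute, for example custom_settings = {"ROBOTSTXT_OBEY": False}.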