Python CrawlSpider scraping not working


I'm working with Scrapy and trying to crawl an entire website with a spider, but I'm not getting any results in my terminal.

PS: I'm running Scrapy from within a script.

Here is my code:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'website.com'
    allowed_domains = ['website.com']
    start_urls = ['http://www.website.com']

    rules = (
        # Extract links matching 'category.php' (but not matching 'subsection.php')
        # and follow links from them (since no callback means follow=True by default).
        Rule(LinkExtractor(allow=('/', ), deny=('subsection\.php', ))),

        # Extract links matching 'item.php' and parse them with the spider's method parse_item

    )

    def parse_item(self, response):
        print(response.css('title').extract())




process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()

You're missing the callback argument.

Simply change

Rule(LinkExtractor(allow=('/', ), deny=('subsection\.php', ))),

to

Rule(LinkExtractor(allow=('/', ), deny=('subsection\.php', )), callback='parse_item'),

As the CrawlSpider documentation explains, you forgot to pass the callback argument to your Rule: without a callback the rule only follows the extracted links, so parse_item is never called and nothing gets printed.
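Putting it together, here is a minimal sketch of the corrected spider, keeping the asker's placeholder domain website.com. Note that once a callback is set on a Rule, follow defaults to False, so follow=True is added explicitly to keep crawling beyond the first page.

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'website.com'
    allowed_domains = ['website.com']
    start_urls = ['http://www.website.com']

    rules = (
        # Follow every internal link except 'subsection.php' pages and
        # hand each downloaded response to parse_item via the callback.
        Rule(LinkExtractor(allow=('/', ), deny=(r'subsection\.php', )),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # Print the page title so each crawled page shows up in the terminal.
        print(response.css('title').extract())

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()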