Python: scraping with a CrawlSpider doesn't work
I'm working with Scrapy and trying to crawl an entire website with a spider, but I'm not getting any output in my terminal.

P.S.: I run Scrapy from a script in the browser.

Here is my code:
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'website.com'
    allowed_domains = ['website.com']
    start_urls = ['http://www.website.com']

    rules = (
        # Extract and follow all links except those matching 'subsection.php'
        # (since no callback means follow=True by default).
        Rule(LinkExtractor(allow=(r'/', ), deny=(r'subsection\.php', ))),
    )

    def parse_item(self, response):
        print(response.css('title').extract())

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start()
You are missing the callback argument. According to the documentation, you forgot to pass callback to the Rule, so parse_item is never called. Simply change

Rule(LinkExtractor(allow=(r'/', ), deny=(r'subsection\.php', ))),

to

Rule(LinkExtractor(allow=(r'/', ), deny=(r'subsection\.php', )), callback='parse_item'),
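For reference, here is a minimal corrected sketch of the whole spider (the website.com URLs are placeholders from the question). One caveat: once a callback is set, a Rule's follow parameter defaults to False, so follow=True is added explicitly to keep crawling the entire site:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'website.com'
    allowed_domains = ['website.com']
    start_urls = ['http://www.website.com']

    rules = (
        # Follow every internal link except 'subsection.php' pages and pass
        # each matched response to parse_item. With a callback set, follow
        # defaults to False, hence the explicit follow=True.
        Rule(
            LinkExtractor(allow=(r'/', ), deny=(r'subsection\.php', )),
            callback='parse_item',
            follow=True,
        ),
    )

    def parse_item(self, response):
        # Print the <title> element of every crawled page.
        print(response.css('title').extract())

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start()

process.start() blocks until the crawl finishes, so the page titles should appear in the terminal as pages are scraped.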