Python: Scrapy SgmlLinkExtractor rule does not crawl all defined links


I want to crawl all links of the following forms:

http://example.com/index.php/comments/XXXXX
http://example.com/XXX1/index.php/comments/XXXXX
http://example.com/XXX2/index.php/comments/XXXX
http://example.com/XXX3/index.php/comments/XXXX
I defined the following rule:

start_urls = ['http://example.com/']

rules = [Rule(SgmlLinkExtractor(allow=[r'\w+/index.php/comments/\w+']), callback='parse_blogpost', follow=True)]
But the crawler seems to visit only links like this (), and not links like this ().


Any help would be greatly appreciated.
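One way to narrow this down is to test the allow pattern against the four URL shapes directly, outside the spider. The sketch below assumes the link extractor applies each allow pattern with a regex search over the absolute URL (that is my understanding of Scrapy's behaviour, not something stated in the question). LOOSE_PATTERN is a hypothetical alternative added only for comparison.

```python
import re

# The allow pattern from the rule above, plus a looser hypothetical alternative.
RULE_PATTERN = r'\w+/index.php/comments/\w+'
LOOSE_PATTERN = r'index\.php/comments/'

# The four URL shapes from the question (XXXXX stands in for a real id).
URLS = [
    'http://example.com/index.php/comments/XXXXX',
    'http://example.com/XXX1/index.php/comments/XXXXX',
    'http://example.com/XXX2/index.php/comments/XXXX',
    'http://example.com/XXX3/index.php/comments/XXXX',
]


def allowed(pattern, url):
    """Return True if the pattern matches anywhere in the absolute URL."""
    return re.search(pattern, url) is not None


# Map each URL to (matches rule pattern, matches loose pattern).
results = {
    url: (allowed(RULE_PATTERN, url), allowed(LOOSE_PATTERN, url))
    for url in URLS
}

for url, (rule_ok, loose_ok) in results.items():
    print(url, rule_ok, loose_ok)
```

Under that assumption both patterns accept all four shapes, because in the first URL the \w+/ part can match the "com/" at the end of the host name. If so, the pattern itself is probably not what keeps some links from being visited.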

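Try using index.php/comments instead of \w+/index.php/comments/\w+.

Hi, thanks for your reply. I tried that, but it didn't work. After a closer look, I think the reason is that the page does not contain any links like (), so the crawler has nothing to follow to reach them.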
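The conclusion above can be illustrated without Scrapy. The following is a minimal sketch of a rule-following crawl over a hypothetical in-memory site. The SITE dictionary, its URLs, and the crawl helper are all made up for illustration and are not Scrapy's implementation. The point is that a URL matching the allow pattern is still never visited if no fetched page links to it.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical site: URL -> HTML body. No page links to the /XXX1/ URL.
SITE = {
    'http://example.com/': '<a href="/index.php/comments/1">first</a>',
    'http://example.com/index.php/comments/1': '<a href="/index.php/comments/2">next</a>',
    'http://example.com/index.php/comments/2': '',
    'http://example.com/XXX1/index.php/comments/9': '',
}

# Same allow pattern as the rule in the question.
ALLOW = re.compile(r'\w+/index.php/comments/\w+')


class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.hrefs.extend(v for k, v in attrs if k == 'href' and v)


def crawl(start_url):
    """Breadth-first crawl that follows only links found on fetched pages."""
    seen, queue = {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(SITE.get(url, ''))
        for href in parser.hrefs:
            link = urljoin(url, href)  # resolve relative links
            if ALLOW.search(link) and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


visited = crawl('http://example.com/')
print(sorted(visited))
```

Here the /XXX1/ page matches the pattern but never appears in visited, since the crawl only discovers URLs through links on pages it has already fetched. In a real spider, a likely fix would be to add a page that does link to those URLs to start_urls, or to add a second rule that follows the intermediate pages.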