Scrapy restrict_xpaths syntax error
I am trying to restrict Scrapy to a specific XPath location for following links. The XPath is correct (according to Chrome's XPath Helper plugin), but when I run my crawl spider I get a syntax error in my rule. My spider code is:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from tutorial.items import BassItem
import logging
from scrapy.log import ScrapyFileLogObserver
logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()
class BassSpider(CrawlSpider):
    name = "bass"
    allowed_domains = ["talkbass.com"]
    start_urls = ["http://www.talkbass.com/forum/f126"]
    rules = [Rule(SgmlLinkExtractor(allow=['/f126/index*']), callback='parse_item', follow=True, restrict_xpaths=('//a[starts-with(@title,"Next ")]')]

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        ads = hxs.select('//table[@id="threadslist"]/tbody/tr/td[@class="alt1"][2]/div')
        items = []
        for ad in ads:
            item = BassItem()
            item['title'] = ad.select('a/text()').extract()
            item['link'] = ad.select('a/@href').extract()
            items.append(item)
        return items
So the XPath `//a[starts-with(@title,"Next ")]` inside the rule returns an error, and I'm not sure why, since the actual XPath is valid. I just want the spider to crawl each "next page" link. Can anyone help me? If you need help with any other part of my code, please let me know.

The problem is not the XPath but the syntax of the complete rule, which is incorrect. The following rule fixes the syntax error, but should be checked to make sure it is doing what is wanted:
rules = (Rule(SgmlLinkExtractor(allow=['/f126/index*'],
                                restrict_xpaths=('//a[starts-with(@title,"Next ")]')),
              callback='parse_item', follow=True,
              ),
         )
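To see why this fixes it: `restrict_xpaths` is a keyword argument of the link extractor, not of `Rule`, and the original line also never closed the parenthesis opened by `Rule(`. A minimal sketch with stand-in functions (dummies for illustration, not the real Scrapy classes) showing the argument placement:

```python
# Stand-ins for illustration only; the real Scrapy classes do much more.
def SgmlLinkExtractor(allow=None, restrict_xpaths=None):
    # The extractor is what receives restrict_xpaths.
    return {"allow": allow, "restrict_xpaths": restrict_xpaths}

def Rule(link_extractor, callback=None, follow=False):
    # Rule takes the extractor plus callback/follow, not restrict_xpaths.
    return {"link_extractor": link_extractor,
            "callback": callback,
            "follow": follow}

rule = Rule(
    SgmlLinkExtractor(
        allow=['/f126/index*'],
        restrict_xpaths=('//a[starts-with(@title,"Next ")]',),
    ),
    callback='parse_item',
    follow=True,
)
```

With every keyword on the object it belongs to, the parentheses pair up naturally and the syntax error disappears.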
In general, it is strongly recommended to post the actual error in the question, since one's perception of the error and the actual error can differ.
Thanks for the answer - I'll try this. The error I was receiving is:

rules = [Rule(SgmlLinkExtractor(allow=['/f126/index*']), callback='parse_item', follow=True, restrict_xpaths=('//a[starts-with(@title,"Next ")]')] ^ SyntaxError: invalid syntax
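That traceback can be reproduced without Scrapy at all: Python's parser rejects the original `rules` line because the parenthesis opened by `Rule(` is closed by `]` instead of `)`. A quick stdlib-only check with `compile()`:

```python
# The offending line from the question, as a string. compile() only
# parses it, so the undefined Rule/SgmlLinkExtractor names don't matter.
broken = (
    "rules = [Rule(SgmlLinkExtractor(allow=['/f126/index*']), "
    "callback='parse_item', follow=True, "
    "restrict_xpaths=('//a[starts-with(@title,\"Next \")]')]"
)

try:
    compile(broken, "<rules>", "exec")
except SyntaxError as err:
    # The parenthesis opened by Rule( is never closed; the parser
    # trips over the closing ] where it expected ) or another argument.
    print("SyntaxError:", err.msg)
```

This confirms the error is pure Python syntax, independent of whether the XPath expression itself is valid.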