Problem with a Scrapy spider in Python
I am trying to get the volume-weighted average price (VWAP) of stocks from moneycontrol.com. The parse function runs without any problems, but the parse_links callback is never called. Am I missing something?
# -*- coding: utf-8 -*-
import scrapy


class MoneycontrolSpider(scrapy.Spider):
    name = "moneycontrol"
    allowed_domains = ["https://www.moneycontrol.com"]
    start_urls = ["https://www.moneycontrol.com/india/stockpricequote"]

    def parse(self, response):
        for link in response.css('td.last > a::attr(href)').extract():
            if link:
                yield scrapy.Request(link, callback=self.parse_links, method='GET')

    def parse_links(self, response):
        VWAP = response.xpath('//*[@id="n_vwap_val"]/text()').extract_first()
        print(VWAP)
        with open('quotes.txt', 'a+') as f:
            f.write('VWAP: {}'.format(VWAP) + '\n')
If you read the log output, the error becomes obvious:
2018-09-08 19:52:38 [py.warnings] WARNING: c:\program files\python37\lib\site-packages\scrapy\spidermiddlewares\offsite.py:59: URLWarning: allowed_domains accepts only domains, not URLs. Ignoring URL entry https://www.moneycontrol.com in allowed_domains.
warnings.warn("allowed_domains accepts only domains, not URLs. Ignoring URL entry %s in allowed_domains." % domain, URLWarning)
2018-09-08 19:52:38 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-09-08 19:52:39 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.moneycontrol.com/india/stockpricequote> (referer: None)
2018-09-08 19:52:40 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.moneycontrol.com': <GET http://www.moneycontrol.com/india/stockpricequote/chemicals/aartiindustries/AI45>
It is also possible that parse_links is being called but raising an exception. You could try adding a print('entering parse_links') at the start of the method, or attach an errback callback to the Request to catch the possible exceptions mentioned in the documentation.
In this case, though, the log's URLWarning and the "Filtered offsite request" line show the real cause: allowed_domains accepts bare domains, not URLs, so the fix is:

allowed_domains = ["moneycontrol.com"]
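If you would rather derive the bare domain from a start URL than hard-code it, the standard library's urlparse can extract the host; a minimal sketch (stripping the leading "www." is an assumption about how broadly you want the crawl scoped):

```python
from urllib.parse import urlparse

url = "https://www.moneycontrol.com/india/stockpricequote"
netloc = urlparse(url).netloc  # the host part, e.g. 'www.moneycontrol.com'
# Drop a leading 'www.' so subdomains like m.moneycontrol.com also pass
# Scrapy's offsite filter, which matches the domain and its subdomains.
domain = netloc[4:] if netloc.startswith("www.") else netloc
print(domain)  # moneycontrol.com
```

allowed_domains = [domain] would then accept requests to moneycontrol.com and any of its subdomains.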