Empty variable in a Python class instance despite explicitly setting it
When I run the following code:
import scrapy
from scrapy.crawler import CrawlerProcess

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    search_url = ''

    def start_requests(self):
        print ('self.search_url is currently: ' + self.search_url)
        yield scrapy.Request(url=self.search_url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
test_spider = QuotesSpider()
test_spider.search_url='http://quotes.toscrape.com/page/1/'
process.crawl(test_spider)
process.start() # the script will block here until the crawling is finished
I get the following error:
self.search_url is currently:
...
ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url:
...
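Inside start_requests, self.search_url appears to be empty, even though I explicitly set it to a value before the function is called. I can't figure out why this happens.

The cleanest approach is to use the constructor __init__(), but an easier (perhaps just quicker) way is to move the definition of search_url into the class. For example: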
import scrapy
from scrapy.crawler import CrawlerProcess

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    search_url = 'http://quotes.toscrape.com/page/1/'

    def start_requests(self):
        print ('search_url is currently: ' + self.search_url)
        yield scrapy.Request(url=self.search_url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
test_spider = QuotesSpider()
process.crawl(test_spider)
process.start()
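Moving the value into the class works, but it does not explain why the value set on the instance was lost. One plausible explanation, offered as an assumption about how the crawler treats its argument and not as a description of Scrapy's exact internals, is that the crawler works from the spider's class and builds a fresh instance when the crawl starts. An attribute set on a hand-made instance would then never reach the spider that actually runs. The toy model below uses no Scrapy at all; ToySpider, ToyCrawler and this from_crawler are hypothetical stand-ins written only to illustrate the idea.

```python
class ToySpider(object):
    search_url = ''  # class-level default, as in the question

    def __init__(self, **kwargs):
        # Keyword arguments become instance attributes.
        self.__dict__.update(kwargs)

    @classmethod
    def from_crawler(cls, **kwargs):
        # A classmethod receives the class, even when called on an instance.
        return cls(**kwargs)


class ToyCrawler(object):
    def __init__(self, spidercls):
        self.spidercls = spidercls

    def crawl(self, **kwargs):
        # Builds a new spider; an existing instance is not reused.
        return self.spidercls.from_crawler(**kwargs)


url = 'http://quotes.toscrape.com/page/1/'

# Mirrors the question: configure an instance, then hand it to the crawler.
configured = ToySpider()
configured.search_url = url
running = ToyCrawler(configured).crawl()
print(repr(running.search_url))  # the class default, not the value set above

# Handing over the class plus keyword arguments keeps the value.
passed = ToyCrawler(ToySpider).crawl(search_url=url)
print(passed.search_url)
```

In this model the instance-level assignment is discarded because only the class survives the hand-off, which is consistent with the empty string printed in the question.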
Is there a specific reason you want search_url declared as an instance attribute? Could you pass it to the class instead?