Python Scrapy: passing the response, missing one positional argument
I'm new to Python, coming from PHP. I want to scrape some websites with Scrapy and have worked through the tutorial and some simple scripts without trouble. Now that I'm writing the real thing, I get the following error:

    Traceback (most recent call last):
      File "C:\Users\Naltroc\Miniconda3\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "C:\Users\Naltroc\Documents\Python Scripts\tutorial\tutorial\spiders\quotes_spider.py", line 52, in parse
        self.dispatcher[site](response)
    TypeError: thesaurus() missing 1 required positional argument: 'response'

Scrapy automatically instantiates the spider object when the shell command scrapy crawl words is called.

As I understand it, self is the first parameter of any class method. When calling a class method, you do not pass self as an argument; you pass only your own variables.

First, this is called:

    # Scrapy automatically provides `response` to `parse()` when coming from `start_requests()`
    def parse(self, response):
        site = response.meta['site']
        # same as "site = 'thesaurus'"
        self.dispatcher[site](response)
        # same as "self.dispatcher['thesaurus'](response)"

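Then thesaurus(self, response) is called through the dispatcher.

In PHP, this should be the same as calling $this->thesaurus($response). parse is clearly sending response as an argument, but Python says it is missing. Where did it go?

The full code is below: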

    import scrapy


    class WordSpider(scrapy.Spider):
        def __init__(self, keyword = 'apprehensive'):
            self.k = keyword
        name = "words"

        # Utilities
        def make_csv(self, words):
            csv = ''
            for word in words:
                csv += word + ','
            return csv

        def save_words(self, words, fp):
            with open(fp, 'w') as f:
                f.seek(0)
                f.truncate()
                csv = self.make_csv(words)
                f.write(csv)

        # site specific parsers
        def thesaurus(self, response):
            filename = 'thesaurus.txt'
            words = ''
            print("in func self is defined as ", self)
            ul = response.css('.relevancy-block ul')
            for idx, u in enumerate(ul):
                if idx == 1:
                    break;
                words = u.css('.text::text').extract()
                print("words is ", words)
            self.save_words(filename, words)

        def oxford(self):
            filename = 'oxford.txt'
            words = ''

        def collins(self):
            filename = 'collins.txt'
            words = ''

        # site/function mapping
        dispatcher = {
            'thesaurus': thesaurus,
            'oxford': oxford,
            'collins': collins,
        }

        def parse(self, response):
            site = response.meta['site']
            self.dispatcher[site](response)  # this call raises the TypeError above

        def start_requests(self):
            urls = {
                'thesaurus': 'http://www.thesaurus.com/browse/%s?s=t' % self.k,
                #'collins': 'https://www.collinsdictionary.com/dictionary/english-thesaurus/%s' % self.k,
                #'oxford': 'https://en.oxforddictionaries.com/thesaurus/%s' % self.k,
            }
            for site, url in urls.items():
                print(site, url)
                yield scrapy.Request(url, meta={'site': site}, callback=self.parse)

There are a lot of small errors around your code. I took the liberty of cleaning it up a bit to follow common Python/Scrapy idioms; the cleaned-up spider is at the end of this answer :)
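
As for where response went, here is a minimal sketch that does not need Scrapy. Functions stored in a class-level dict are plain functions, and fetching one with self.dispatcher[site] does not bind it to the instance, so the single argument you pass fills the self slot and response is left missing. The Demo class, its greet method, and the string arguments are hypothetical stand-ins for the spider and its response object.

```python
class Demo:
    def greet(self, response):
        return f"{type(self).__name__} got {response}"

    # The dict stores the plain function object, not a bound method.
    dispatcher = {'greet': greet}

    def parse_broken(self, site, response):
        # Dict lookup skips method binding, so `response` lands in `self`.
        return self.dispatcher[site](response)

    def parse_explicit_self(self, site, response):
        # Option 1: pass the instance explicitly.
        return self.dispatcher[site](self, response)

    def parse_getattr(self, site, response):
        # Option 2: getattr returns a method already bound to `self`.
        return getattr(self, site)(response)


demo = Demo()
try:
    demo.parse_broken('greet', 'page')
except TypeError as exc:
    error_message = str(exc)  # mentions the missing 'response' argument

explicit_result = demo.parse_explicit_self('greet', 'page')
getattr_result = demo.parse_getattr('greet', 'page')
```

Either option works; the cleaned-up spider uses the getattr form, which also removes the need for a separate dispatcher dict.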

Comments:

Thanks for the review. 1. If I know it will always take only keyword as an argument, is there a reason to add **kwargs to __init__? 2. It looks like the parse function acts as a controller, first getting the right parser and then passing the data to it. That is reasonable, but is it the only way to send response data around? 3. Why does using getattr(self, response.meta['site']) allow calling the appropriate method without prefixing it with self.?

Regarding #1: since you inherit from Spider, you want to pass kwargs on to the parent class. There is nothing particularly worth passing here, but it is an established pattern. 2. You are misunderstanding how Scrapy works: by default, a spider starts a request chain for every URL in start_urls with the default callback parse(), where response is the response object for one of the start_urls. 3. You are misunderstanding what self is: self is a reference to the current instance of the class, so getattr(self, name) already gives you a method bound to that instance and you do not need to pass self again.
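
Point 1 can be sketched without Scrapy. BaseSpider below is a hypothetical stand-in for a parent class that accepts extra keyword arguments and stores them as attributes; it is written for illustration and is not Scrapy's actual implementation. The category argument is likewise made up.

```python
class BaseSpider:
    """Hypothetical stand-in for a parent class that accepts extra options."""

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Store any remaining keyword arguments as attributes.
        self.__dict__.update(kwargs)


class WordSpider(BaseSpider):
    name = "words"

    def __init__(self, keyword='apprehensive', **kwargs):
        # Forward everything this class does not handle to the parent.
        super().__init__(**kwargs)
        self.k = keyword


spider = WordSpider(keyword='calm', category='adjectives')
```

Without **kwargs in the subclass signature, the category argument would raise a TypeError instead of reaching the parent, which is the practical reason to keep the pattern even when only keyword is used today.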

    import logging

    import scrapy


    # Utilities
    # should probably use csv module here or `scrapy crawl -o` flag instead
    def make_csv(words):
        csv = ''
        for word in words:
            csv += word + ','
        return csv


    def save_words(words, fp):
        # mode 'w' already truncates the file, so no seek/truncate is needed
        with open(fp, 'w') as f:
            f.write(make_csv(words))


    class WordSpider(scrapy.Spider):
        name = "words"

        def __init__(self, keyword='apprehensive', **kwargs):
            super(WordSpider, self).__init__(**kwargs)
            self.k = keyword

        def start_requests(self):
            urls = {
                'thesaurus': 'http://www.thesaurus.com/browse/%s?s=t' % self.k,
                # 'collins': 'https://www.collinsdictionary.com/dictionary/english-thesaurus/%s' % self.k,
                # 'oxford': 'https://en.oxforddictionaries.com/thesaurus/%s' % self.k,
            }
            for site, url in urls.items():
                yield scrapy.Request(url, meta={'site': site}, callback=self.parse)

        def parse(self, response):
            parser = getattr(self, response.meta['site'])  # retrieve bound method by name
            logging.info(f'parsing using: {parser}')
            parser(response)

        # site specific parsers; each takes the response so parse() can call any of them
        def thesaurus(self, response):
            filename = 'thesaurus.txt'
            words = []
            print("in func self is defined as ", self)
            ul = response.css('.relevancy-block ul')
            for idx, u in enumerate(ul):
                if idx == 1:  # only the first list is used
                    break
                words = u.css('.text::text').extract()
            print("words is ", words)
            save_words(words, filename)  # argument order matches save_words(words, fp)

        def oxford(self, response):
            filename = 'oxford.txt'
            words = ''

        def collins(self, response):
            filename = 'collins.txt'
            words = ''
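
The comment above the utilities suggests the csv module. Here is a minimal sketch of that swap, assuming the words should end up on a single comma-separated row. The save_words signature is kept from the code above; the sample words and the temporary file location are made up for illustration.

```python
import csv
import os
import tempfile


def save_words(words, fp):
    # newline='' lets the csv module control line endings itself.
    with open(fp, 'w', newline='') as f:
        csv.writer(f).writerow(words)


# Illustration only: write to a temporary directory and read the row back.
path = os.path.join(tempfile.mkdtemp(), 'thesaurus.txt')
save_words(['anxious', 'uneasy', 'worried'], path)
with open(path, newline='') as f:
    rows = list(csv.reader(f))
```

Unlike the hand-rolled make_csv, this leaves no trailing comma and quotes any word that itself contains a comma. The other route mentioned in the comment is to yield items from the callback and let scrapy crawl words -o write the output file.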