Python Scrapy passing of response is missing a positional argument

New to python, coming from php. I want to scrape some sites using Scrapy and have worked through the tutorials and some simple scripts fine. Now, writing the real deal, this error comes up:

Traceback (most recent call last):
  File "C:\Users\Naltroc\Miniconda3\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\Naltroc\Documents\Python Scripts\tutorial\tutorial\spiders\quotes_spider.py", line 52, in parse
    self.dispatcher[site](response)
TypeError: thesaurus() missing 1 required positional argument: 'response'

Scrapy automatically instantiates an object when the shell command scrapy crawl words is invoked.

As far as I know, self is the first parameter of any class method. When you call a class method you do not pass self as an argument; you only pass your own variables, and self is supplied automatically.
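
For example, an ordinary method call supplies the instance automatically (a minimal sketch with a hypothetical Greeter class):

class Greeter:
    def greet(self, name):
        return 'hello ' + name

g = Greeter()
g.greet('world')           # bound call: Python passes g as `self` automatically
Greeter.greet(g, 'world')  # the equivalent unbound call, with `self` passed explicitly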

First, this is called:

# Scrapy automatically provides `response` to `parse()` when coming from `start_requests()`
def parse(self, response):
        site = response.meta['site']
        #same as "site = thesaurus"
        self.dispatcher[site](response)
        #same as "self.dispatcher['thesaurus'](response)
Then, in php, this should be the same as calling $this->thesaurus($response). parse is clearly sending response along as a variable, but python says it is missing. Where did it go?

The full code is below:

import scrapy

class WordSpider(scrapy.Spider):
    def __init__(self, keyword = 'apprehensive'):
        self.k = keyword
    name = "words"
    # Utilities
    def make_csv(self, words):
        csv = ''
        for word in words:
            csv += word + ','
        return csv

    def save_words(self, words, fp):
        with ofpen(fp, 'w') as f:
            f.seek(0)
            f.truncate()
            csv = self.make_csv(words)
            f.write(csv)

    # site specific parsers
    def thesaurus(self, response):
        filename = 'thesaurus.txt'
        words = ''
        print("in func self is defined as ", self)
        ul = response.css('.relevancy-block ul')
        for idx, u in enumerate(ul):
            if idx == 1:
                break;
            words = u.css('.text::text').extract()
            print("words is ", words)

        self.save_words(filename, words)

    def oxford(self):
        filename = 'oxford.txt'
        words = ''

    def collins(self):
        filename = 'collins.txt'
        words = ''

    # site/function mapping
    dispatcher = {
        'thesaurus': thesaurus,
        'oxford': oxford,
        'collins': collins,
    }

    def parse(self, response):
        site = response.meta['site']
        self.dispatcher[site](response)

    def start_requests(self):
        urls = {
            'thesaurus': 'http://www.thesaurus.com/browse/%s?s=t' % self.k,
            #'collins': 'https://www.collinsdictionary.com/dictionary/english-thesaurus/%s' % self.k,
            #'oxford': 'https://en.oxforddictionaries.com/thesaurus/%s' % self.k,
        }

        for site, url in urls.items():
            print(site, url)
            yield scrapy.Request(url, meta={'site': site}, callback=self.parse)

Your code has lots of little errors scattered around it. I took the liberty of cleaning it up to follow common python/scrapy idioms; the full cleaned-up spider is at the end of this answer :)


Thanks for the review. 1. Is there a reason to add **kwargs to __init__ if I know it will only ever take keyword as an argument? 2. It looks like the parse function acts as a controller, first fetching the correct parser and then passing it the data. That is reasonable, but is it the only way to send the response data around? 3. Why does using getattr(self, response.meta['site']) allow the appropriate method to be called without prefixing it with self.?
Regarding #1: since you inherit from Spider, you want to pass kwargs through to the parent class. There is nothing particularly worth passing here, but it is a pattern worth demonstrating. 2. You are misunderstanding how scrapy works a bit: by default, a spider starts a request chain for every url in start_urls, using the default callback parse(), where response is the response object for one of those start_urls. 3. You are misunderstanding what self is: self is a reference to the current class instance, so when you use getattr you do not need the self. prefix, because getattr(self, ...) already hands you a bound reference.
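
To make point #3 concrete, here is a minimal sketch (the Dummy class is hypothetical, not part of the spider): a function stored in a class-level dict comes back as a plain, unbound function, while getattr on the instance returns a bound method:

class Dummy:
    def handler(self, response):
        return response

    # a class-level dict stores the plain function object, not a bound method
    dispatcher = {'handler': handler}

d = Dummy()
getattr(d, 'handler')('hi')    # works: `self` is already bound, 'hi' becomes `response`
d.dispatcher['handler']('hi')  # TypeError: 'hi' lands in `self`, so `response` is missing

This is exactly the TypeError from the traceback above. Below is the cleaned-up spider in full: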
import logging
import scrapy


# Utilities
# should probably use csv module here or `scrapy crawl -o` flag instead
def make_csv(words):
    csv = ''
    for word in words:
        csv += word + ','
    return csv


def save_words(words, fp):
    with open(fp, 'w') as f:
        f.seek(0)
        f.truncate()
        csv = make_csv(words)
        f.write(csv)


class WordSpider(scrapy.Spider):
    name = "words"

    def __init__(self, keyword='apprehensive', **kwargs):
        super(WordSpider, self).__init__(**kwargs)
        self.k = keyword

    def start_requests(self):
        urls = {
            'thesaurus': 'http://www.thesaurus.com/browse/%s?s=t' % self.k,
            # 'collins': 'https://www.collinsdictionary.com/dictionary/english-thesaurus/%s' % self.k,
            # 'oxford': 'https://en.oxforddictionaries.com/thesaurus/%s' % self.k,
        }

        for site, url in urls.items():
            yield scrapy.Request(url, meta={'site': site}, callback=self.parse)

    def parse(self, response):
        parser = getattr(self, response.meta['site'])  # retrieve method by name
        logging.info(f'parsing using: {parser}')
        parser(response)

    # site specific parsers
    def thesaurus(self, response):
        filename = 'thesaurus.txt'
        words = []
        print("in func self is defined as ", self)
        ul = response.css('.relevancy-block ul')
        for idx, u in enumerate(ul):
            if idx == 1:
                break
            words = u.css('.text::text').extract()
            print("words is ", words)
        save_words(words, filename)  # note: save_words takes (words, fp) in that order

    def oxford(self, response):
        filename = 'oxford.txt'
        words = ''

    def collins(self, response):
        filename = 'collins.txt'
        words = ''
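
With all of the parsers accepting response, the spider should run end to end. Scrapy's -a flag passes spider arguments into __init__, so the keyword can also be set from the command line, e.g. scrapy crawl words -a keyword=happy.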