Python: How do I access command-line arguments in a Scrapy spider?

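I want to pass an argument on the scrapy crawl… command line so that it can be used in the rules definition of my extended CrawlSpider, as shown below: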

name = 'example.com'
allowed_domains = ['example.com']
start_urls = ['http://www.example.com']

rules = (
    # Extract links matching 'category.php' (but not matching 'subsection.php')
    # and follow links from them (since no callback means follow=True by default).
    Rule(SgmlLinkExtractor(allow=('category\.php', ), deny=('subsection\.php', ))),

    # Extract links matching 'item.php' and parse them with the spider's method parse_item
    Rule(SgmlLinkExtractor(allow=('item\.php', )), callback='parse_item'),
)

I want the allow attribute of the SgmlLinkExtractor to be specified by a command-line argument.

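From a Google search I found that I can read the argument value in the spider's __init__ method, but how do I get the command-line argument so that it can be used in the rules definition?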

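You can build the spider's rules attribute in the __init__ method, something like:

class MySpider(CrawlSpider):

    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    def __init__(self, allow=None, *args, **kwargs):
        # Build the rules from the allow argument passed with -a.
        # They must be set before CrawlSpider.__init__ runs, since it compiles them.
        self.rules = (
            Rule(SgmlLinkExtractor(allow=(allow,))),
        )
        super(MySpider, self).__init__(*args, **kwargs)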


Then pass the allow attribute on the command line like this:
scrapy crawl example.com -a allow="item\.php"