Python 3.x: How can I pass a variable from the command prompt to a Lua script that is executed inside Scrapy?

I am trying to pass a variable to Scrapy as a user-defined argument. The variable will be used in the for loop of a Lua script. My code is as follows:

import scrapy
from scrapy_splash import SplashRequest
from scrapy.selector import Selector


class ProductsSpider(scrapy.Spider):
    name = 'allproducts'

    script = '''
        function main(splash, args)
           assert(splash:go(args.url))
           assert(splash:wait(0.5))
           result = {}
           local upto = tonumber(splash.number)
           for i=1,upto,1
           do
             #something
           end
           return output
        
        end
    '''

    def start_requests(self):
        url='https://medicalsupplies.co.uk'
        yield SplashRequest(url=url, callback=self.parse, endpoint='render.html', args={'wait':0.5})
        yield SplashRequest(url=url, callback=self.parse_other_pages, endpoint='execute',
            args={'wait':0.5, 'lua_source':self.script, 'number':int(self.number)}, dont_filter=True)

    def parse(self, response):
        for tr in response.xpath("//table[@id='date']/tbody/tr"):
            yield{
                    'output' : #something
            }

    def parse_other_pages(self,response):
        for page in response.data:
            sel=Selector(text=page)
            for tr in sel.xpath("//table[@id='date']/tbody/tr"):
                yield{
                     'output' : #something
                   }
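So the problem I am facing is this: when I run the Lua script's for loop with an integer literal, i.e.
for i=1,5,1
the script runs fine. But when I try to supply the value from the command prompt with
scrapy crawl allproducts -a number=5 -o test.json
while using
for i=1,{self.number},1
for the loop in the script, my code throws an error. I cannot even use an f-string on this string. Is there a way to pass a variable into the text string (called script here) without breaking the code? I know I am not using the correct syntax, but I have not found any resource covering this. Any suggestions are appreciated.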

The actual warning issued by Scrapy is as follows:

WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'source': '[string "..."]', 'line_number': 7, 'error': "attempt to index global 'self' (a nil value)", 'type': 'LUA_ERROR', 'message': 'Lua error: [string "..."]:7: attempt to index global \'self\' (a nil value)'}}
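Edit 1: As suggested by @Alexander, I modified the Lua script and passed the variable to SplashRequest as an integer argument. I also declared the variable in the Lua script with local (local upto = tonumber(splash.number)).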

The warning is now as follows:

WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': {'source': '[string "..."]', 'line_number': 9, 'error': "'for' limit must be a number", 'type': 'LUA_ERROR', 'message': 'Lua error: [string "..."]:9: \'for\' limit must be a number'}}
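
The function main(splash, args) has no self parameter, yet line 5 references it: for i=1,{self.number},1. Nor is the function declared as a method (a function-valued field of a Lua table), in which case self would be that table.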

Did you mean splash?


I think you should add 'number': self.number to args in your Python code (in start_requests), and then refer to it from the Lua script as tonumber(args.number).
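
As a rough sketch of that suggestion: the value goes into the args dict on the Python side and is read as args.number on the Lua side. The helper name build_splash_args is made up for illustration, the loop body is only a placeholder, and the SplashRequest call appears only in a comment so that the snippet runs without Scrapy installed.

```python
# Lua script for Splash's `execute` endpoint. Values sent in the request's
# `args` dict are available in the script as fields of `args`.
LUA_SCRIPT = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    local result = {}
    local upto = tonumber(args.number)  -- value supplied from Python
    for i = 1, upto do
        -- placeholder: store the current page HTML on each iteration
        result[i] = splash:html()
    end
    return result
end
"""


def build_splash_args(number, wait=0.5):
    """Hypothetical helper: build the `args` dict passed to SplashRequest.

    `number` is a string when it comes from `scrapy crawl ... -a number=5`,
    so it is converted to int here.
    """
    return {"wait": wait, "lua_source": LUA_SCRIPT, "number": int(number)}


# Inside the spider's start_requests this could be used as (not executed here):
#   yield SplashRequest(url, self.parse_other_pages, endpoint="execute",
#                       args=build_splash_args(self.number), dont_filter=True)
splash_args = build_splash_args("5")
print(splash_args["number"])
```

The conversion to int happens in Python, so the tonumber call in Lua is only a safeguard.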

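I was referring to the self of the class ProductsSpider. Since the script is inside the same class, I believed the script would recognize self, no?

@Freak, no, I don't think so. Mapping Python objects to Lua tables is difficult, if not impossible. See the updated answer.

script = ''' function main(splash, args) assert(splash:go(args.url)) assert(splash:wait(0.5)) result = {} upto = splash.number for i=1,upto,1 do #something end return output end ''' I tried it as you suggested, but it still does not seem to help. I get the same warning.

@Freak Yes, that is what I meant. No luck, then. It seems that splash.number is nil. You could also try args.number or splash.args.number. Failing that, you can concatenate the number into the Python script variable.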
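
The fallback from the last comment, building the number into the script text on the Python side, might look roughly like this. It is a sketch under assumptions: render_script is a hypothetical helper, and string.Template is used here instead of an f-string because Lua's own braces (as in result = {}) would otherwise have to be escaped.

```python
from string import Template

# `$upto` is a Template placeholder. Lua's braces (`{}`) need no escaping here,
# unlike in an f-string or str.format.
LUA_TEMPLATE = Template("""
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    local result = {}
    for i = 1, $upto do
        -- placeholder for the per-iteration work
    end
    return result
end
""")


def render_script(number):
    """Hypothetical helper: return the Lua source with the loop limit filled in."""
    # int() raises ValueError for anything that is not a whole number, so
    # arbitrary command-line text cannot end up inside the Lua source.
    return LUA_TEMPLATE.substitute(upto=int(number))


script = render_script("5")
print(script)
```

The rendered string would then be sent as lua_source in place of the class-level script attribute.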