
Python: Scrapy not selecting a dropdown option


I am trying to scrape a website with Scrapy. The site has three dropdown menus and also uses
`__VIEWSTATE`
. I am able to extract the values of the first dropdown ("dcode"), but I cannot extract the options of the second dropdown ("blk").

I cannot understand why my code never reaches the
`parse_blk`
function.

I get this error:

Traceback (most recent call last):
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\WPy-3670\uplist\uplist\spiders\test1.py", line 65, in parse_blk
    dont_filter=True
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\http\request\form.py", line 49, in from_response
    form = _get_form(response, formname, formid, formnumber, formxpath)
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\http\request\form.py", line 84, in _get_form
    raise ValueError("No <form> element found in %s" % response)
ValueError: No <form> element found in <200 http://sec.up.nic.in/site/PRIVoterSearch2015.aspx>
My code so far:

import scrapy,re
from scrapy.item import Item
#from scrapy.shell import inspect_response

class blkname(Item):
    text = scrapy.Field()

class test1(scrapy.Spider):
    name = "test1"
    allowed_domains = ["sec.up.nic.in"]

    start_urls = ["http://sec.up.nic.in/site/PRIVoterSearch2015.aspx"]


    def parse(self, response):
        for dcode in response.css('select#dcode > option ::attr(value)').extract():
            #print( response.css('input#__VIEWSTATEGENERATOR::attr(value)').extract_first())
            #print(response.css('input#__VIEWSTATE::attr(value)').extract_first())
            #print(dcode)
            yield scrapy.FormRequest.from_response(
                response,
                headers={'user-agent': 'Mozilla/5.0'},
                formdata={
                        'dcode': dcode,
                        '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),
                        '__EVENTTARGET': 'dcode',
                        '__ASYNCPOST': 'true',
                        },
                callback=self.parse_blk,
                dont_filter=True

            ) 

    def parse_blk(self, response):
        for blk in response.css('select#blk > option ::attr(value)').extract():
            #block = response.css('select#blk > option ::attr(value)').extract()
            #print(block)
            #print(response.css('hiddenField|__VIEWSTATE::attr(value)').extract_first())
            #data = re.findall("__VIEWSTATE| =(.+?);|", response.body.decode("utf-8"), re.S)
            data = re.findall("(?<=__VIEWSTATE).*$", response.body.decode("utf-8"), re.S)
            #print(data)
            #print(block)
            viewstate = str(data).split('|')[1]
            #print (viewstate)
            yield scrapy.FormRequest.from_response(
                        response,
                        headers={'user-agent': 'Mozilla/5.0'},
                        formdata={
                                'dcode':response.css('select#dcode > option[selected] ::attr(value)').extract_first(),
                                'blk': blk,
                                '__VIEWSTATE': viewstate,
                                '__EVENTTARGET': 'blk',
                                '__ASYNCPOST': 'true',
                           },
                    callback=self.parse_gp,
                    dont_filter=True
                )
    def parse_gp(self, response):
       for gp in response.css('select#gp > option ::attr(value)').extract():
            print(gp)
Your `parse()` should be:

def parse(self, response):
    for dcode in response.css('select#dcode > option ::attr(value)').extract():
        yield scrapy.FormRequest.from_response(
            response,
            headers={'user-agent': 'Mozilla/5.0'},
            formdata={
                    'dcode': dcode,
                    '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),
                    '__EVENTTARGET': 'dcode',
                    '__ASYNCPOST': 'true',
            },
            callback=self.parse_blk,
            dont_filter=True
        )
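The `ValueError: No <form> element found` in the traceback points at the underlying problem: with `__ASYNCPOST: 'true'` the server answers with an ASP.NET AJAX partial-postback "delta" (pipe-delimited `length|type|id|content|` records), not a full HTML page, so `FormRequest.from_response` has nothing to locate. A minimal sketch of parsing that delta format and pulling out the fresh `__VIEWSTATE` follows; the sample body is hypothetical, not captured from the live site:

```python
def parse_delta(body):
    """Split an ASP.NET AJAX partial-postback body into (type, id, content)
    records. Each record is encoded as length|type|id|content| where `length`
    is the character count of `content` (which may itself contain pipes)."""
    records = []
    pos = 0
    while pos < len(body):
        nxt = body.index('|', pos)
        length = int(body[pos:nxt])          # content length in characters
        pos = nxt + 1
        nxt = body.index('|', pos)
        kind = body[pos:nxt]                 # e.g. 'updatePanel', 'hiddenField'
        pos = nxt + 1
        nxt = body.index('|', pos)
        ident = body[pos:nxt]                # e.g. '__VIEWSTATE'
        pos = nxt + 1
        content = body[pos:pos + length]     # read exactly `length` chars
        pos += length + 1                    # skip the trailing '|'
        records.append((kind, ident, content))
    return records

# Hypothetical, trimmed delta body (real responses are much longer):
sample = '19|updatePanel|pnl|<option value="12">|8|hiddenField|__VIEWSTATE|/wEPDwUK|'

viewstate = next(c for k, i, c in parse_delta(sample)
                 if k == 'hiddenField' and i == '__VIEWSTATE')
print(viewstate)  # /wEPDwUK
```

This is also more robust than the question's `str(data).split('|')[1]`, which breaks as soon as the panel HTML itself contains a pipe character.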

I modified the form data and included all the parameters, but it still does not select the block. If I try to print the blk id, it returns nothing. As I said, my program never enters the parse_blk function… If you put that print statement above the for loop inside parse_blk, it does get executed. OK, I will try, but the print command should likewise print inside the loop. The response is a TextResponse, not an HtmlResponse; change the other methods accordingly.
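Following the last comment: because the partial-postback body is plain text rather than a parsed HTML document, `response.css('select#blk > option ::attr(value)')` finds nothing, which is why the loop in `parse_blk` never runs. One workaround is to pull the option values out of the raw fragment with a regex; the fragment below is hypothetical (feeding it to `parsel.Selector(text=fragment)` would be a more robust alternative):

```python
import re

# Hypothetical HTML fragment carried inside the updatePanel section of the
# delta response (not captured from the live site).
fragment = ('<select id="blk">'
            '<option value="0">Select Block</option>'
            '<option value="12">Block A</option>'
            '<option value="34">Block B</option>'
            '</select>')

# response.css() sees no DOM here, so fall back to a regex over the raw text.
blk_values = re.findall(r'<option value="([^"]+)"', fragment)
print(blk_values)  # ['0', '12', '34']
```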