
Python: Scrapy not selecting a dropdown option


I am trying to scrape a website with Scrapy. The site has three dropdown menus and also uses
`__VIEWSTATE`
. I am able to extract the values of the first dropdown ("dcode"), but I cannot extract the options of the second dropdown ("blk").

I cannot understand why my code never reaches the
`parse_blk`
function.

I get this error:

Traceback (most recent call last):
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\core\spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\WPy-3670\uplist\uplist\spiders\test1.py", line 65, in parse_blk
    dont_filter=True
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\http\request\form.py", line 49, in from_response
    form = _get_form(response, formname, formid, formnumber, formxpath)
  File "c:\wpy-3670\python-3.6.7.amd64\lib\site-packages\scrapy\http\request\form.py", line 84, in _get_form
    raise ValueError("No <form> element found in %s" % response)
ValueError: No <form> element found in <200 http://sec.up.nic.in/site/PRIVoterSearch2015.aspx>
My code so far:

import scrapy,re
from scrapy.item import Item
#from scrapy.shell import inspect_response

class blkname(Item):
    text = scrapy.Field()

class test1(scrapy.Spider):
    name = "test1"
    allowed_domains = ["sec.up.nic.in"]

    start_urls = ["http://sec.up.nic.in/site/PRIVoterSearch2015.aspx"]


    def parse(self, response):
        for dcode in response.css('select#dcode > option ::attr(value)').extract():
            #print( response.css('input#__VIEWSTATEGENERATOR::attr(value)').extract_first())
            #print(response.css('input#__VIEWSTATE::attr(value)').extract_first())
            #print(dcode)
            yield scrapy.FormRequest.from_response(
                response,
                headers={'user-agent': 'Mozilla/5.0'},
                formdata={
                        'dcode': dcode,
                        '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),
                        '__EVENTTARGET': 'dcode',
                        '__ASYNCPOST': 'true',
                        },
                callback=self.parse_blk,
                dont_filter=True

            ) 

    def parse_blk(self, response):
        for blk in response.css('select#blk > option ::attr(value)').extract():
            #block = response.css('select#blk > option ::attr(value)').extract()
            #print(block)
            #print(response.css('hiddenField|__VIEWSTATE::attr(value)').extract_first())
            #data = re.findall("__VIEWSTATE| =(.+?);|", response.body.decode("utf-8"), re.S)
            data = re.findall("(?<=__VIEWSTATE).*$", response.body.decode("utf-8"), re.S)
            #print(data)
            #print(block)
            viewstate = str(data).split('|')[1]
            #print (viewstate)
            yield scrapy.FormRequest.from_response(
                        response,
                        headers={'user-agent': 'Mozilla/5.0'},
                        formdata={
                                'dcode':response.css('select#dcode > option[selected] ::attr(value)').extract_first(),
                                'blk': blk,
                                '__VIEWSTATE': viewstate,
                                '__EVENTTARGET': 'blk',
                                '__ASYNCPOST': 'true',
                           },
                    callback=self.parse_gp,
                    dont_filter=True
                )
    def parse_gp(self, response):
       for gp in response.css('select#gp > option ::attr(value)').extract():
            print(gp)
Your `parse()` should be:

def parse(self, response):
    for dcode in response.css('select#dcode > option ::attr(value)').extract():
        yield scrapy.FormRequest.from_response(
            response,
            headers={'user-agent': 'Mozilla/5.0'},
            formdata={
                    'dcode': dcode,
                    '__VIEWSTATE': response.css('input#__VIEWSTATE::attr(value)').extract_first(),
                    '__EVENTTARGET': 'dcode',
                    '__ASYNCPOST': 'true',
            },
            callback=self.parse_blk,
            dont_filter=True
        )
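The `ValueError: No <form> element found` in the traceback points at the underlying problem: with `__ASYNCPOST: 'true'` the server answers with an ASP.NET AJAX partial-postback "delta" (pipe-delimited `length|type|id|content|` records), not a full HTML page, so `FormRequest.from_response` has nothing to locate. A minimal sketch of parsing that delta format and pulling out the fresh `__VIEWSTATE` follows; the sample body is hypothetical, not captured from the live site:

```python
def parse_delta(body):
    """Split an ASP.NET AJAX partial-postback body into (type, id, content)
    records. Each record is encoded as length|type|id|content| where `length`
    is the character count of `content` (which may itself contain pipes)."""
    records = []
    pos = 0
    while pos < len(body):
        nxt = body.index('|', pos)
        length = int(body[pos:nxt])          # content length in characters
        pos = nxt + 1
        nxt = body.index('|', pos)
        kind = body[pos:nxt]                 # e.g. 'updatePanel', 'hiddenField'
        pos = nxt + 1
        nxt = body.index('|', pos)
        ident = body[pos:nxt]                # e.g. '__VIEWSTATE'
        pos = nxt + 1
        content = body[pos:pos + length]     # read exactly `length` chars
        pos += length + 1                    # skip the trailing '|'
        records.append((kind, ident, content))
    return records

# Hypothetical, trimmed delta body (real responses are much longer):
sample = '19|updatePanel|pnl|<option value="12">|8|hiddenField|__VIEWSTATE|/wEPDwUK|'

viewstate = next(c for k, i, c in parse_delta(sample)
                 if k == 'hiddenField' and i == '__VIEWSTATE')
print(viewstate)  # /wEPDwUK
```

This is also more robust than the question's `str(data).split('|')[1]`, which breaks as soon as the panel HTML itself contains a pipe character.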

I modified the form data and included all the parameters, but it still does not select the block. If I try to print the blk id, it returns nothing. As I said, my program never enters the parse_blk function… If you put that print statement above the for loop inside parse_blk, it does get executed. OK, I will try, but the print command should likewise print inside the loop. The response is a TextResponse, not an HtmlResponse; change the other methods accordingly.
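Following the last comment: because the partial-postback body is plain text rather than a parsed HTML document, `response.css('select#blk > option ::attr(value)')` finds nothing, which is why the loop in `parse_blk` never runs. One workaround is to pull the option values out of the raw fragment with a regex; the fragment below is hypothetical (feeding it to `parsel.Selector(text=fragment)` would be a more robust alternative):

```python
import re

# Hypothetical HTML fragment carried inside the updatePanel section of the
# delta response (not captured from the live site).
fragment = ('<select id="blk">'
            '<option value="0">Select Block</option>'
            '<option value="12">Block A</option>'
            '<option value="34">Block B</option>'
            '</select>')

# response.css() sees no DOM here, so fall back to a regex over the raw text.
blk_values = re.findall(r'<option value="([^"]+)"', fragment)
print(blk_values)  # ['0', '12', '34']
```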