Python Scrapy - Select items in a form and extract the displayed table

python scrapy

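I'm trying to extract information from a web page that requires me to pick an entry from a drop-down list and then, based on that selection, displays a table with various details. On the page I want to iterate over and extract the table information from, I have a list of the selection values for the form/list.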

Web page:

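I'm stuck on how to build the XPath for the extraction. The examples I studied all had classes I could target, but this page doesn't. Also, the site requires me to click an item in the form/list before it displays the table for that selection. I'm using the FormRequest.from_response method, but I'm not sure I've set it up correctly.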
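
Even without class attributes, XPath can address nodes by tag name and attribute value. Below is a minimal sketch using the standard library's xml.etree.ElementTree (swapped in for Scrapy so it runs offline) on a small hypothetical snippet; the name="product" attribute and the option values are assumptions, not taken from the real page. In Scrapy the equivalent selector would be response.xpath('//select[@name="product"]/option/@value').getall().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a <select> with no class attributes, only a name.
# ElementTree needs well-formed XML, so this stands in for the real page.
PAGE = """
<form>
  <select name="product">
    <option value="hip">McAfee Host Intrusion Prevention</option>
    <option value="hpl">McAfee Host Prevention for Linux</option>
  </select>
</form>
"""

root = ET.fromstring(PAGE)

# Target nodes by tag name plus an attribute predicate instead of a class.
options = root.findall(".//select[@name='product']/option")
choices = [(opt.get("value"), opt.text) for opt in options]

for value, label in choices:
    print(value, label)
```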

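The information I want to extract is the product name, version/model, end-of-support notification, and end-of-life/end-of-support date. I want to store the results in a DataFrame first, because I need to join them with information from other sources, and then export to Excel/CSV.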
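
The join-then-export step can be sketched with only the standard library. This is a minimal illustration under stated assumptions: the inventory dict and its installed_count column are a hypothetical second source, and the scraped rows are placeholders shaped like the items the spider in the answer yields. With pandas, the same left join would be scraped_df.merge(other_df, on="product_title", how="left") followed by to_csv or to_excel.

```python
import csv
import io

# Placeholder rows shaped like the spider's items (values are illustrative).
scraped = [
    {"product_title": "McAfee Host Intrusion Prevention", "element_version": "8.0"},
    {"product_title": "McAfee Host Prevention for Linux", "element_version": "8.0 Patch 6"},
]

# Hypothetical second source keyed by product name (e.g. an internal inventory).
inventory = {"McAfee Host Intrusion Prevention": {"installed_count": "42"}}

# Left join: keep every scraped row; missing inventory data becomes "".
joined = []
for row in scraped:
    extra = inventory.get(row["product_title"], {})
    joined.append({**row, "installed_count": extra.get("installed_count", "")})

# Write CSV to an in-memory buffer; use open("eol.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["product_title", "element_version", "installed_count"]
)
writer.writeheader()
writer.writerows(joined)
print(buffer.getvalue())
```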

Expected result for the first product in the list, Host Intrusion Prevention.

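You're looking in the wrong place. The site above doesn't send any FormRequest after you select something in the list. Instead, it reads its data from https://www.mcafee.com/enterprise/admin/support/eol.xml and simply displays a portion of it, so you can scrape that XML directly: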

import scrapy


class McAfee_Spider(scrapy.Spider):
    name = 'McAfee'
    # allowed_domains must be a list of domain strings, not a bare string
    allowed_domains = ['mcafee.com']
    # Request the XML file the page reads its data from
    start_urls = ['https://www.mcafee.com/enterprise/admin/support/eol.xml']

    def parse(self, response):
        # Each <product> node carries the product name in its title attribute
        for product in response.xpath('//product'):
            product_title = product.xpath('./@title').get()
            # Each <element> child holds one version with its end-of-support data
            for element in product.xpath('./element'):
                element_title = element.xpath('./@title').get()
                element_version = element.xpath('./@version').get()
                element_eos = element.xpath('./@eos').get()
                element_eos_notification = element.xpath('./@eos_notification').get()
                element_comment = element.xpath('./comment/text()').get()

                # One item per version row
                yield {
                    'product_title': product_title,
                    'element_title': element_title,
                    'element_version': element_version,
                    'element_eos': element_eos,
                    'element_eos_notification': element_eos_notification,
                    'element_comment': element_comment,
                }
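
To check the extraction logic without hitting the network, the same walk over product/element nodes can be reproduced with the standard library's xml.etree.ElementTree. This is a sketch, not the spider itself: the sample XML is hypothetical, its structure is inferred from the XPaths used above (title, version, eos, eos_notification attributes and a comment child), and the <products> root tag is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped after the spider's XPaths; the root tag is assumed.
SAMPLE = """
<products>
  <product title="McAfee Host Intrusion Prevention">
    <element title="Host Intrusion Prevention" version="8.0"
             eos="" eos_notification="">
      <comment>placeholder comment</comment>
    </element>
  </product>
</products>
"""


def parse_eol(xml_text):
    """Flatten product/element nodes into one dict per element, like parse()."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() finds <product> at any depth, like '//product'
    for product in root.iter("product"):
        product_title = product.get("title")
        for element in product.findall("element"):
            rows.append({
                "product_title": product_title,
                "element_title": element.get("title"),
                "element_version": element.get("version"),
                "element_eos": element.get("eos"),
                "element_eos_notification": element.get("eos_notification"),
                # findtext returns None when there is no <comment> child
                "element_comment": element.findtext("comment"),
            })
    return rows


for row in parse_eol(SAMPLE):
    print(row)
```

For the real spider, Scrapy's feed export can write the yielded items straight to a file, e.g. scrapy runspider your_spider.py -o eol.csv (the file name is a placeholder).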

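Why are you declaring a class on every loop iteration?

Is your question about how XPath works, or about how to choose the right XPath to extract the data?

@RomanPerekhrest The loop is there so that I can pass each product name as the selection in the website's form. I want to extract the table results for every product type I select.

@AmjasdMasdhash I'm trying to extract the table on the web page based on the selection in the drop-down list.

@RomanPerekhrest I see the issue you mentioned. I moved the loop inside the parse function, because I want a new form selection for each item in the list and then to extract the table that appears after that selection.

I tried running the code above but got an error: File C:\anaconda3\Scripts\McAfee\McAfee\spider\Scrapy_McAfee.py, line 5, execution_count: null, NameError: name 'null' is not defined. The error points to line 5, where we name the spider 'McAfee'. I'm not sure why this error occurs; I've been researching it without success.

Fixed the issue, with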

import pandas as pd
results = {'product':['McAfee Host Intrusion Prevention', 'McAfee Host Prevention for Linux'],
          'version':['8.0','8.0 Patch 6'],
          'eos_notif':['',''],
          'eol_date':['','']}
pd.DataFrame(results)