Python Scrapy doesn't recognize XPath


python xpath web-scraping scrapy


I am trying to get data from this page, but from the "Specs" button. I tried to get the product names with this code, but it doesn't work:

import scrapy
from scrapy.http import FormRequest
# SpecItem comes from the project's items.py (import not shown in the question)


class SpecSpider(scrapy.Spider):
    name = 'specName'

    start_urls = ['https://octopart.com/electronic-parts/integrated-circuits-ics']
    custom_settings = {
        'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
    }

    def parse(self, response):
        return FormRequest.from_response(response, formxpath="//form[@class='btn-group']", clickdata={"value": "serp-grid"}, callback=self.scrape_pages)

    def scrape_pages(self, response):
        # open_in_browser(response)
        items = SpecItem()

        for product in response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']"):

            name = product.xpath(".//tr/td[class='matrix-col-part']/a[class='nowrap']/text()").extract()
            items['ProductName'] = ''.join(name).strip()

            price = product.xpath("//tr/td['4']/div[class='small']/text()").extract()
            items['Price'] = ''.join(price).strip()

            yield items

This XPath:

response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']")

does not work.

Any suggestions?
If you only want the top-level product names, use the CSS selector
.serp-card-pdp-link

and extract its text.
The average price comes from the CSS selector
.avg-price-faux-btn

You can apply CSS selectors in Scrapy with .css(selector).

The XPath syntax you used is wrong:
//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']
The correct format is to put "@" before "class":
//div[@class='inner-body']/div[@class='serp-wrap-all']/
On the page linked above there is no exact "matrix-table" class match as written.
Try using the following:
//div[@class='inner-body']/div[@class='serp-wrap-all']//*[contains(@class, 'matrix-table')]
Thanks, that was the problem.