Getting a substring that matches a regular expression in Python

python regex web-scraping scrapy

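I am trying to get the substring that matches a regular expression in Python. The substring is a price scraped from a supermarket website. My code looks like this: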

import scrapy
import re

class namePriceSpider(scrapy.Spider):
    name = 'namePrice'
    start_urls = [
        'https://www.cotodigital3.com.ar/sitios/cdigi/browse/'
    ]

    def parse(self, response):
        all_category_products = response.xpath('//*[@id="products"]')
        for product in all_category_products:
            name = product.xpath('//div[@class="descrip_full"]/text()').extract()
            price = product.xpath('//span[@class ="atg_store_newPrice"]/text()').extract()
            yield {'name': name,
                   'price': re.search(r'$\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})', price).group(1)}

When I run the spider, I get this error at line 16, in parse:

'price': re.search(r'$\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})', price).group(1)}
TypeError: expected string or bytes-like object

I have solved it. There were several errors, but the biggest one was that price should not call .extract(). It should look like this:

price = product.xpath(
    '//span[@class="atg_store_productPrice" and not(@style)]'
    '/span[@class="atg_store_newPrice"]/text()'
    ' | //span[@class="price_discount"]/text()'
).re(r'\$\d{1,5}(?:[.,]\d{3})*(?:[., ]\d{2})*')
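
As a rough illustration of what that pattern picks up, here is a sketch that uses only the standard re module. Calling .re() on a selector is roughly comparable to running re.findall over each selected text node. The strings in text_nodes are invented for the example and are not taken from the site.

```python
import re

# Same pattern as in the selector above, written on one line.
PRICE_RE = re.compile(r'\$\d{1,5}(?:[.,]\d{3})*(?:[., ]\d{2})*')

# Hypothetical text nodes; the real page content may differ.
text_nodes = ["\n  $1.234,56\n", "PRECIO $99", "sin precio"]

# With no capturing groups, findall returns each full match as a string.
prices = [m for node in text_nodes for m in PRICE_RE.findall(node)]
```

Nodes without a price simply contribute nothing, so the result is a flat list of price strings.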

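One thing I see is that the dollar sign, $, needs to be escaped as \$, because it is a regex metacharacter (it matches at the end of a line). Also, your pattern has no capturing parentheses, so group(1) has nothing to refer to. I think you can use group(0) here.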
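
A small sketch of both points, using only the standard re module. The sample string text is made up for illustration and is not data from the site.

```python
import re

# Hypothetical sample text; the real page content may differ.
text = "Precio: $1.234,56 por unidad"

pattern_unescaped = r'$\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})'
pattern_escaped = r'\$\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})'

# Unescaped, $ anchors the end of the string, so digits can never follow it.
no_match = re.search(pattern_unescaped, text)

# Escaped, \$ matches a literal dollar sign.
match = re.search(pattern_escaped, text)
whole = match.group(0)  # group(0) is always the entire match

# The pattern only has non-capturing groups (?:...), so group(1) does not exist.
try:
    match.group(1)
    group_one_error = None
except IndexError as exc:
    group_one_error = str(exc)
```

Here no_match is None, while whole holds the full price string. If you did want group(1), you would need to wrap part of the pattern in plain parentheses.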

.extract() returns a list, but re.search needs a string. Use extract_first() or loop over the list.
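
To illustrate without running a spider, the sketch below stands in for the selector result with a plain list, since that is what .extract() hands back. The values in extracted are invented, and extracted[0] is only an approximation of what extract_first() would give you.

```python
import re

# Stand-in for product.xpath(...).extract(); the values are made up.
extracted = ["$1.234,56", "$99,90"]

pattern = re.compile(r'\$\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})')

# Passing the list itself reproduces the TypeError from the question.
try:
    pattern.search(extracted)
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

# Option 1: take only the first string, guarding against an empty list.
first = extracted[0] if extracted else None
first_match = pattern.search(first) if first else None
first_price = first_match.group(0) if first_match else None

# Option 2: loop over the list and keep every string that matches.
prices = [m.group(0) for m in map(pattern.search, extracted) if m]
```

Both options check for a missing match before calling group(0), because re.search returns None when nothing matches.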