Python: finding the right regular expression

python regex scrapy

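Could you help me find a regular expression that extracts the appellation (Margaux or Saint-Julien) from each of these two pages?

Page 1: Margaux, Rouge

Page 2: 2ème Vin, Saint-Julien, Rouge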

My code:

item ["appelation"] = res.select('.//div[@class="pro_col_right"]/div[@class="pro_blk_trans"]/div[@class="pro_blk_trans_titre"]/text()').re(r'\s*\w+\-\w+\-\w+|\w+\-\w+|\[^Rouge,Blanc]')

My regular expression does not find Margaux, but it does extract Saint-Julien.

Not sure why you need this, but assuming s is your HTML file, this regular expression will find what you are looking for:

import re

# s is assumed to hold the page's HTML as a str
m = re.search(r'<div class="pro_blk_trans_titre">(.*)</div>', s)
print(m.group(1).strip().encode("utf8"))

# page1: b'Margaux, Rouge'
# page2: b'2\xc3\xa8me Vin, Saint-Julien, Rouge'
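
As a side note, here is a short check of why the pattern in the question returns Saint-Julien but not Margaux. Its first two alternatives both require a hyphen, and the third, `\[^Rouge,Blanc]`, escapes the opening bracket, so it is not a character class at all and cannot match these titles. The two sample strings are the titles quoted in the question; the variable names are only illustrative.

```python
import re

# The pattern from the question, unchanged.
pattern = r'\s*\w+\-\w+\-\w+|\w+\-\w+|\[^Rouge,Blanc]'

titles = ["Margaux, Rouge", "2ème Vin, Saint-Julien, Rouge"]

# Collect every match of the pattern in each title.
results = [re.findall(pattern, t) for t in titles]

for title, found in zip(titles, results):
    print(repr(title), "->", found)
```

"Margaux" contains no hyphen, so none of the alternatives can match it, while "Saint-Julien" fits the second alternative.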

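What is the point of using a regular expression to extract something whose exact form you already know?

thx @joente, I want to automatically extract the appellation of each wine from that link by iterating over every wine. The problem is that the wine pages do not share the same structure (compare the links to page 1 and page 2), so I want a regular expression that works whatever the structure of the page.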
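
One possible way to handle both title layouts, given only the two examples in the question, is to anchor on the colour instead of on the hyphen. This is a minimal sketch, not a tested solution for the whole site: it assumes the title text always ends with the colour (Rouge or Blanc, the two values named in the question's pattern) and that the appellation is the comma-separated field just before it. The names `APPELLATION_RE` and `extract_appellation` are hypothetical.

```python
import re

# Assumption: the title ends with the colour (Rouge or Blanc) and the
# appellation is the comma-separated field immediately before it.
APPELLATION_RE = re.compile(r'([^,]+?)\s*,\s*(?:Rouge|Blanc)\s*$')


def extract_appellation(title):
    """Return the field preceding the colour, or None if the title does not fit."""
    match = APPELLATION_RE.search(title.strip())
    return match.group(1).strip() if match else None


print(extract_appellation("Margaux, Rouge"))
print(extract_appellation("2ème Vin, Saint-Julien, Rouge"))
```

Titles that do not end with one of the two colours return None, which makes pages with yet another layout easy to spot.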