How to separate text by br elements in Scrapy

python web-scraping scrapy

I need to get the list of violations from the following markup:

<b><font size="2" face="Verdana">Violations:</font></b><br>
<i><font size="2" face="Verdana">A summary of the violations found during the inspection are listed below.</font></i><br>
<br>
<font size="2" face="Verdana">209    Food not protected from contamination [s. 12(a)] <br>
<br>
 302 *Critical*  Equipment/utensils/food contact surfaces not properly washed and sanitized [s. 17(2)] <br>
<br>
 306    Food premises not maintained in a sanitary condition [s. 17(1)] <br>
<br>
</font><br>

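Violations:

A summary of the violations found during the inspection are listed below.

209 Food not protected from contamination [s. 12(a)]

302 *Critical* Equipment/utensils/food contact surfaces not properly washed and sanitized [s. 17(2)]

306 Food premises not maintained in a sanitary condition [s. 17(1)]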

Any idea how I can do this?

Use something like this:

response.xpath('string(//font)').extract()

UPD: If you are parsing this particular page, use the selector

response.xpath("string(//font[5])").extract()
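The `string()` call returns the whole text of the element as a single string, so the individual violations still have to be split apart. A minimal sketch of that step, assuming the extracted string keeps the line breaks from the page source (the `raw` literal and the `split_violations` helper are illustrative and not part of the original answer):

```python
# Sample of what string(//font[5]) might return for the markup in the question.
# This literal is an assumption for illustration, not captured from the live page.
raw = (
    "209    Food not protected from contamination [s. 12(a)] \n"
    "\n"
    " 302 *Critical*  Equipment/utensils/food contact surfaces not properly "
    "washed and sanitized [s. 17(2)] \n"
    "\n"
    " 306    Food premises not maintained in a sanitary condition [s. 17(1)] \n"
    "\n"
)


def split_violations(text):
    """Return one whitespace-normalized entry per non-empty line."""
    entries = []
    for line in text.splitlines():
        cleaned = " ".join(line.split())  # collapse runs of spaces and tabs
        if cleaned:  # skip the blank lines left behind by the <br> tags
            entries.append(cleaned)
    return entries


violations = split_violations(raw)
for violation in violations:
    print(violation)
```

In a spider, `raw` would come from something like `response.xpath("string(//font[5])").get()`; whether `font[5]` is the right index depends on the actual page.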

Schwimmschulkai xx
80xx Gxx
Gxx
Stxx

then it can be solved like this:

response.css('dd[itemprop="Address"]::text').getall()

Output:

['Schwimmschulkai xx', '80xx Gxx', 'Gxx', 'Stxx']

Feel free to adapt this solution to your own problem.
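One way to adapt the text-node approach to the violations markup is to parse each returned node into structured fields. The sketch below is only an illustration: the `text_nodes` list is an assumed result of a `::text` or `text()` query on the violations element, and `PATTERN` and `parse_violation` are hypothetical names.

```python
import re

# Assumed result of a text-node query on the violations <font> element.
# The exact whitespace depends on the real page; this list is illustrative.
text_nodes = [
    "209    Food not protected from contamination [s. 12(a)] ",
    "\n",
    "\n 302 *Critical*  Equipment/utensils/food contact surfaces not properly "
    "washed and sanitized [s. 17(2)] ",
    "\n",
    "\n 306    Food premises not maintained in a sanitary condition [s. 17(1)] ",
    "\n",
]

# Numeric code, optional "*Critical*" marker, then the description.
PATTERN = re.compile(
    r"^(?P<code>\d+)\s+(?P<critical>\*Critical\*\s+)?(?P<description>.+)$"
)


def parse_violation(node):
    """Parse one text node into a dict, or return None if it is not a violation."""
    text = " ".join(node.split())  # normalize whitespace, including newlines
    match = PATTERN.match(text)
    if match is None:
        return None
    return {
        "code": match.group("code"),
        "critical": match.group("critical") is not None,
        "description": match.group("description"),
    }


records = [r for r in map(parse_violation, text_nodes) if r is not None]
for record in records:
    print(record)
```

Whitespace-only nodes between the `<br>` tags do not match the pattern and are dropped, so only the three violation entries remain.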

This gives me all the font elements in the document. I tried //font[text()="Violations:"]/following::font, but that didn't work either.

Can you give me the URL of the page?