Python XPath: how to handle text inside <strong> elements


python xpath web-scraping scrapy


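I am developing a Scrapy spider that uses XPath to extract the information I need. The source page is generated by the website's search function. For example, I am interested in items with "computer" in the title. On the source page, every occurrence of "computer" is shown in bold because of the search. "Computer" can appear at the beginning, in the middle, or at the end of the title, and some items have no "computer" in the title at all. See the following examples: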

Example 1: ("computer" at the beginning)
<a class="title" href="whatever1"> <strong>Computer</strong>, used </a>

Example 2: ("computer" in the middle)
<a class="title" href="whatever2"> Low price <strong>computer</strong>, great deal </a>

Example 3: ("computer" at the end)
<a class="title" href="whatever3"> Don't miss this <strong>Computer</strong> </a>

Example 4: (no keyword of "computer")
<a class="title" href="whatever4"> Best laptop deal ever! </a>

I need an XPath expression that covers all four cases and collects the complete title of each item.
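The simplest way is to select all the text nodes under the link, for example with .//a[@class="title"]//text(), and join them.

Note the double slash before text(). That is the key point here: it selects text nodes at any depth below the link, including the text inside <strong>.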
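Yes, it works like a charm! A related question: how do I combine this with an item loader in Scrapy? Would this work: il.add_xpath('title', ''.join('.//a[@class="title"]//text()'))?

@LearnAWK Sure. How about il.add_xpath('title', './/a[@class="title"]//text()', Join())? Don't forget to import the Join processor.

Great! I haven't used the Join function before. Glad I learned something new from you. Best!

@LearnAWK Sure, happy web scraping!

The double //text() is the key! A single /text() would lose the text inside <strong>. Thanks!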
Example 1: , used
Example 2: , great deal
Example 3: (Nothing)
Example 4: Best laptop deal ever!

"".join(response.xpath('.//a[@class="title"]//text()').extract())