Scrapy: CSS last-child selector cannot select text

html css scrapy

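I am trying to use CSS selectors in Scrapy to select/match elements in some HTML. However, I am stuck on one field that I was hoping to extract with a last-child selector.

Here is the HTML: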

<td class="Table-Standard-AwardName Table-Scholarship-AwardName">

<a id="ctl00_ContentPlaceHolder1_ScholarshipDataControl_grvScholarshipSearch_ctl02_hylScholarshipName" class="bold" href="/Scholarships/14123/Family-Bursary,-The">Family Bursary, The</a>   

<br>

<span>Field of Study:</span> 

EcologyEnvironmental Science

</td>

I have gone through other questions and tried several approaches, such as nth-last-child() and sibling combinators, but none of them worked. Help!
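"EcologyEnvironmental Science" is not an element (such as a span, a div, or anything else); it is just part of the td's content. So it is not matched by ... > *, which means "any direct child of the td with that class".

You would have to wrap it in a span to be able to select only that part of the content via CSS, like this: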
...
  <span>Field of Study:</span> 
  <span>EcologyEnvironmental Science</span>
</td>

As already mentioned, the EcologyEnvironmental Science text is part of the td element itself, so you only need to extract the td's own text. Try the following:
from operator import methodcaller

values = response.css('.Table-Standard-AwardName.Table-Scholarship-AwardName::text').extract()
# strip every text node and take the first one that is not empty
out = next(filter(None, map(methodcaller('strip'), values)))
# you can assign 'EcologyEnvironmental Science' to your item
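
The filtering step can be tried without Scrapy. The sketch below assumes a hypothetical list of text nodes similar to what ::text might return for the td in the question (mostly whitespace, plus the one string of interest), and shows both the first and the last non-empty value, since the asker was after the last piece of content.

```python
from operator import methodcaller

# Hypothetical sample: whitespace-only nodes around the <a>, <br> and <span>,
# followed by the bare text at the end of the td.
values = ["\n", "   \n", "\n", " \n\nEcologyEnvironmental Science\n\n"]

# Strip each node and keep only the non-empty ones.
cleaned = [v for v in map(methodcaller("strip"), values) if v]

# First non-empty node (what next(filter(...)) picks) and last non-empty node.
first_non_empty = next(iter(cleaned), None)
last_non_empty = cleaned[-1] if cleaned else None

print(first_non_empty, "|", last_non_empty)
```

Using a default of None avoids a StopIteration or IndexError on rows where the td has no bare text at all.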

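Comments:

Try response.css('td.Table-Standard-AwardName.Table-Scholarship-AwardName > *::text')[-1].extract()

Have you tried XPath?

I tried response.xpath('/html/body/Table/tbody/tr/td/text()'), and the inspector gave me response.xpath('//*[@id="ctl00_ContentPlaceHolder1_ScholarshipDataControl_grvScholarshipSearch"]/tbody/tr[2]/text()'). Neither produced any output.

@Gaby aka G.Petrioli: I tried that, and other indices as well. I got "IndexError: list index out of range", raised from SelectorList.__getitem__(self, pos).

@NatashaTing Could you try it without the last >? That would be response.css('td.Table-Standard-AwardName.Table-Scholarship-AwardName *::text')[-1].extract()

Yes, if I had written this page I would have put it in a span. Is there a selector for the last piece of content in a td?

Thanks. Yes, I had not thought of extracting everything and filtering the output.

Great. If this helped, feel free to mark the question as answered.

I was actually hoping to do the selection with CSS/XPath/regex. In the HTML file I am working with, the text in each td row is very different, so it is hard to find a common value to filter on. I have been looking around; there should be a selector for the last node in a td. Thank you very much for the alternative. If I do not find another working method in the next few days, I will ...

I think you could update the question with a few more examples, so that we can add more suggestions.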