How can I access all the specific data in the table below using Scrapy?

html xpath scrapy

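I am trying to access all the data in the following table element at the URL. I tried to crawl the data with Scrapy, but the crawl fails with some errors and I cannot get all the data I need. Please help me correct the code so that it scrapes the "name", the "image link", "how to perform the exercise", and all the other data available in the table. This is the code I am trying: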

from scrapy.spider import Spider
from scrapy.selector import Selector
from myproject.items import getExercise

class MySpider(Spider):
    name = "getExercise"
    allowed_domains = ["www.jefit.com"]
    start_urls = ["https://www.jefit.com/exercises/1/"]

    def parse(self, response):
        item = getExercise()
        item['exerciseName'] = response.xpath('//table[@class = "JefitMainTable"]/tbody/tr/td[2]/table[2]/thead/tr/th/text()').extract()
        return item

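Try replacing /tbody/ in your XPath with //

This is a common problem when an XPath is only checked against the browser's DOM, because browsers automatically insert tbody elements into tables, even when the HTML sent by the server does not contain them.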
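The effect of the missing tbody can be reproduced without any network access. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset) as a stand-in for Scrapy's selectors, and both markup snippets are hypothetical, simplified versions of a table, not the real jefit.com page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as a server might send it: no <tbody> in the table.
served = ET.fromstring(
    "<html><body><table class='JefitMainTable'>"
    "<tr><td>1</td><td>Band Cross Over</td></tr>"
    "</table></body></html>"
)

# The same table as a browser's DOM inspector would show it, with <tbody> added.
browser_dom = ET.fromstring(
    "<html><body><table class='JefitMainTable'><tbody>"
    "<tr><td>1</td><td>Band Cross Over</td></tr>"
    "</tbody></table></body></html>"
)

# Path copied from the browser (contains /tbody/) versus the tolerant path (//).
WITH_TBODY = ".//table[@class='JefitMainTable']/tbody/tr/td[2]"
WITHOUT_TBODY = ".//table[@class='JefitMainTable']//tr/td[2]"


def texts(root, path):
    """Return the text of every element matched by path under root."""
    return [element.text for element in root.findall(path)]


print(texts(served, WITH_TBODY))         # nothing matches the served markup
print(texts(served, WITHOUT_TBODY))      # matches the served markup
print(texts(browser_dom, WITH_TBODY))    # matches only because tbody is present
print(texts(browser_dom, WITHOUT_TBODY)) # matches in both cases
```

The path that uses // matches whether or not a tbody is present, which is why it is the safer choice when the expression was first written in the browser.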

It is usually a good idea to try XPath expressions in the scrapy shell:

$ scrapy shell https://www.jefit.com/exercises/1/
>>> response.xpath('//table[@class = "JefitMainTable"]/tbody/tr/td[2]/table[2]/thead/tr/th/text()').extract()
[]
>>> response.xpath('//table[@class = "JefitMainTable"]//tr/td[2]/table[2]/thead/tr/th/text()').extract()
[u'Band Cross Over']

Apart from not getting the expected data, did you also receive an error? No. It returned an error