Python Scrapy: How to select both the head and body tags


python xpath scrapy web-crawler


So, I have a crawler that needs to extract some data from the meta tags in the head and from element tags in the body.

When I try this:

for courses in response.xpath("//html"):

and this:

for courses in response.xpath("//head"):

it only gets data from the meta tags in the <head>...</head> tags.

When I try this:

for courses in response.xpath("//body"):

it only gets data from the tags in the <body>...</body> tags.

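How can I combine these two selectors? I also tried

for courses in response.xpath("//head | //body"):

but it only returns the meta tags from the head and doesn't extract anything from the body.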

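I also tried this:

for courses in response.xpath("/*"):

It works, but it is very inefficient and extraction takes a long time. I'm sure there is a more efficient way to do this.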

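Here is a rough version of the code, if it helps:

import scrapy


class MySpider(scrapy.Spider):
    name = "dkcourses"
    start_urls = ['http://www.example.com/scrapy/all-courses-listing']
    allowed_domains = ["example.com"]

    def parse(self, response):
        # Each iteration is scoped to <body>: the relative ".//meta" queries
        # below cannot see meta tags that live in <head>.
        for courses in response.xpath("//body"):
            yield {
                'pagetype': ''.join(courses.xpath('.//meta[@name="dkpagetype"]/@content').extract()),
                'pagefeatured': ''.join(courses.xpath('.//meta[@name="dkpagefeatured"]/@content').extract()),
                'coursetloc': ''.join(courses.xpath('.//meta[@name="dkcoursetloc"]/@content').extract()),
                'coursetfees': ''.join(courses.xpath('.//meta[@name="dkcoursetfees"]/@content').extract()),
            }
        # Follow every course link and parse it with this same callback.
        for url in response.xpath('//ul[@class="scrapy"]/li/a/@href').extract():
            yield scrapy.Request(response.urljoin(url), callback=self.parse)

The first two fields under yield (pagetype and pagefeatured) are in the <head>...</head> tags; the last two (coursetloc and coursetfees) are in the <body>...</body> tags.

Comments:

Post the URL or the HTML code.
@宏杰李 Code posted…
I meant the website URL.
Thanks, but I want to store the values in variables so I can send them to Elasticsearch, not just print them on screen, as you can see in my sample code above.
Never mind. All I needed to change in my code was
for courses in response.xpath("//body"):
to
for courses in response.xpath("//meta"):
All good now.

Answer:

Use extract_first() to get the first extracted value instead of using join, and use [starts-with(@name, "dkn")] to find the meta tags. //meta searches the whole document, so head and body are both covered. For example: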

In [5]: for meta in response.xpath('//meta[starts-with(@name, "dkn")]'):
   ...:     name = meta.xpath('@name').extract_first()
   ...:     content = meta.xpath('@content').extract_first()
   ...:     print({name:content})

{'dknpagetype': 'Course'}
{'dknpagefeatured': ''}
{'dknpagedate': '2016-01-01'}
{'dknpagebanner': 'http://www.deakin.edu.au/__data/assets/image/0006/757986/Banner_Cyber-Alt2.jpg'}
{'dknpagethumbsquare': 'http://www.deakin.edu.au/__data/assets/image/0009/757989/SQ_Cyber1-2.jpg'}
{'dknpagethumblandscape': 'http://www.deakin.edu.au/__data/assets/image/0007/757987/LS_Cyber1-1.jpg'}
{'dknpagethumbportrait': 'http://www.deakin.edu.au/__data/assets/image/0008/757988/PT_Cyber1-3.jpg'}
{'dknpagetitle': 'Graduate Diploma of Cyber Security'}
{'dknpageurl': 'http://www.deakin.edu.au/course/graduate-diploma-cyber-security'}
{'dknpagedescription': "Take your understanding of cyber security to the next level with Deakin's Graduate Diploma of Cyber Security and build your capacity to investigate and combat cyber-crime."}
{'dknpageid': '723503'}