Python not scraping data

python web-scraping scrapy

I wrote the following script to extract data from:

However, when I run it, I get only an empty file containing nothing but the header. Why is that?

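Your XPath selector returns None. It should probably be:

"title": title.xpath("text()").extract_first()

You can also strip the surrounding whitespace:

"title": title.xpath("text()").extract_first(default="").strip()

default="" avoids an exception when the selector finds nothing: without it, extract_first() returns None, and calling .strip() on None fails.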

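Give it a try and let me know if you still don't get the expected titles from that page. The XPath you defined was faulty. In addition, each string carries a lot of surrounding whitespace, which needs to be stripped. The script below gives clean output: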

import scrapy

class MySpider(scrapy.Spider):
    name = 'jobs'
    start_urls = ['https://www.freelancer.in/jobs/python_web-scraping_web-crawling/']

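    def parse(self, response):
        # Select the text node of every job-title link directly.
        for title in response.xpath('//*[@class="JobSearchCard-primary-heading-link"]/text()').extract():
            # Each title is padded with whitespace, so strip it.
            yield {
                'title': title.strip()
            }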

Try this:

import scrapy

class MySpider(scrapy.Spider):
    name = 'jobs'
    start_urls = ['https://www.freelancer.in/jobs/python_web-scraping_web-crawling/']

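    def parse(self, response):
        # Loop over the link nodes, then query each node with a relative XPath.
        for title in response.xpath('//div[@class="JobSearchCard-primary-heading"]//a'):
            yield {
                # default='' keeps .strip() from failing on a link without text.
                'title': title.xpath('./text()').extract_first(default='').strip()
            }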

The inner XPath should be relative to the node you are looping over.
