Python: Relative path to absolute path using Scrapy

python web-crawler scrapy


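I am trying to crawl a forum, with the eventual goal of collecting the posts that contain links. For now I am just trying to scrape the username of each post. But I think there is a problem because the URLs are not static.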

spider.py

from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector
from scrapy.item import Item, Field


class TextPostItem(Item):
    title = Field()
    url = Field()
    submitted = Field()


class RedditCrawler(CrawlSpider):
    name = 'post-spider'
    allowed_domains = ['flashback.org']
    start_urls = ['https://www.flashback.org/t2637903']


    def parse(self, response):
        s = Selector(response)
        next_link = s.xpath('//a[@class="smallfont2"]//@href').extract()[0]
        if len(next_link):
            yield self.make_requests_from_url(next_link)
        posts = Selector(response).xpath('//div[@id="posts"]/div[@class="alignc.p4.post"]')
        for post in posts:
            i = TextPostItem()
            i['title'] = post.xpath('tbody/tr[1]/td/span/text()').extract()[0]
            #i['url'] = post.xpath('div[2]/ul/li[1]/a/@href').extract()[0]
            yield i

This gives the following error:

raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: /t2637903p2

Any ideas?

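You need to "join" response.url with the relative URL you extracted, using urljoin().

Also note that there is no need to instantiate a Selector object: you can use the response.xpath() shortcut directly: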

def parse(self, response):
    next_link = response.xpath('//a[@class="smallfont2"]//@href').extract()[0]
    # ...
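
To make the join step concrete, here is a minimal sketch that uses only the standard library. It assumes Python 3, where urljoin lives in urllib.parse (in Python 2 it was in the urlparse module). The helper name to_absolute is hypothetical and only for illustration; the URLs are the ones from the question and its error message.

```python
from urllib.parse import urljoin


def to_absolute(page_url, href):
    """Resolve an href found on page_url into an absolute URL (hypothetical helper)."""
    return urljoin(page_url, href)


page_url = 'https://www.flashback.org/t2637903'  # plays the role of response.url
next_link = '/t2637903p2'                        # relative href, as in the error message

# The scheme and host are taken from page_url, so the result has a scheme.
absolute = to_absolute(page_url, next_link)
print(absolute)

# An href that is already absolute comes back unchanged, so joining is safe either way.
print(to_absolute(page_url, 'https://www.flashback.org/t2637903p3'))
```

Inside a Scrapy callback, response.urljoin(next_link) should give the same result, since it resolves the link against the URL of the response.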

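Hi, thanks for your answer. I have seen similar solutions using "urljoin" before, but I don't understand how to use it in my code. Where exactly would it go?

@Jomasdf OK, use it when you make the request:

yield self.make_requests_from_url(urljoin(response.url, next_link))

Ah, I see. Thank you very much!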
