Python Scrapy: getting the wrong value for the requested URL

python web-scraping scrapy

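I am trying to extract the title from the pages below, but for some of them I get a different title instead of the one that belongs to the response URL. This is what I am trying: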

import scrapy


class ElementSpider(scrapy.Spider):
    name = 'qwerty4'
    allowed_domains = ["burbank.com.au"]
    start_urls = [
        "https://www.burbank.com.au/victoria/home-details/alphington-153-179727",
        "https://www.burbank.com.au/victoria/home-details/sandringham-151-171569",
        "https://www.burbank.com.au/victoria/home-details/sandringham-151-181680",
        "https://www.burbank.com.au/victoria/home-details/bellfield-184-171585",
        "https://www.burbank.com.au/victoria/home-details/carlton-178-172662",
        "https://www.burbank.com.au/victoria/home-details/carlton-178-178079",
    ]

    def parse(self, response):
        # Take the first matching text node; .get() returns None if nothing matches
        title = response.xpath('//div[@class="col-md-4 col-xs-12 col-sm-12"]/div[@class="housename"]/span/text()').get()
        print(response.url)
        print(title)

However, some requests return the wrong data. The output is:

Please suggest how to solve this.

They do not want their site to be scraped, so they have added a technique that confuses scrapers.

Change these fields in settings.py:

CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 2

It seems the site stores a viewstate.

To avoid this, you need to turn off Scrapy's concurrency by setting CONCURRENT_REQUESTS=1.

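Otherwise, you need to look further into how the viewstate is generated. It may be bound to the IP address, which could mean you need some proxies to work around it.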
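Why server-side state and concurrency interact badly can be illustrated with a toy model. This is only a sketch under the assumption that the site builds each page from per-session state rather than from the URL; how the real site behaves is not known, and `ToySite`, `fetch_sequential`, `fetch_interleaved`, and the shortened page names are all hypothetical.

```python
class ToySite:
    """Hypothetical server that keeps the 'current page' in per-session state."""

    def __init__(self):
        self.session_page = None  # shared by every request in the session

    def receive(self, url):
        # Step 1: the server records which page the session asked for.
        self.session_page = url

    def render(self):
        # Step 2: the body is built from the session state, not from the URL.
        return "title of " + self.session_page


def fetch_sequential(site, urls):
    # One request at a time: each request finishes before the next starts.
    results = {}
    for url in urls:
        site.receive(url)
        results[url] = site.render()
    return results


def fetch_interleaved(site, urls):
    # Concurrent requests: all of them arrive before any body is rendered.
    for url in urls:
        site.receive(url)
    return {url: site.render() for url in urls}


urls = ["alphington-153", "sandringham-151", "bellfield-184"]
sequential = fetch_sequential(ToySite(), urls)
interleaved = fetch_interleaved(ToySite(), urls)

# URLs whose returned title belongs to a different page.
mismatched = [u for u in urls if interleaved[u] != "title of " + u]
print(sequential)
print(mismatched)
```

In this model every sequential fetch gets its own title, while the interleaved fetches all see the state left by the last request, which matches the symptom described in the question.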

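Scrapy handles requests concurrently. This means Scrapy sends up to the specified number of requests at a time. If you change it to 1, it sends only one request, and sends the next one only after it has received the response to the first. If it is set to 50, Scrapy sends up to 50 requests at a time.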
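The effect of a concurrency limit can be sketched with the standard library alone. The example below is not how Scrapy is implemented internally; it assumes an `asyncio.Semaphore` as a stand-in for CONCURRENT_REQUESTS and a short sleep as a stand-in for network latency, and `peak_in_flight` is a hypothetical helper that reports how many simulated requests were running at once.

```python
import asyncio


async def peak_in_flight(limit, total_requests):
    """Return the highest number of simulated requests running at once."""
    semaphore = asyncio.Semaphore(limit)  # plays the role of CONCURRENT_REQUESTS
    in_flight = 0
    peak = 0

    async def fake_request():
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for network latency
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(total_requests)))
    return peak


peak_with_1 = asyncio.run(peak_in_flight(limit=1, total_requests=6))
peak_with_3 = asyncio.run(peak_in_flight(limit=3, total_requests=6))
print(peak_with_1, peak_with_3)
```

With a limit of 1 the simulated requests run strictly one after another; with a higher limit several are in flight together, which is the situation in which the mixed-up titles appear.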