Python 3.x: Response with Scrapy shows wrong or different data
I am scraping a page, but when I request the link that holds all the information, the response says the data does not exist. When I check the JSON in the Firefox inspector, however, the response contains all the information. I have already set the headers, but I still cannot get the data to show up.

My code:

settings.py:
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0'
ROBOTSTXT_OBEY = False
CONCURRENT_REQUESTS = 1
DOWNLOAD_DELAY = 3
COOKIES_ENABLED = False
mi_spider.py:
from json import loads

from scrapy import Spider
from scrapy.http import Request

N_categoria = 0  # index of the "box-list" block whose links are read
API_key = 'P1MfFHfQMOtL16Zpg36NcntJYCLFm8FqFfudnavl'


class MetrocScrapingSpider(Spider):
    name = 'metroc_scraping'
    allowed_domains = ['metrocuadrado.com']
    start_urls = ['https://www.metrocuadrado.com/']

    def parse(self, response):
        print()
        print('Entra aca 1')
        print()
        aptos_links = response.xpath('//*[@class= "box-list"]')[N_categoria].xpath('.//li//a/@href').extract()
        data_links = []
        for url in aptos_links:
            items = {}
            # Keep only the non-empty path segments after the domain.
            # (Popping from a list while enumerating it can skip elements.)
            url = [part for part in url.split('.com')[-1].split('/') if part != '']
            items['inmu_'] = url[0]
            items['type_'] = url[1]
            items['loc_'] = url[-1]
            data_links.append(items)
        n_cat = 1
        # Re-request the start page only to hand the collected links on.
        yield Request(url=response.url,
                      callback=self.first_parse,
                      meta={'data_links': data_links,
                            'n_cat': n_cat,
                            'aptos_links': aptos_links},
                      dont_filter=True)

    def first_parse(self, response):
        data_links = response.meta['data_links']
        n_cat = response.meta['n_cat']
        aptos_links = response.meta['aptos_links']
        n_from = 0
        cat_linl = aptos_links[n_cat]
        data_link = data_links[n_cat]
        print(data_link)
        inmu_ = data_link['inmu_']
        type_ = data_link['type_']
        loc_ = data_link['loc_']
        api_link = 'https://www.metrocuadrado.com/rest-search/search?realEstateTypeList='+inmu_+'&realEstateBusinessList='+type_+'&city='+loc_+'&from='
        # Request the first page of results (50 per page) from the REST API.
        yield Request(url=api_link + str(n_from) + '&size=50',
                      callback=self.main_parse,
                      meta={'data_links': data_links,
                            'n_cat': n_cat,
                            'n_from': n_from,
                            'api_link': api_link},
                      dont_filter=True,
                      headers={'Accept': 'application/json, text/plain, */*',
                               'Accept-Encoding': 'gzip, deflate, br',
                               'Accept-Language': 'es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3',
                               'Connection': 'keep-alive',
                               'DNT': '1',
                               'Host': 'www.metrocuadrado.com',
                               'Upgrade-Insecure-Requests': '1',
                               'Referer': cat_linl,
                               'Pragma': 'no-cache',
                               'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0',
                               'X-Api-Key': API_key,
                               'X-Requested-With': 'XMLHttpRequest'})

    def main_parse(self, response):
        print()
        print(response.url)
        print()
        print(response.status)
        print()
        # Parse the JSON body and dump it for inspection.
        jsonresponse = loads(response.text)
        print(jsonresponse)
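As you can see, "totalHits" is 0, "totalEntries" is also 0, and the results are empty. However, if you look at the Firefox inspector:
Part of the response in the Firefox inspector (I don't know if it is hard to see, but "totalHits" is 3135 and "totalEntries" is 3135):
I don't know why this happens. Can anyone help?

I'm not sure, but could you try changing the position of the headers argument in the yield, like: headers:, callback:, etc.

Hello @MuratDemir, I did what you suggested, but it still doesn't work.

You seem to believe the Firefox inspector shows the page source. It doesn't. See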
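
One way to narrow this down is to compare, parameter by parameter, the URL the spider requests with the URL the browser sent (copied from the inspector's Network tab), and to read the two counters from each JSON body. The following is only a minimal sketch under assumptions: the helper names `build_search_url`, `diff_query`, and `hit_counts` are hypothetical, the values "apartamento", "venta", and "bogota" are placeholders rather than confirmed API values, and "totalHits" and "totalEntries" are assumed to be top-level keys of the response.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the spider above.
BASE = "https://www.metrocuadrado.com/rest-search/search"


def build_search_url(inmu, type_, loc, start=0, size=50):
    """Build the search URL with properly encoded query parameters."""
    params = {
        "realEstateTypeList": inmu,
        "realEstateBusinessList": type_,
        "city": loc,
        "from": start,
        "size": size,
    }
    return BASE + "?" + urlencode(params)


def diff_query(url_a, url_b):
    """Return the query parameters whose values differ between two URLs."""
    qa = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    qb = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    return {
        key: (qa.get(key), qb.get(key))
        for key in sorted(set(qa) | set(qb))
        if qa.get(key) != qb.get(key)
    }


def hit_counts(json_text):
    """Read the two counters (assumed to be top-level keys) from a JSON body."""
    data = json.loads(json_text)
    return data.get("totalHits"), data.get("totalEntries")


if __name__ == "__main__":
    # Placeholder values; replace the second URL with the one the browser sent.
    spider_url = build_search_url("apartamento", "venta", "bogota")
    browser_url = build_search_url("apartamento", "venta", "Bogotá")
    print(diff_query(spider_url, browser_url))
    print(hit_counts('{"totalHits": 0, "totalEntries": 0}'))
```

If `diff_query` reports no differences, the mismatch is more likely in the request headers or cookies than in the query string, so those would be the next thing to compare.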