Can a Python server read the request meta data sent by Scrapy?

python python-2.7 scrapy web-crawler

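The code below is essentially an example Amazon spider.
I would like to know whether the Amazon server (or any other server) can tell what data we pass into Scrapy's Request.meta. And if Request.meta is not sent along with our request, how do we get that data back in response.meta?

Can someone explain how request.meta and response.meta work?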

    import random
    from HTMLParser import HTMLParser

    import scrapy
    from scrapy.crawler import CrawlerProcess
    import os
    import sys
    sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))
    from amazon.items import AmazonItem
    from amazon.user_agents import user_agent_list


    class MLStripper(HTMLParser):
        def __init__(self):
            self.reset()
            self.fed = []

        def handle_data(self, d):
            self.fed.append(d)

        def get_data(self):
            return ''.join(self.fed)


    def strip_tags(html):
        s = MLStripper()
        s.feed(html)
        return s.get_data()


    class Amazon(scrapy.Spider):
        allowed_domains = ['amazon.in']
        start_urls = ['http://www.amazon.in']
        name = 'amazon'

        def parse(self, response):
            product_detail = response.xpath('//li[@class="s-result-item celwidget "]')
            for product in product_detail:
                asin = product.xpath('@data-asin').extract_first().encode('ascii', 'ignore')
                url = 'http://www.amazon.in/dp/' + asin
                brand = product.xpath('div/div/div/span[2]/text()').extract_first()
                if brand != 'Azani':
                    request = scrapy.Request(url, callback=self.parse_product)
                    request.meta['asin'] = asin
                    yield request
            next_page = response.xpath('//a[@id="pagnNextLink"]/@href').extract_first()
            if next_page:
                next_page = 'http://www.amazon.in' + next_page
                request = scrapy.Request(next_page, callback=self.parse)
                yield request

        def offer_page(self, response):
            item = response.meta['item']
            seller = response.xpath('//div[@class="a-row a-spacing-mini olpOffer"]/div/h3/span/a/text()').extract()
            price = response.xpath('//div[@class="a-row a-spacing-mini olpOffer"]/div/span/span/text()').extract()
            seller_price = zip(seller, price)
            item['brand'] = response.xpath('//div[@id="olpProductByline"]/text()').extract_first().strip().replace('by ', '')
            item['price'] = '{}'.format(seller_price)
            item['no_of_seller'] = len(seller_price)
            yield item

        def parse_product(self, response):
            def html_to_text(html):
                s = MLStripper()
                s.feed(html)
                return s.get_data()
            asin = response.meta['asin']
            item = AmazonItem()
            item['asin'] = asin
            item['product_name'] = response.xpath('//*[@id="productTitle"]/text()').extract_first().strip()
            item['bullet_point'] = html_to_text(
                response.xpath('//*[@id="feature-bullets"]').extract_first()).strip()
            item['description'] = html_to_text(response.xpath('//*[@id="productDescription"]').extract_first()).strip()
            child_asins = response.xpath('//*[@class="dropdownAvailable"]/@value').extract()
            child_asins = map(lambda x: x.split(',')[-1], child_asins)
            child_asins = ','.join(child_asins)
            item['child_asin'] = child_asins.encode('utf-8', 'ignore')
            offer_page = 'http://www.amazon.in/gp/offer-listing/' + asin
            request = scrapy.Request(offer_page, callback=self.offer_page)
            request.meta['item'] = item
            yield request
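
No. Request.meta is never sent to the server. It is an ordinary dictionary that lives only inside the Scrapy process. When the response arrives, Scrapy attaches it to the Request that produced it, and response.meta is simply a shortcut to response.request.meta. Some special meta keys (such as proxy or download_timeout) change how Scrapy sends the request, but the dictionary itself is never transmitted; the server only sees the HTTP request (method, URL, headers and body).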
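
To make that pairing concrete, here is a simplified model in Python 3. It is not Scrapy's real code: ToyRequest, ToyResponse, serialize and fake_download are hypothetical names chosen for illustration. It shows two things: the bytes a server would receive are built only from method, URL, headers and body, and the response keeps a reference to its request, so response.meta can return request.meta (which is how Scrapy documents response.meta) without anything being sent.

```python
# Simplified, hypothetical model of why meta never leaves the crawler.
# ToyRequest, ToyResponse, serialize and fake_download are illustrative
# names only; they are not Scrapy's real classes or internals.
from dataclasses import dataclass, field
from typing import Tuple
from urllib.parse import urlsplit


@dataclass
class ToyRequest:
    url: str
    method: str = "GET"
    headers: dict = field(default_factory=dict)
    body: bytes = b""
    meta: dict = field(default_factory=dict)  # local bookkeeping only


def serialize(request: ToyRequest) -> bytes:
    """Build the raw HTTP/1.1 request that would be sent to the server.

    Only method, URL, headers and body are used; request.meta is never read.
    """
    parts = urlsplit(request.url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"{request.method} {target} HTTP/1.1", f"Host: {parts.netloc}"]
    lines += [f"{name}: {value}" for name, value in request.headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + request.body


@dataclass
class ToyResponse:
    url: str
    body: bytes
    request: ToyRequest  # the crawler remembers which request this answers

    @property
    def meta(self) -> dict:
        # Mirrors Scrapy's documented behaviour: response.meta is
        # a shortcut to response.request.meta.
        return self.request.meta


def fake_download(request: ToyRequest) -> Tuple[bytes, ToyResponse]:
    wire = serialize(request)  # everything the server could ever observe
    # A real downloader would send `wire` and parse the server's reply;
    # here the reply body is hard-coded.
    response = ToyResponse(url=request.url, body=b"<html></html>", request=request)
    return wire, response


req = ToyRequest(
    "http://example.com/product",
    headers={"User-Agent": "scrapy"},
    meta={"item": "partially filled item"},
)
wire, response = fake_download(req)
print(wire)           # request line and headers only, no trace of meta
print(response.meta)  # {'item': 'partially filled item'}
```

The design choice that matters is that the response holds a reference to its request object; the meta data travels with that in-memory object, not over the network.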

You can check what a request actually sends to the server by inspecting its request.headers and request.body attributes, for example in scrapy shell:

    $ scrapy shell "http://stackoverflow.com"
    >[1]: request.headers
    <[1]: {b'Accept': b'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           b'Accept-Encoding': b'gzip,deflate',
           b'Accept-Language': b'en',
           b'User-Agent': b'scrapy'}
    >[2]: request.body
    <[2]: b''
    >[3]: request.method
    <[3]: 'GET'
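
The scrapy shell session shows the request from Scrapy's side. To look at it from the server's side, here is a minimal sketch using only the Python 3 standard library; urllib stands in for Scrapy, and the handler name, URL path and meta value are made up for illustration. A throwaway local server records every request line and header it receives, while the client keeps its own meta-like dict and pairs it with the response locally.

```python
# Hypothetical stdlib-only demo (not Scrapy): a throwaway local server
# records exactly what arrives over the wire, while the client keeps
# a meta-like dict on its own side.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # what the server actually observed


class RecordingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Store the request line and every header the client sent.
        received.append({"requestline": self.requestline,
                         "headers": dict(self.headers)})
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/dp/example" % server.server_address[1]

meta = {"asin": "LOCAL-ONLY-VALUE"}  # client-side bookkeeping, never sent
request = urllib.request.Request(url, headers={"User-Agent": "scrapy"})
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore env proxies
with opener.open(request, timeout=5) as resp:
    # Pair the response with the local dict, as a crawler pairs response.meta.
    result = {"body": resp.read(), "meta": meta}

server.shutdown()
server.server_close()

print(received[0]["requestline"])  # GET /dp/example HTTP/1.1
print(received[0]["headers"])      # ordinary request headers, no "asin" value
```

Whatever client library is used, the server can only observe the method, URL, headers and body; a dict kept next to the request in the client process is invisible to it.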

$scrapy shell”http://stackoverflow.com" >[1] ：request.headers [2] ：request.body [3] ：request.method