Web crawler: How can I prevent the same URL from returning different responses? (Web Crawler, Scrapy, Scrapy Spider) - Fatal编程技术网


web-crawler scrapy


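I'm currently trying to use Scrapy to crawl a website.

I found that requests for the same URL can return different responses. The site seems to have two versions. I am also sending the same user agent on every request.

Is there some way to make the responses consistent? Or can I only detect which version each response is and then use different XPath expressions to extract the items?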
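The fallback described above (detect which version was served, then use that version's XPath expressions) could look roughly like the sketch below. It is an illustration under assumptions, not a known-good scraper for this site: the layout names `v1`/`v2` and every path in `LAYOUTS` are hypothetical placeholders, since the real markup of the two versions is not shown. To keep it runnable with the standard library only, it uses `xml.etree.ElementTree`'s limited XPath subset, which requires well-formed markup; inside a Scrapy spider you would call `response.xpath(...)` with full XPath (for example `//h1[@class='post-title']/text()`) instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical layouts: the real site's markup is unknown, so these paths are
# placeholders. Each layout maps field -> path, and its "title" path doubles
# as the marker used to recognise which version of the page was served.
LAYOUTS = {
    "v1": {"title": ".//h1[@class='post-title']", "body": ".//div[@class='post-body']"},
    "v2": {"title": ".//div[@id='subject']/h2", "body": ".//div[@id='message']"},
}


def detect_layout(root):
    """Return the name of the first layout whose marker path matches, else None."""
    for name, paths in LAYOUTS.items():
        if root.find(paths["title"]) is not None:
            return name
    return None


def extract_item(markup):
    """Parse well-formed markup and extract fields using the matching layout's paths."""
    root = ET.fromstring(markup)
    layout = detect_layout(root)
    if layout is None:
        # Fail loudly so an unseen third version is noticed instead of
        # silently producing empty items.
        raise ValueError("unrecognised page layout")
    item = {field: (root.findtext(path) or "").strip()
            for field, path in LAYOUTS[layout].items()}
    item["layout"] = layout
    return item
```

Keeping the per-version paths in one table means a newly discovered version only needs a new entry rather than another branch in the parsing code, and recording `layout` on each item makes it easy to see how often each version is served.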

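The response.headers in scrapy shell looks like this:

{'Cache-Control': 'max-age=0, private, must-revalidate',
 'Content-Type': 'text/html; charset=utf-8',
 'Date': 'Fri, 04 Dec 2015 18:56:59 GMT',
 'Server': 'nginx/1.6.2',
 'Set-Cookie': 'auth_token=hello; domain=www.medhelp.org; path=/; expires=Thu, 01-Jan-1970 00:00:00 GMT',
 'X-Rack-Cache': 'miss',
 'X-Request-Id': '70f23a01ac124fd58acc9e9e7bafb609',
 'X-Runtime': '0.150452',
 'X-Ua-Compatible': 'IE=8'}

Answer: This depends entirely on the website, not on Scrapy. In this case you can check response.headers, in particular the Last-Modified header, which should give the date the page was last modified.

Comment: Thanks for the suggestion. Now I'm a bit confused. response.headers returns {..., 'X-Ua-Compatible': 'IE=8'}, but when I look at the response body I can see a meta tag in the head section, and this meta information is the same in every version of the response. I don't know whether this is the problem.

Comment: First, headers are not the same thing as meta information. Second, some sites do not return Last-Modified information at all.

Comment: Yes, this site has no Last-Modified information.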
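Checking Last-Modified, as the answer suggests, could be done with a small helper like the one below. This is a sketch: `parse_last_modified` is a hypothetical name, and it assumes you pass it the raw header value, for example `parse_last_modified(response.headers.get("Last-Modified"))` in a Scrapy callback, where the value should be bytes or None. Parsing uses the standard library's `email.utils.parsedate_to_datetime`, which understands the HTTP date format seen in the Date header above.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(raw):
    """Turn a raw Last-Modified header value into an aware UTC datetime.

    Accepts str or bytes (Scrapy stores header values as bytes) and returns
    None when the header is absent, which is the case for this site.
    """
    if raw is None:
        return None
    if isinstance(raw, bytes):
        # HTTP header values are ASCII/latin-1 on the wire.
        raw = raw.decode("latin-1")
    parsed = parsedate_to_datetime(raw)
    if parsed.tzinfo is None:
        # HTTP dates are always GMT, but "-0000" parses as naive; attach UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

With the headers printed above, the lookup returns None, which matches the thread's conclusion: this site does not send Last-Modified, so the header cannot be used to tell the two versions apart here.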

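The confusion in the comments comes from two separate places a value like X-Ua-Compatible can live: the HTTP response headers (what response.headers shows) and a `<meta http-equiv="...">` tag inside the HTML itself. The sketch below collects http-equiv meta tags with the standard library's `html.parser` and reports names whose values differ from the headers. All names here are hypothetical, and the sample page in the usage is made up, since the meta tag from the question was not preserved. It assumes Scrapy's `response.headers.items()` yields bytes keys with lists of bytes values; `_as_text` normalises that, and a plain str dict like the one printed above also works. In a spider you might call `header_meta_mismatches(response.text, response.headers)`.

```python
from html.parser import HTMLParser


def _as_text(value):
    """Normalise a header key or value to str (Scrapy stores bytes, often in lists)."""
    if isinstance(value, (list, tuple)):
        value = value[-1] if value else None
    if isinstance(value, bytes):
        value = value.decode("latin-1")
    return value


class MetaHttpEquivParser(HTMLParser):
    """Collect <meta http-equiv="..." content="..."> pairs from the page markup."""

    def __init__(self):
        super().__init__()
        self.http_equiv = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("http-equiv")
        if name:
            self.http_equiv[name.lower()] = attrs.get("content")


def header_meta_mismatches(html, headers):
    """Map each http-equiv name to (header value, meta value) where they differ."""
    parser = MetaHttpEquivParser()
    parser.feed(html)
    parser.close()
    # HTTP header names are case-insensitive, so compare lowercased names.
    by_name = {_as_text(k).lower(): _as_text(v) for k, v in headers.items()}
    return {
        name: (by_name.get(name), meta_value)
        for name, meta_value in parser.http_equiv.items()
        if by_name.get(name) != meta_value
    }
```

An empty result means the page's meta tags agree with the headers; a meta tag that is identical in every version of the page (as the asker observed) cannot explain why the versions differ.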
Copyright © 2024. All Rights Reserved by - Fatal编程技术网