How do I make Scrapy follow invalid links? (Python, Python 3.x, Scrapy) - Fatal编程技术网


python python-3.x scrapy


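I often use Scrapy to check long lists of links and see whether they are reachable.

My problem is that when a link is malformed (for example, it does not start with http:// or https://), the crawler crashes:

ValueError: Missing scheme in request url: http.www.gobiernoenlinea.gob.ve/noticias/viewNewsUser01.jsp?applet=1&id_noticia=41492

I read the list of links from a pandas Series and check each one. When the response is reachable I record the link as "ok", otherwise as "dead".

I still want to find the malformed URLs. How can I validate them and yield "dead" for them?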

You can simply check whether the link starts with http or https. If it does not, prepend http:// manually:

# Prepend a scheme when the link does not already start with one.
if not LINK.startswith(('http:', 'https:')):
    LINK = "http://" + LINK
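The check above can be wrapped in a small helper and applied to the whole list before any request is built. This is only a sketch: the function name ensure_scheme, its default_scheme parameter, and the sample links are illustrative assumptions, not part of the original answer.

```python
def ensure_scheme(link, default_scheme="http"):
    """Return the link unchanged if it starts with http:// or https://,
    otherwise prepend default_scheme (a hypothetical parameter)."""
    link = link.strip()
    # str.startswith accepts a tuple, so both prefixes are tested in one call.
    if link.lower().startswith(("http://", "https://")):
        return link
    return f"{default_scheme}://{link}"


# Illustrative input; the second and third entries have no scheme.
links = [
    "https://example.com/",
    "www.example.com/page",
    "http.www.example.com/page",
]
fixed = [ensure_scheme(link) for link in links]
```

Note that a link such as http.www.example.com/page becomes http://http.www.example.com/page. That is syntactically valid, so the request can be created, but the host most likely does not exist, so the link would be expected to end up reported as "dead" through the normal failure path.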

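Would wrapping the yield Request in try/except ValueError, and then yielding LinkcheckerItem(index=index['index'], url=url[1], code='invalid') in the except, or something similar, work? Also, if you are only interested in checking that the server responds to the URL in some way, you may want to consider using method="HEAD". That skips fetching the page content when you don't need it, which saves bandwidth and speeds things up.

try/except ValueError would be the best approach (with self.logger.warning() in the except). method=HEAD does not help here, because the exception is raised inside Request(), before the yield is reached.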
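The try/except idea from the comments might look roughly like the sketch below. To keep it runnable without Scrapy installed, it uses fake_request as a stand-in for scrapy.Request; the stand-in, the function name requests_or_invalid, and the dictionary used in place of the item class are all assumptions made for illustration.

```python
def fake_request(url):
    # Stand-in for scrapy.Request (assumption: like the real class, it raises
    # ValueError for a URL without a scheme). Pass scrapy.Request in a spider.
    if "://" not in url:
        raise ValueError(f"Missing scheme in request url: {url}")
    return ("REQUEST", url)


def requests_or_invalid(links, make_request=fake_request):
    """Yield a request per well-formed link, or an 'invalid' record otherwise."""
    for index, url in enumerate(links):
        try:
            # The exception is raised while the request object is being built,
            # so the try block has to wrap the constructor call itself.
            request = make_request(url)
        except ValueError:
            # A plain dict stands in for the item class mentioned in the comment.
            yield {"index": index, "url": url, "code": "invalid"}
        else:
            yield request


links = ["https://example.com/", "http.www.example.com/page"]
results = list(requests_or_invalid(links))
```

In a real spider the except branch is also a natural place for the self.logger.warning() call suggested above. Whether an item can be yielded directly from the place where requests are created depends on where that code runs and on the Scrapy version, so that part should be checked against the Scrapy documentation.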



