Python 3.x: Where should we modify Scrapy to remember the websites that cause 403 errors while scraping?
Tags: python-3.x, scrapy, http-status-code-403
I have a scraper that crawls URLs and the URLs embedded in those pages, and I want to record the URLs that return 403:
>>>scrapy crawl myscraper -o results.jl
...
2020-11-11 02:38:08 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10052252-anjisilesai-angel-schlesser.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:15 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10034901-aqisi-arquiste.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10034901-aqisi-arquiste.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:20 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10070420-antonio-visconti.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:20 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10070420-antonio-visconti.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:27 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10080993-alandelong-alain-delon.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:27 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10080993-alandelong-alain-delon.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:34 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10086521-afunanzhixiang-afnan-perfumes.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10086521-afunanzhixiang-afnan-perfumes.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:40 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10021207-adaofu-duominggesi-adolfo-dominguez.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10021207-adaofu-duominggesi-adolfo-dominguez.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:46 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10058341-yabaoxin-aubusson.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10058341-yabaoxin-aubusson.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:50 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10019426-angela-ciampagna.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:50 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10019426-angela-ciampagna.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:54 [scrapy.extensions.logstats] INFO: Crawled 718 pages (at 10 pages/min), scraped 0 items (at 0 items/min)
2020-11-11 02:38:55 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10091158-anfasi-anfass.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10091158-anfasi-anfass.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:58 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10035539-antonio-banderas.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10035539-antonio-banderas.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10023035-an-jielade-ann-gerard.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10023035-an-jielade-ann-gerard.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:09 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10038064-siyuefenfang-april-aromatics.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:10 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10038064-siyuefenfang-april-aromatics.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:17 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10058274-affinessence.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:17 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10058274-affinessence.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:24 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10070090-altaia.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:24 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10070090-altaia.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:31 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10092558-niziranzhizi-adopt-by-reserve-naturelle.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10092558-niziranzhizi-adopt-by-reserve-naturelle.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:38 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10081520-ariana-grande.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:38 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10081520-ariana-grande.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:42 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10040587-amouroud.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10040587-amouroud.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:49 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10048515-atelier-des-ors.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:49 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10048515-atelier-des-ors.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:54 [scrapy.extensions.logstats] INFO: Crawled 728 pages (at 10 pages/min), scraped 0 items (at 0 items/min)
2020-11-11 02:39:56 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10071490-areej-le-dore.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10071490-areej-le-dore.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10087560-amanbeisi-armand-basi.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10087560-amanbeisi-armand-basi.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:07 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10018412-yisuo-aesop.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10018412-yisuo-aesop.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:13 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10084271-amafu-armaf.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:14 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10084271-amafu-armaf.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:20 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10086871-mishi-agent-provocateur.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:20 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10086871-mishi-agent-provocateur.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:27 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10064463-a-f-abercrombie-fitch.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:27 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10064463-a-f-abercrombie-fitch.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:32 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10038832-adesi-weinitasi-aedes-de-venustas.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10038832-adesi-weinitasi-aedes-de-venustas.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:40 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10033391-aikekapa-acca-kappa.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10033391-aikekapa-acca-kappa.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:47 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10035498-annasu-anna-sui.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10035498-annasu-anna-sui.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:52 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10020871-aimu-amouage.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:52 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10020871-aimu-amouage.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:54 [scrapy.extensions.logstats] INFO: Crawled 738 pages (at 10 pages/min), scraped 0 items (at 0 items/min)
2020-11-11 02:40:58 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10051198-paermazhishui-acqua-di-parma.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10051198-paermazhishui-acqua-di-parma.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:58 [scrapy.core.engine] INFO: Closing spider (finished)
2020-11-11 02:40:58 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
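One way to recover the 403 URLs without touching Scrapy at all is to post-process a log like the one above. The following is only a sketch, assuming the log was written to a file (for example with Scrapy's `--logfile` option or the `LOG_FILE` setting) with one entry per line and the DEBUG level enabled; the function name `forbidden_urls` and the file name `crawl.log` are made up for illustration.

```python
import re

# Matches the engine's DEBUG entry seen in the log above, e.g.
# "... DEBUG: Crawled (403) <GET https://example.com/page.html> (referer: ...)"
CRAWLED_403 = re.compile(r"Crawled \(403\) <GET (?P<url>\S+?)>")


def forbidden_urls(log_lines):
    """Return URLs reported as 'Crawled (403)', first-seen order, no duplicates."""
    seen = {}  # dict keys preserve insertion order (Python 3.7+)
    for line in log_lines:
        match = CRAWLED_403.search(line)
        if match:
            seen.setdefault(match.group("url"), None)
    return list(seen)


if __name__ == "__main__":
    import os

    # "crawl.log" is a hypothetical file name; adjust it to your own log path.
    if os.path.exists("crawl.log"):
        with open("crawl.log", encoding="utf-8") as fh:
            for url in forbidden_urls(fh):
                print(url)
```

This only sees what was logged, so it depends on the log level and format staying as shown; recording the URLs inside the crawl itself is more robust.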
C:\Users\antoi\Documents\Programming\Learning\DataScience\nosetime_scraper>tree /F
Folder PATH listing
Volume serial number is 804E-C690
C:.
│   perfumes.jl
│   scrapy.cfg
│
├───.idea
│   │   .gitignore
│   │   misc.xml
│   │   modules.xml
│   │   nosetime_scraper.iml
│   │   workspace.xml
│   │
│   └───inspectionProfiles
│           profiles_settings.xml
│
└───nosetime_scraper
    │   items.py
    │   middlewares.py
    │   pipelines.py
    │   settings.py
    │   __init__.py
    │
    ├───spiders
    │   │   nosetime_spider.py
    │   │   __init__.py
    │   │
    │   └───__pycache__
    │           nosetime_spider.cpython-36.pyc
    │           __init__.cpython-36.pyc
    │
    └───__pycache__
            pipelines.cpython-36.pyc
            settings.cpython-36.pyc
            __init__.cpython-36.pyc
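Given this layout, one plausible place for the change is `nosetime_scraper/middlewares.py`. The "Ignoring response <403 ...>" entries come from the `HttpError` spider middleware, which drops non-2xx responses before the spider callback runs; a downloader middleware sees each response earlier than that. The sketch below is an assumption-laden illustration, not the asker's code: the class name `Record403Middleware`, the output file `forbidden_urls.txt`, and the priority `543` are all hypothetical, and the `process_response(request, response, spider)` signature is the one documented for Scrapy 2.x around the time of this log.

```python
class Record403Middleware:
    """Downloader middleware sketch: remember every URL answered with HTTP 403.

    Hypothetical name. It would be enabled in settings.py with something like:
    DOWNLOADER_MIDDLEWARES = {
        "nosetime_scraper.middlewares.Record403Middleware": 543,  # arbitrary priority
    }
    """

    def __init__(self, path="forbidden_urls.txt"):
        # Default argument lets Scrapy instantiate the class without parameters.
        self.path = path
        self.forbidden_urls = []

    def process_response(self, request, response, spider):
        if response.status == 403:
            self.forbidden_urls.append(response.url)
            # Append immediately so the record survives an interrupted crawl.
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write(response.url + "\n")
        # Pass the response on unchanged so the rest of the chain behaves as before.
        return response
```

An alternative that keeps everything in the spider is to set the `handle_httpstatus_list = [403]` attribute on the spider class, so that 403 responses reach the callback, and check `response.status` there; the middleware approach has the advantage of applying to every spider in the project.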