Where should we modify Scrapy to remember the sites that caused 403 errors while scraping? (Python 3.x, Scrapy, HTTP status code 403)

I have a scraper that crawls URLs and the URLs embedded in those pages, and I would like to record the URLs that returned 403:

>>>scrapy crawl myscraper -o results.jl
...
2020-11-11 02:38:08 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10052252-anjisilesai-angel-schlesser.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:15 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10034901-aqisi-arquiste.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10034901-aqisi-arquiste.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:20 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10070420-antonio-visconti.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:20 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10070420-antonio-visconti.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:27 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10080993-alandelong-alain-delon.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:27 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10080993-alandelong-alain-delon.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:34 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10086521-afunanzhixiang-afnan-perfumes.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10086521-afunanzhixiang-afnan-perfumes.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:40 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10021207-adaofu-duominggesi-adolfo-dominguez.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10021207-adaofu-duominggesi-adolfo-dominguez.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:46 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10058341-yabaoxin-aubusson.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10058341-yabaoxin-aubusson.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:50 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10019426-angela-ciampagna.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:50 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10019426-angela-ciampagna.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:54 [scrapy.extensions.logstats] INFO: Crawled 718 pages (at 10 pages/min), scraped 0 items (at 0 items/min)
2020-11-11 02:38:55 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10091158-anfasi-anfass.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10091158-anfasi-anfass.html>: HTTP status code is not handled or not allowed
2020-11-11 02:38:58 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10035539-antonio-banderas.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:38:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10035539-antonio-banderas.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10023035-an-jielade-ann-gerard.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10023035-an-jielade-ann-gerard.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:09 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10038064-siyuefenfang-april-aromatics.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:10 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10038064-siyuefenfang-april-aromatics.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:17 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10058274-affinessence.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:17 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10058274-affinessence.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:24 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10070090-altaia.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:24 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10070090-altaia.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:31 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10092558-niziranzhizi-adopt-by-reserve-naturelle.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10092558-niziranzhizi-adopt-by-reserve-naturelle.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:38 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10081520-ariana-grande.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:38 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10081520-ariana-grande.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:42 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10040587-amouroud.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10040587-amouroud.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:49 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10048515-atelier-des-ors.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:49 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10048515-atelier-des-ors.html>: HTTP status code is not handled or not allowed
2020-11-11 02:39:54 [scrapy.extensions.logstats] INFO: Crawled 728 pages (at 10 pages/min), scraped 0 items (at 0 items/min)
2020-11-11 02:39:56 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10071490-areej-le-dore.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:39:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10071490-areej-le-dore.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10087560-amanbeisi-armand-basi.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10087560-amanbeisi-armand-basi.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:07 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10018412-yisuo-aesop.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10018412-yisuo-aesop.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:13 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10084271-amafu-armaf.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:14 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10084271-amafu-armaf.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:20 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10086871-mishi-agent-provocateur.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:20 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10086871-mishi-agent-provocateur.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:27 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10064463-a-f-abercrombie-fitch.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:27 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10064463-a-f-abercrombie-fitch.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:32 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10038832-adesi-weinitasi-aedes-de-venustas.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10038832-adesi-weinitasi-aedes-de-venustas.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:40 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10033391-aikekapa-acca-kappa.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10033391-aikekapa-acca-kappa.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:47 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10035498-annasu-anna-sui.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10035498-annasu-anna-sui.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:52 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10020871-aimu-amouage.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:52 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10020871-aimu-amouage.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:54 [scrapy.extensions.logstats] INFO: Crawled 738 pages (at 10 pages/min), scraped 0 items (at 0 items/min)
2020-11-11 02:40:58 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.nosetime.com/pinpai/10051198-paermazhishui-acqua-di-parma.html> (referer: https://www.nosetime.com/pinpai/2-a.html)
2020-11-11 02:40:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.nosetime.com/pinpai/10051198-paermazhishui-acqua-di-parma.html>: HTTP status code is not handled or not allowed
2020-11-11 02:40:58 [scrapy.core.engine] INFO: Closing spider (finished)
2020-11-11 02:40:58 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
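
As a stopgap that needs no change to the project, the rejected URLs can be recovered from log output like the above. The following is only a minimal sketch: the function name `forbidden_urls` is made up for illustration, and it assumes each log record sits on a single line (for example, a log written to a file rather than copied from a wrapped console).

```python
import re

# Matches the engine's "Crawled (403) <GET url>" records; the URL is group 1.
CRAWLED_403 = re.compile(r"Crawled \(403\) <GET (\S+?)>")


def forbidden_urls(log_text):
    """Return the 403 URLs found in Scrapy log text, in first-seen order."""
    # dict.fromkeys drops repeated URLs while preserving their order.
    return list(dict.fromkeys(CRAWLED_403.findall(log_text)))


if __name__ == "__main__":
    # Two records copied from the log in the question.
    sample = (
        "2020-11-11 02:38:15 [scrapy.core.engine] DEBUG: Crawled (403) "
        "<GET https://www.nosetime.com/pinpai/10034901-aqisi-arquiste.html> "
        "(referer: https://www.nosetime.com/pinpai/2-a.html)\n"
        "2020-11-11 02:38:54 [scrapy.extensions.logstats] INFO: "
        "Crawled 718 pages (at 10 pages/min), scraped 0 items (at 0 items/min)\n"
    )
    for url in forbidden_urls(sample):
        print(url)
```

This only reads what Scrapy already logged at DEBUG level; it does not change how the spider treats the 403 responses.
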
C:\Users\antoi\Documents\Programming\Learning\DataScience\nosetime_scraper>tree /F
Folder PATH listing
Volume serial number is 804E-C690
C:.
│   perfumes.jl
│   scrapy.cfg
│
├───.idea
│   │   .gitignore
│   │   misc.xml
│   │   modules.xml
│   │   nosetime_scraper.iml
│   │   workspace.xml
│   │
│   └───inspectionProfiles
│           profiles_settings.xml
│
└───nosetime_scraper
    │   items.py
    │   middlewares.py
    │   pipelines.py
    │   settings.py
    │   __init__.py
    │
    ├───spiders
    │   │   nosetime_spider.py
    │   │   __init__.py
    │   │
    │   └───__pycache__
    │           nosetime_spider.cpython-36.pyc
    │           __init__.cpython-36.pyc
    │
    └───__pycache__
            pipelines.cpython-36.pyc
            settings.cpython-36.pyc
            __init__.cpython-36.pyc
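
Given this layout, one plausible place for the change is the spider itself (`nosetime_scraper/spiders/nosetime_spider.py`). The sketch below is an assumption, not the asker's code: `Forbidden403Mixin`, `note_forbidden` and `forbidden_urls` are hypothetical names. It relies on the `handle_httpstatus_list` spider attribute, which Scrapy's HttpError spider middleware consults before ignoring a non-2xx response. The class is written as a plain mixin so it can be exercised without Scrapy installed.

```python
class Forbidden403Mixin:
    """Hypothetical helper meant to be mixed into a scrapy.Spider subclass."""

    # Lets 403 responses reach the spider's callback instead of being dropped
    # with "HTTP status code is not handled or not allowed".
    handle_httpstatus_list = [403]

    def note_forbidden(self, response):
        """Record response.url when the status is 403; return True if recorded."""
        if response.status != 403:
            return False
        # Create the list lazily so the mixin needs no __init__ of its own.
        urls = getattr(self, "forbidden_urls", None)
        if urls is None:
            urls = self.forbidden_urls = []
        urls.append(response.url)
        return True
```

In the spider this would presumably be used by inheriting from `(Forbidden403Mixin, scrapy.Spider)` and starting each callback with `if self.note_forbidden(response): return`, so that blocked pages are remembered but not parsed. The collected list could then be written out from the spider's `closed(reason)` method.
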