Python: why doesn't Scrapy work on this page?

I am trying to crawl this page:

http://www.oddsportal.com/search/results/:69Dxbc61/
This is my code:

import scrapy

class Test2Spider(scrapy.Spider):
    name = "test2"
    allowed_domains = ["oddportal.com"]
    start_urls = (
        'http://www.oddsportal.com/search/results/:69Dxbc61/',
    )

    def parse(self, response):
        for partita in response.css('tr.deactivate'):
            yield {
                'score': partita.css('td.table-score::text').extract_first(),
            }
But I get:

# scrapy runspider test2.py -o uno.json
2018-04-19 16:45:56 [scrapy] INFO: Scrapy 1.0.3 started (bot: cinvestbacktest)
2018-04-19 16:45:56 [scrapy] INFO: Optional features available: ssl, http11, boto
2018-04-19 16:45:56 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'cinvestbacktest.spiders', 'FEED_URI': 'uno.json', 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter', 'SPIDER_MODULES': ['cinvestbacktest.spiders'], 'BOT_NAME': 'cinvestbacktest', 'FEED_FORMAT': 'json', 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage'}
2018-04-19 16:45:56 [scrapy] INFO: Enabled extensions: CloseSpider, FeedExporter, TelnetConsole, LogStats, CoreStats, SpiderState
2018-04-19 16:45:56 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, RedirectMiddleware, CookiesMiddleware, SplashCookiesMiddleware, SplashMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2018-04-19 16:45:56 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, SplashDeduplicateArgsMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2018-04-19 16:45:56 [scrapy] INFO: Enabled item pipelines: 
2018-04-19 16:45:56 [scrapy] INFO: Spider opened
2018-04-19 16:45:56 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-04-19 16:45:56 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-04-19 16:45:56 [scrapy] DEBUG: Crawled (404) <GET http://www.oddsportal.com/search/results/:69Dxbc61/> (referer: None)
2018-04-19 16:45:56 [scrapy] DEBUG: Ignoring response <404 http://www.oddsportal.com/search/results/:69Dxbc61/>: HTTP status code is not handled or not allowed
2018-04-19 16:45:56 [scrapy] INFO: Closing spider (finished)
2018-04-19 16:45:56 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 241,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 12816,
 'downloader/response_count': 1,
 'downloader/response_status_count/404': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 4, 19, 14, 45, 56, 756377),
 'log_count/DEBUG': 3,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2018, 4, 19, 14, 45, 56, 473849)}
2018-04-19 16:45:56 [scrapy] INFO: Spider closed (finished)
Why?

It seems the site returns a 404 error when opened with Scrapy, while it works fine when viewed in a browser.

This behavior usually means there is a problem with your request headers.


In this case, it seems that simply setting a different User-Agent solves the problem.

I just ran your script and it worked perfectly. However, the pasted part had an indentation problem, which I have fixed. @SIM why do you think it doesn't run on my machine? I tried your script in my Sublime Text editor and ran it with CrawlerProcess() and custom headers. That's it.