Python Scrapy: ERROR: Error downloading <GET http://stackoverflow.com/questions?sort=votes>: TypeError: 'float' object is not iterable

I'm new to Python and Scrapy. I copied this code from a video, where it works fine, but when I try it I get a TypeError: 'float' object is not iterable. Here is the code:

import scrapy

class StackOverflowSpider(scrapy.Spider):
    name = "stackoverflow"
    start_urls = ["http://stackoverflow.com/questions?sort=votes"]

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        yield {
            'title': response.css('h1 a::text').extract()[0],
            'votes': response.css('.question .vote-count-post::text').extract()[0],
            'body': response.css('.question .post-text').extract()[0],
            'tags': response.css('.question .post-tag::text').extract(),
            'link': response.url,
        }
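(For reference, a spider like this is normally run from the command line with Scrapy's runspider command; the file name here is an assumption:)

scrapy runspider stackoverflow_spider.py -o top-questions.json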
Here is the error:

2017-03-10 16:06:39 [scrapy] INFO: Enabled item pipelines:[]
2017-03-10 16:06:39 [scrapy] INFO: Spider opened
2017-03-10 16:06:39 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-10 16:06:39 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-10 16:06:40 [scrapy] ERROR: Error downloading <GET http://stackoverflow.com/questions?sort=votes>
Traceback (most recent call last):
  File "C:\Anaconda2\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "C:\Anaconda2\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "C:\Anaconda2\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
    return agent.download_request(request)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1631, in request
    parsedURI.originForm)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "C:\Anaconda2\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "C:\Anaconda2\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "C:\Anaconda2\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "C:\Anaconda2\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-03-10 16:06:40 [scrapy] INFO: Closing spider (finished)
2017-03-10 16:06:40 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/exceptions.TypeError': 1,
 'downloader/request_bytes': 235,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 3, 10, 8, 6, 40, 117000),
 'log_count/DEBUG': 1,
 'log_count/ERROR': 1,
 'log_count/INFO': 7,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2017, 3, 10, 8, 6, 39, 797000)}
2017-03-10 16:06:40 [scrapy] INFO: Spider closed (finished)

Thanks for any help.

Your code works in Python 3, but the item lists come back empty. I removed the indexing and ran it again:

2017-03-10 16:48:34 [scrapy.core.scraper] DEBUG: Scraped from <200 http://stackoverflow.com/questions/179123/how-to-modify-existing-unpushed-commits>
{'link': 'http://stackoverflow.com/questions/179123/how-to-modify-existing-unpushed-commits', 'title': ['How to modify existing, unpushed commits?'], 'votes': [], 'body': [], 'tags': []}
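A minimal sketch of parse_question using extract_first() instead of [0], so an empty selector result yields None instead of raising IndexError (this assumes Scrapy >= 1.0, where extract_first() is available):

def parse_question(self, response):
    yield {
        'title': response.css('h1 a::text').extract_first(),
        'votes': response.css('.question .vote-count-post::text').extract_first(),
        'body': response.css('.question .post-text').extract_first(),
        'tags': response.css('.question .post-tag::text').extract(),
        'link': response.url,
    }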

I know this is an old question, but I found a different solution. Maybe you should try
conda install scrapy
instead of
pip install scrapy

These are the dependencies that were installed after running the command:

The following NEW packages will be INSTALLED:

    attrs:            15.2.0-py27_0
    automat:          0.5.0-py27_0
    constantly:       15.1.0-py27_0
    cssselect:        1.0.1-py27_0
    hyperlink:        17.1.1-py27_0
    incremental:      16.10.1-py27_0
    parsel:           1.2.0-py27_0
    pyasn1:           0.2.3-py27_0
    pyasn1-modules:   0.0.8-py27_0
    pydispatcher:     2.0.5-py27_0
    queuelib:         1.4.2-py27_0
    scrapy:           1.3.3-py27_0
    service_identity: 17.0.0-py27_0
    twisted:          17.5.0-py27_0
    w3lib:            1.17.0-py27_0
    zope:             1.0-py27_0
    zope.interface:   4.4.2-py27_0
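This likely works because conda installs a mutually compatible Scrapy/Twisted pair (here scrapy 1.3.3 with twisted 17.5.0). Note where the traceback above ends: Twisted's timeoutDelay = sum(timeout) expects a sequence of per-attempt DNS timeouts, and the error means something passed it a bare float instead. A two-line illustration (the tuple (1, 3, 11, 45) is Twisted's documented default retry schedule; the float stands in for whatever single timeout value the broken setup passed):

>>> sum((1, 3, 11, 45))  # a sequence of timeouts, as Twisted expects
60
>>> sum(60.0)            # a bare float
TypeError: 'float' object is not iterable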
Thanks, but I don't quite understand what you mean. Are you saying my code only works in Python 3? But Scrapy only supports Python 2.7 on Windows, and the teacher in the video uses Windows too. Can you tell me exactly what I should do? How should I change the code?

@King.Lee Your code stops at the first request, but in my environment it runs fine. The only problem is

response.css(".question .post-text").extract()[0]

You are indexing into an empty list. When I remove the indexing, it returns an empty list, just as I posted.

I'm sorry, but even with your explanation my problem is still not solved. I even switched my OS from Windows to Ubuntu and used Python 3.6, and I set up Scrapy with Anaconda, but the code still has the same problem. I'm very confused by this situation. Is something wrong with my environment? Thank you very much for your help!

I have solved the problem; it was caused by Anaconda,
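For anyone hitting the same traceback: the root cause is Scrapy's DNS resolver handing Twisted a bare float timeout where Twisted's getHostByName() sums a sequence of per-attempt timeouts. Upstream Scrapy resolved this by wrapping the value in a tuple before calling into Twisted. The sketch below shows the shape of that fix; it is a simplified illustration, not the exact upstream code:

from twisted.internet.base import ThreadedResolver

class TupleTimeoutResolver(ThreadedResolver):
    # Sketch: always hand Twisted an iterable of timeouts, never a bare float.
    def __init__(self, reactor, timeout=60.0):
        ThreadedResolver.__init__(self, reactor)
        self.timeout = timeout

    def getHostByName(self, name, timeout=None):
        # twisted.internet.base computes sum(timeout), so pass (60.0,)
        # rather than 60.0 to avoid TypeError: 'float' object is not iterable.
        return ThreadedResolver.getHostByName(self, name, (self.timeout,))

In practice the simplest fix is to upgrade to a Scrapy release that already contains this change, or to install a matched Scrapy/Twisted pair via conda as suggested above.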