Python AttributeError: 'list' object has no attribute 'extract'?


python python-2.7 web-crawler


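I just want to extract information from this URL () with XPath. When I run the code below, I get AttributeError: 'list' object has no attribute 'extract'. Did I import the wrong module, or is something mismatched?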

# -*- coding: utf-8 -*-

import urllib2
import sys
import lxml.html as HTML
reload(sys)
sys.setdefaultencoding("utf-8")


class spider(object):
    def __init__(self):
        print u'开始爬取内容'

    def getSource(self, url):
        html = urllib2.Request(url)
        pageContent = urllib2.urlopen(html,timeout=60).read()
        return pageContent

    def getUrl(self, pageContent):
        htmlSource = HTML.fromstring(pageContent)
        urlInfo = htmlSource.xpath('//dd[@class="tqs"]/span/a/@href').extract()[0]
        return urlInfo


if __name__ == "__main__":
    url = "http://www.tuniu.com/g3300/whole-nj-0/list-l1602-h0-i-j0_0/"
    tuniu = spider()
    tuniu.getUrl(url)

Here is the error:

 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "D:\anzhuang\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
 File "D:\anzhuang\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
 File "D:/python/tuniu2/tuniu.py", line 34, in <module>
tuniu.getUrl(url)
 File "D:/python/tuniu2/tuniu.py", line 27, in getUrl
urlInfo = htmlSource.xpath('//dd[@class="tqs"]/span/a/@href').extract()[0]
 AttributeError: 'list' object has no attribute 'extract'


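xpath returns a list of the tags matched in the page, so you are calling extract on the list itself rather than on any of the tags it contains. If you only want to extract the first tag, you probably want to put the [0] before the extract call, like this: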

urlInfo = htmlSource.xpath('//dd[@class="tqs"]/span/a/@href')[0].extract()

It is not clear which information you want, but if it is not in the first tag, you may need to loop over the tags in urlInfo and call tag.extract() on each one.
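
To make the "result is a list" point concrete, here is a minimal sketch. It assumes a made-up, well-formed HTML fragment and uses the standard library's xml.etree.ElementTree as a stand-in for lxml, so that it runs without extra packages; the sample markup and the variable names are hypothetical and not taken from the real page. The query result is an ordinary Python list, so the individual items have to be reached by indexing or looping.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for the real page.
SAMPLE = """
<html><body><dl>
  <dd class="tqs"><span><a href="/tours/1">Tour 1</a></span></dd>
  <dd class="tqs"><span><a href="/tours/2">Tour 2</a></span></dd>
</dl></body></html>
"""

root = ET.fromstring(SAMPLE)

# findall() returns a plain Python list of the matching <a> elements.
links = root.findall(".//dd[@class='tqs']/span/a")

# A list has no extract() method, which is what the traceback complains about.
list_has_extract = hasattr(links, "extract")

# Index into the list, or loop over it, to reach the individual items.
hrefs = [a.get("href") for a in links]
first_href = hrefs[0]

print(list_has_extract, hrefs, first_href)
```

With lxml, an XPath that ends in /@href goes one step further and returns the attribute values themselves as strings, so no per-item call is needed to read them.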

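First, getUrl is called with a URL, but it never fetches that URL's content. Modify it to fetch the page content.

extract is not needed. Because the XPath ends in /@href, the returned list already holds the attribute values, so to get the href just take one item from the list: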

def getUrl(self, url):
    pageContent = self.getSource(url)  # <---
    htmlSource = HTML.fromstring(pageContent)
    urlInfo = htmlSource.xpath('//dd[@class="tqs"]/span/a/@href')[0]
    return urlInfo
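
One thing the snippet above leaves open is what happens when nothing matches: indexing an empty list with [0] raises IndexError. The following is a rough sketch of a guarded version, not the answerer's code. It assumes well-formed markup and uses the standard library's xml.etree.ElementTree in place of lxml so it can run anywhere; get_first_href, fake_fetch, and the example.invalid URL are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def get_first_href(url, fetch):
    """Fetch the page at `url` with `fetch`, then return the first matching href or None."""
    # Get the page body first; the URL string itself is not HTML.
    page_content = fetch(url)
    root = ET.fromstring(page_content)
    links = root.findall(".//dd[@class='tqs']/span/a")
    if not links:
        # An empty result would make links[0] raise IndexError.
        return None
    return links[0].get("href")


def fake_fetch(url):
    # Hypothetical stand-in for the network call made by getSource in the question.
    return ('<html><body><dd class="tqs"><span>'
            '<a href="/tours/1">Tour 1</a></span></dd></body></html>')


result = get_first_href("http://example.invalid/list", fake_fetch)
missing = get_first_href("http://example.invalid/list",
                         lambda url: "<html><body/></html>")
print(result, missing)
```

Passing the fetch function in as a parameter keeps the parsing step testable without network access; in the original class, self.getSource would play that role.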
