Unable to extract the value of <img alt> using Python

python scrapy

I want to extract the website's brand name from the following img tag:

      <img src="http://i1.sdlcdn.com/img/brand/logo/2012-08-01-02-31-15-AOC.jpg" alt="Aoc" width="75" height="45">

But I am getting an empty value. Please help.

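Is this what you want?

import re
import requests
# Fetch the product page; `response` holds the HTTP response, not a URL.
response = requests.get("http://www.snapdeal.com/product/aoc-e2060-swn-20-inch/622813?pos=0;85")
# [^>]* keeps each match inside a single <img> tag, unlike a greedy .*
print(re.findall(r'<img src=[^>]* alt="([^"]*)" width', response.text))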
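To see why the shape of the pattern matters, here is a minimal sketch comparing a greedy pattern with a bounded one. The one-line HTML string is an assumption made up for illustration (the second img tag is not from the real page), and the names GREEDY and BOUNDED are hypothetical.

```python
import re

# Hypothetical one-line snippet with two images; the second tag is invented
# for illustration and does not come from the real page.
HTML = ('<img src="a.jpg" alt="Aoc" width="75" height="45">'
        '<img src="b.jpg" alt="Other" width="10" height="10">')

# Greedy: ".*" may run across tag boundaries when tags share a line.
GREEDY = r'<img src=.* alt="(.*)" width'
# Bounded: "[^>]*" and "[^\"]*" cannot leave the current tag or attribute.
BOUNDED = r'<img src=[^>]* alt="([^"]*)" width'

greedy_hits = re.findall(GREEDY, HTML)
bounded_hits = re.findall(BOUNDED, HTML)

print(greedy_hits)   # only the last alt value survives
print(bounded_hits)  # one alt value per tag
```

On this input the greedy pattern returns a single match covering both tags, so the first brand name is lost, while the bounded pattern returns one value per tag. Whether the real page puts several img tags on one line is not known here.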


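It's best to always use an XPath that stays as "close" to the target as possible. All that div[1]/div[3]/span[1] nonsense is fragile and will most likely break when the page changes.

Your use of "img/@alt" and .extract() to get the alt attribute is fine. Your overall path to the img node is wrong, that's all.

This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post.

Understood, but this does extract AOC, which is exactly what he wanted.

Using pure regex to extract information from arbitrary HTML is not reliable, so this is only suitable for one-off jobs.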
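If pulling in Scrapy is not an option, a real HTML parser can still replace the regex. The following is a rough sketch using only the Python 3 standard library. The class name BrandAltParser, the helper extract_brand_alts, and the SAMPLE markup are assumptions for illustration: the wrapping a element with class "brandName" is inferred from the XPath used in this thread, not copied from the live page.

```python
from html.parser import HTMLParser


class BrandAltParser(HTMLParser):
    """Collect alt values of <img> tags nested in <a class="...brandName...">."""

    def __init__(self):
        super().__init__()
        self._in_brand_link = False
        self.alts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            # Substring test on the class attribute, similar to XPath contains().
            self._in_brand_link = "brandName" in (attrs.get("class") or "")
        elif tag == "img" and self._in_brand_link and attrs.get("alt"):
            self.alts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_brand_link = False


def extract_brand_alts(html):
    """Return the alt texts of brand logos found in the given HTML string."""
    parser = BrandAltParser()
    parser.feed(html)
    parser.close()
    return parser.alts


# Assumed markup: the <a> wrapper is a guess based on the XPath in this thread.
SAMPLE = ('<a class="brandName" href="#">'
          '<img src="http://i1.sdlcdn.com/img/brand/logo/2012-08-01-02-31-15-AOC.jpg" '
          'alt="Aoc" width="75" height="45"></a>'
          '<img src="other.jpg" alt="Unrelated" width="10" height="10">')

print(extract_brand_alts(SAMPLE))
```

The parser ignores the second image because it sits outside the brand link, which is the kind of context a flat regex has trouble expressing.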
$ scrapy shell
<SNIP>
In [1]: fetch('http://www.snapdeal.com/product/aoc-e2060-swn-20-inch/622813?pos=0;85')
2013-10-16 00:37:08+0000 [default] INFO: Spider opened
2013-10-16 00:37:08+0000 [default] DEBUG: Crawled (200) <GET http://www.snapdeal.com/product/aoc-e2060-swn-20-inch/622813?pos=0;85> (referer: None)
<SNIP>
In [2]: hxs.select('//a[contains(@class, "brandName")]/img/@alt').extract()[0]
Out[2]: u'Aoc'
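The session above anchors the XPath on the brandName class instead of counting div and span positions. A small sketch of why that tends to survive layout changes, using the standard library's xml.etree.ElementTree on two hypothetical, simplified versions of a page. Both snippets are invented for illustration. ElementTree supports only a limited XPath subset without contains(), so the class is matched exactly here, which is an assumption about the markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is larger and may not be
# well-formed XML at all.
PAGE_V1 = """
<div>
  <div><span>Home</span></div>
  <div><a class="brandName"><img src="logo.jpg" alt="Aoc" width="75" height="45"/></a></div>
</div>
"""

# The same content after an imagined redesign that adds a banner at the top.
PAGE_V2 = """
<div>
  <div><span>Banner</span></div>
  <div><span>Home</span></div>
  <div><a class="brandName"><img src="logo.jpg" alt="Aoc" width="75" height="45"/></a></div>
</div>
"""

POSITIONAL = "./div[2]/a/img"              # depends on the element order
ANCHORED = ".//a[@class='brandName']/img"  # depends only on the class name


def alt_values(page, path):
    """Return the alt attribute of every element matched by path."""
    root = ET.fromstring(page.strip())
    return [img.get("alt") for img in root.findall(path)]


for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(name, alt_values(page, POSITIONAL), alt_values(page, ANCHORED))
```

The positional path finds the logo only in the first layout, while the anchored path finds it in both. With Scrapy the same idea applies to full XPath expressions such as the one used in the shell session.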