Python scraping: how to get an attribute from an <abbr> tag

python web-scraping

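I am using lxml and Python to scrape a page (the URL appears in my code below). The problem I am facing is how to get an attribute from a tag. For example, the three gold stars at the top of the page have this HTML: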

<abbr title="3" class="average rating large star3">★★★☆☆</abbr>

Here I want to get the title attribute, so that I know how many stars this location received.

I have tried a few things, including:

import re
import urllib
from lxml import html

response = urllib.urlopen('http://www.insiderpages.com/b/3721895833/central-kia-of-irving-irving').read()
mo = re.search(r'<div class="rating_box">.*?</div>', response)
div = html.fromstring(mo.group(0))
title = div.find("abbr").attrib["title"]
print title

But it does not work for me. Any help would be much appreciated.

You already have lxml, so use its power.
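One possible reason the regex attempt fails, offered as a guess rather than a diagnosis: without `re.DOTALL`, `.` does not match newlines, so `re.search` returns `None` whenever the `rating_box` div spans several lines. The sketch below uses a made-up `PAGE` string (the real page layout is an assumption) and Python 3 syntax to contrast the regex with letting lxml parse the whole document.

```python
import re
from lxml import html

# Hypothetical stand-in for the downloaded page; the real markup may differ.
PAGE = '''<html><body>
<div class="rating_box">
  <abbr title="3" class="average rating large star3">★★★☆☆</abbr>
</div>
</body></html>'''

PATTERN = r'<div class="rating_box">.*?</div>'

# "." stops at newlines by default, so a multi-line div is not matched.
no_flag = re.search(PATTERN, PAGE)
# With re.DOTALL the same pattern matches across lines.
with_flag = re.search(PATTERN, PAGE, re.DOTALL)

# Parsing the whole page with lxml sidesteps the regex entirely.
tree = html.fromstring(PAGE)
abbr = tree.find('.//abbr')
title = abbr.get('title') if abbr is not None else None

print(no_flag is None, with_flag is not None, title)
```

Parsing the full document also avoids the nested-div problem: a non-greedy `.*?</div>` stops at the first closing tag, which may not be the one that closes `rating_box`.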

Have you tried XPath?

In [38]: from lxml import etree

In [39]: import urllib2

In [40]: parser = etree.HTMLParser()  # lenient HTML parser; the default XML parser would reject most real-world HTML

In [41]: html = etree.fromstring(urllib2.urlopen('http://www.insiderpages.com/b/3721895833/central-kia-of-irving-irving').read(), parser)

In [42]: html.xpath('//abbr')[0].xpath('./@title')
Out[42]: ['3']
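The same XPath idea can be tried offline. This is a minimal sketch in Python 3 syntax that parses the `<abbr>` fragment from the question instead of the live page; the wrapping `rating_box` div and the helper name `star_rating` are assumptions made for illustration.

```python
from lxml import html

# <abbr> element copied from the question; the surrounding div is assumed.
SNIPPET = '''
<div class="rating_box">
  <abbr title="3" class="average rating large star3">★★★☆☆</abbr>
</div>
'''

def star_rating(markup):
    """Return the first <abbr> title attribute as an int, or None if absent."""
    tree = html.fromstring(markup)
    # "@title" selects the attribute value itself, so the result is a list of strings.
    titles = tree.xpath('.//abbr/@title')
    return int(titles[0]) if titles else None

rating = star_rating(SNIPPET)
print(rating)
```

If the page contains several `<abbr>` elements, a narrower expression such as `.//abbr[contains(@class, "rating")]/@title` may be a safer choice than taking the first match.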

Yours is better. I didn't know lxml could fetch the page itself.
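On lxml fetching the page itself: a minimal sketch, assuming `lxml.html.parse` is the call being referred to. It accepts a filename, a URL, or a file-like object. The example feeds it an in-memory `PAGE` (hypothetical markup) so it runs without network access. Whether a URL can be passed directly depends on the lxml/libxml2 build, and https is generally not handled that way.

```python
import io
from lxml import html

# Hypothetical page content standing in for the real site.
PAGE = (b'<html><body>'
        b'<abbr title="3" class="average rating large star3">***</abbr>'
        b'</body></html>')

def first_abbr_title(source):
    """Parse a path, URL, or file-like object; return the first abbr title or None."""
    doc = html.parse(source)  # returns an ElementTree, not an element
    abbr = doc.getroot().find('.//abbr')
    return None if abbr is None else abbr.get('title')

title = first_abbr_title(io.BytesIO(PAGE))
print(title)
```

For a plain http page, the same helper could be called as `first_abbr_title(url)`, which lets lxml do the download and removes the need for `urllib`.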