Python: getting the href of a link (python, lxml)


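I am using lxml with Python. I want to get the href of the "Read more reviews (40)" link on this page. I am basically scraping this site and want to get the reviews.

Thanks for your help.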

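The link is added by client-side JavaScript, so you cannot get its href with ordinary HTML parsing. However, you can look at the JavaScript code in the page source and pull the link out of it: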

>>> import re
>>> import urllib2
>>> import lxml.html
>>> page = urllib2.urlopen("http://maps.google.com/maps/place?cid=2860002122405830765").read()

# have to search the page source since the link is added in javascript
>>> mo = re.search(r'<div class="pp-more-reviews">.*?</div>', page)
>>> div = lxml.html.fromstring(mo.group(0))
>>> href = div.find("a").attrib["href"]
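
The same idea can be sketched for Python 3 without touching the network. This is only an illustration under stated assumptions: `SAMPLE_PAGE` is a made-up stand-in for the downloaded page source, the helper name `extract_more_reviews_href` is hypothetical, and the real page markup may differ from the sample.

```python
import re

import lxml.html

# Hypothetical stand-in for the downloaded page source: the markup for the
# link sits inside a script as text, not as a node of the parsed document.
SAMPLE_PAGE = '''
<html><body>
<script>
var snippet = '<div class="pp-more-reviews"><a href="/maps/place?cid=123">Read more reviews (40)</a></div>';
</script>
</body></html>
'''


def extract_more_reviews_href(page_source):
    """Return the href of the first link in the pp-more-reviews div, or None."""
    # Search the raw source, because the div only exists as text in the script.
    # re.DOTALL lets the non-greedy match span line breaks inside the div.
    match = re.search(
        r'<div class="pp-more-reviews">.*?</div>', page_source, re.DOTALL
    )
    if match is None:
        return None
    # Parse only the matched fragment; fromstring() returns the div element.
    div = lxml.html.fromstring(match.group(0))
    link = div.find("a")
    if link is None:
        return None
    return link.get("href")


print(extract_more_reviews_href(SAMPLE_PAGE))
```

Returning `None` when the div or the link is missing avoids the `AttributeError` that `mo.group(0)` would raise if the page layout changes.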

Other options include:

a tool for controlling a real browser

using a headless browser

I tried it the following way. It is not very elegant, but it does the job.

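import urllib
from lxml import html
response = urllib.urlopen('http://maps.google.com/maps/place?cid=7101561317478851901').read()
dom = html.fromstring(response)
# find_class() and xpath() both return lists, so href is a list of attribute values
href = dom.find_class('pp-more-reviews')[0].find_class('pp-more-content-link')[0].xpath('@href')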
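
The chained `find_class` calls can also be written as a single XPath query that returns `None` instead of raising `IndexError` when nothing matches. This is a rough sketch only: `SAMPLE` is invented markup using the class names from the snippet above, and `more_reviews_href` is a hypothetical helper name.

```python
from lxml import html

# Invented markup that reuses the class names from the snippet above.
SAMPLE = '''
<div class="pp-more-reviews">
  <a class="pp-more-content-link" href="/maps/place?cid=123">Read more reviews (40)</a>
</div>
'''


def more_reviews_href(markup):
    """Return the first matching href, or None if the link is absent."""
    dom = html.fromstring(markup)
    # xpath() always returns a list; contains() tolerates extra class names
    # on the same element (at the cost of also matching longer class names).
    hrefs = dom.xpath(
        '//div[contains(@class, "pp-more-reviews")]'
        '//a[contains(@class, "pp-more-content-link")]/@href'
    )
    return hrefs[0] if hrefs else None


print(more_reviews_href(SAMPLE))
```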

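Could you also help me with a similar problem on the next page? From the line "X out of Y people found this review helpful" I want to extract X. This line appears in every review. Thanks.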
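
One possible approach to this follow-up is a regular expression over the review text. The sketch below assumes the sentence appears in exactly the English form quoted above; the actual wording on the page is not confirmed, and `HELPFUL_RE` and `helpful_counts` are hypothetical names.

```python
import re

# Assumed wording; adjust the pattern if the page phrases the sentence differently.
HELPFUL_RE = re.compile(
    r'(\d+)\s+out of\s+(\d+)\s+people found this review helpful'
)


def helpful_counts(text):
    """Return an (X, Y) tuple of ints for every matching sentence in text."""
    return [(int(x), int(y)) for x, y in HELPFUL_RE.findall(text)]


reviews_text = (
    "3 out of 5 people found this review helpful. Great place. "
    "10 out of 12 people found this review helpful. Too crowded."
)
for x, y in helpful_counts(reviews_text):
    print(x, "of", y)
```

The text passed in could come from `text_content()` on each review element once the review blocks have been located with lxml.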