Python: extract a specific href using XPath or CSS

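Recently I came across an unusual problem that turned out not to be trivial. Could you suggest how to retrieve the href?

I am scraping some restaurants on Tripadvisor with Python Scrapy and need to retrieve the Google Maps link (the href attribute) from the Location and Contact section. Could you suggest how to do this? For example ()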

The element's code:

<a data-encoded-url="S0k3X2h0dHBzOi8vbWFwcy5nb29nbGUuY29tL21hcHM/c2FkZHI9JmRhZGRyPVNjYWJlbGxzdHIuKzEwLTExJTJDKzE0MTA5K0JlcmxpbitHZXJtYW55QDUyLjQyODgxOCwxMy4xODI0MjFfeVBw" class="_2wKz--mA _27M8V6YV" target="_blank" href="https://maps.google.com/maps?saddr=&amp;daddr=Scabellstr.+10-11%2C+14109+Berlin+Germany@52.428818,13.182421"><span class="_2saB_OSe">Scabellstr. 10-11, 14109 Berlin Germany</span><span class="ui_icon external-link-no-box _2OpUzCuO"></span></a>
Output:

['<a data-encoded-url="Z3pLX2h0dHBzOi8vbWFwcy5nb29nbGUuY29tL21hcHM/c2FkZHI9JmRhZGRyPVNjYWJlbGxzdHIuKzEwLTExJTJDKzE0MTA5K0JlcmxpbitHZXJtYW55QDUyLjQyODgxOCwxMy4xODI0MjFfMk1z" class="_2wKz--mA _27M8V6YV" target="_blank"><span class="_2saB_OSe">Scabellstr. 10-11, 14109 Berlin Germany</span><span class="ui_icon external-link-no-box _2OpUzCuO"></span></a>']
['Scabellstr. 10-11, 14109 Berlin Germany', 'Website']

You can try a specific XPath query such as "//a[contains(@class,'foobar')]/@href" to retrieve a particular attribute of the element.
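
In Scrapy that expression can be passed straight to the response, e.g. response.xpath("//a[contains(@class,'foobar')]/@href").get(). One caveat, judging from the output in the question: the <a> element Scrapy received has a data-encoded-url attribute but no href, so an @href query will probably come back empty on this page. The sketch below checks that offline with the standard library only. It is a swapped-in technique, not Scrapy: xml.etree.ElementTree supports a small XPath subset (attribute-existence predicates such as [@name], but not contains() or /@attr steps), and parsing the fragment as XML only works because this particular snippet happens to be well-formed. The helper name find_link_attrs is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <a> element as Scrapy returned it (copied from the question's output).
# It carries data-encoded-url but no href attribute.
SNIPPET = (
    '<a data-encoded-url="Z3pLX2h0dHBzOi8vbWFwcy5nb29nbGUuY29tL21hcHM/c2FkZHI9JmRhZGRyPVNjYWJlbGxzdHIuKzEwLTExJTJDKzE0MTA5K0JlcmxpbitHZXJtYW55QDUyLjQyODgxOCwxMy4xODI0MjFfMk1z" '
    'class="_2wKz--mA _27M8V6YV" target="_blank">'
    '<span class="_2saB_OSe">Scabellstr. 10-11, 14109 Berlin Germany</span>'
    '<span class="ui_icon external-link-no-box _2OpUzCuO"></span></a>'
)


def find_link_attrs(fragment: str) -> list:
    """Return the attribute dicts of every <a> that has a data-encoded-url.

    Hypothetical helper; assumes the fragment is well-formed XML.
    """
    # Wrap in a dummy root so a fragment with several elements still parses.
    root = ET.fromstring(f"<root>{fragment}</root>")
    # ElementTree's XPath subset supports attribute-existence predicates.
    return [dict(a.attrib) for a in root.iterfind(".//a[@data-encoded-url]")]


links = find_link_attrs(SNIPPET)
print(links[0].get("href"))              # None: no href in the scraped HTML
print(links[0]["data-encoded-url"][:8])  # Z3pLX2h0
```

The point of the check is the attribute to target: on this markup, data-encoded-url is present where href is not.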

Use the data-encoded-url you already have and decode it with Base64. For example:

>>> import base64
>>> base64.b64decode("Z3pLX2h0dHBzOi8vbWFwcy5nb29nbGUuY29tL21hcHM/c2FkZHI9JmRhZGRyPVNjYWJlbGxzdHIuKzEwLTExJTJDKzE0MTA5K0JlcmxpbitHZXJtYW55QDUyLjQyODgxOCwxMy4xODI0MjFfMk1z").decode("utf-8")
'gzK_https://maps.google.com/maps?saddr=&daddr=Scabellstr.+10-11%2C+14109+Berlin+Germany@52.428818,13.182421_2Ms'
Then strip the gzK_ prefix and the _2Ms suffix, and you will have your URL.
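
The decode-and-strip steps can be wrapped in a small helper. This is a minimal sketch, assuming the decoded text always has the form PREFIX_URL_SUFFIX with no underscores inside PREFIX or SUFFIX; that pattern is only inferred from the two values in this question (gzK_..._2Ms and KI7_..._yPp), so Tripadvisor may use a different scheme elsewhere. The function name decode_encoded_url is hypothetical.

```python
import base64

# The two data-encoded-url values from the question (HTML snippet and Scrapy output).
ENCODED_VALUES = (
    "S0k3X2h0dHBzOi8vbWFwcy5nb29nbGUuY29tL21hcHM/c2FkZHI9JmRhZGRyPVNjYWJlbGxzdHIuKzEwLTExJTJDKzE0MTA5K0JlcmxpbitHZXJtYW55QDUyLjQyODgxOCwxMy4xODI0MjFfeVBw",
    "Z3pLX2h0dHBzOi8vbWFwcy5nb29nbGUuY29tL21hcHM/c2FkZHI9JmRhZGRyPVNjYWJlbGxzdHIuKzEwLTExJTJDKzE0MTA5K0JlcmxpbitHZXJtYW55QDUyLjQyODgxOCwxMy4xODI0MjFfMk1z",
)


def decode_encoded_url(encoded: str) -> str:
    """Turn a data-encoded-url value into a plain URL (hypothetical helper).

    Assumes the decoded text looks like PREFIX_URL_SUFFIX, where neither
    PREFIX nor SUFFIX contains an underscore.
    """
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Drop everything up to and including the first underscore (the prefix).
    _prefix, sep, rest = decoded.partition("_")
    # Drop the last underscore and everything after it (the suffix).
    url, sep2, _suffix = rest.rpartition("_")
    if not sep or not sep2:
        raise ValueError(f"unexpected format: {decoded!r}")
    return url


for value in ENCODED_VALUES:
    # Both values decode to the same Google Maps URL.
    print(decode_encoded_url(value))
```

In a spider, the input strings would presumably come from a query such as response.xpath("//a/@data-encoded-url").getall(); that selector is an assumption about the page markup. Splitting on the first and last underscore, rather than slicing off a fixed four characters, keeps working if the random prefix or suffix changes length, as long as neither contains an underscore.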