Python: How do I find a URL in a span inside a div?

python xpath scrapy

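I'm trying to find a URL that sits inside a span, which in turn sits inside a div.

In this case, the link I'm looking for is the one with the class "company_url".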

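You need to start the expression with .// (a dot followed by two slashes) so that it is evaluated relative to the div you selected: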

'.//span[@class="link"]/a[@class="company_url"]/@href'

With that in place, you get your URL:

In [2]: from lxml import html

In [3]: x = html.fromstring(h)

In [4]: d = x.xpath('//div[@class="links standard"]')[0]

In [5]: d
Out[5]: <Element div at 0x7f13c0a00208>

In [6]: d.xpath('/span[@class="link"]/a[@class="company_url"]/@href')
Out[6]: []

In [7]: d.xpath('.//span[@class="link"]/a[@class="company_url"]/@href')
Out[7]: ['http://abacus.com']
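
The session above relies on a variable h whose contents are not shown. As a self-contained sketch of the same comparison, the snippet below uses hypothetical markup (an assumption, not the real page) to show why the expression starting with / returns nothing while the one starting with .// finds the link:

```python
from lxml import html

# Hypothetical markup standing in for the page; the real HTML is not shown
# in the thread, so this structure is an assumption.
h = """
<html><body>
  <div class="links standard">
    <span class="link">
      <a class="company_url" href="http://abacus.com">abacus.com</a>
    </span>
  </div>
</body></html>
"""

x = html.fromstring(h)
d = x.xpath('//div[@class="links standard"]')[0]

# A leading "/" starts at the document root, not at d, so nothing matches.
absolute = d.xpath('/span[@class="link"]/a[@class="company_url"]/@href')

# A leading ".//" searches the descendants of d, the context node.
relative = d.xpath('.//span[@class="link"]/a[@class="company_url"]/@href')

print(absolute)  # []
print(relative)  # ['http://abacus.com']
```

The only difference between the two calls is the leading dot: without it, the path is resolved from the top of the document even though it is called on d.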

Then run the same expressions against the Scrapy response:

In [7]: d = response.xpath('//div[@class="links standard"]')[0]

In [8]:  d.xpath('/span[@class="link"]/a[@class="company_url"]/@href').extract_first()

In [9]:  d.xpath('.//span[@class="link"]/a[@class="company_url"]/@href').extract_first()
Out[9]: u'http://abacus.com'

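These XPaths work, but Scrapy still fails to download the data. After some googling (and after confirming that the Scrapy code is fine by testing it on another site), I concluded that Angel.co does not allow bots to scrape it. Thanks for your help!

I'll take a look at the site when I'm back at my notebook.

@user1287245, see the edit. You need to add a user agent in settings.py.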
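
For reference, Scrapy reads the default User-Agent header from the USER_AGENT setting. A minimal sketch of what the settings.py change might look like follows; the string itself is a hypothetical placeholder, and the thread does not confirm that it is enough for this particular site:

```python
# settings.py (Scrapy project settings)

# USER_AGENT sets the default User-Agent header sent with each request.
# The value below is a hypothetical placeholder, not taken from the thread;
# replace it with a string that identifies your own crawler.
USER_AGENT = "Mozilla/5.0 (compatible; examplebot/0.1; +http://www.example.com/bot)"
```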
