Extracting href URLs with Python requests
I want to extract a URL from an XPath using the requests package in Python. I can get the text, but none of my attempts return the URL. Can anyone help?
ipdb> webpage.xpath(xpath_url + '/text()')
['Text of the URL']
ipdb> webpage.xpath(xpath_url + '/a()')
*** lxml.etree.XPathEvalError: Invalid expression
ipdb> webpage.xpath(xpath_url + '/href()')
*** lxml.etree.XPathEvalError: Invalid expression
ipdb> webpage.xpath(xpath_url + '/url()')
*** lxml.etree.XPathEvalError: Invalid expression
I used this tutorial to get started:
It seems like it should be easy, but nothing turned up in my searching.
Thanks.
You're better off using BeautifulSoup: you can print each link, append it to a list, and so on. To loop over the links, use:
# find_all('a href') would look for a tag literally named "a href";
# pass href=True to match only <a> tags that have an href attribute
links = soup.find_all('a', href=True)
for link in links:
    print(link['href'])
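If BeautifulSoup isn't available, the same extraction can be sketched with the standard library's `html.parser` (the sample HTML below is made up for illustration; in practice you would feed it `page.content` decoded to text):

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)


# Made-up HTML standing in for a fetched page body
html_doc = (
    '<html><body>'
    '<a href="/ex/002.html">Next</a>'
    '<a href="/ex/003.html">Later</a>'
    '</body></html>'
)

parser = LinkExtractor()
parser.feed(html_doc)
print(parser.links)  # hrefs in document order
```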
Have you tried
webpage.xpath(xpath_url + '/@href')
Here is the complete code:
from lxml import html
import requests
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
webpage = html.fromstring(page.content)
webpage.xpath('//a/@href')
The result should be:
[
'http://econpy.pythonanywhere.com/ex/002.html',
'http://econpy.pythonanywhere.com/ex/003.html',
'http://econpy.pythonanywhere.com/ex/004.html',
'http://econpy.pythonanywhere.com/ex/005.html'
]
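The same pattern can be checked offline by parsing an inline snippet instead of a live page; this sketch (with made-up URLs) also pairs each link's text with its href, which is often what you actually want:

```python
from lxml import html

# Made-up HTML standing in for page.content
doc = html.fromstring(
    '<div>'
    '<a href="http://example.com/1">first</a>'
    '<a href="http://example.com/2">second</a>'
    '</div>'
)

hrefs = doc.xpath('//a/@href')   # attribute nodes come back as plain strings
texts = doc.xpath('//a/text()')  # text nodes come back as plain strings

for text, href in zip(texts, hrefs):
    print(text, '->', href)
```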
A benefit of using requests_html with a context manager:
import requests
import requests_html

with requests_html.HTMLSession() as s:
    try:
        r = s.get('http://econpy.pythonanywhere.com/ex/001.html')
        links = r.html.links
        for link in links:
            print(link)
    except requests.exceptions.RequestException:
        pass  # swallow network errors here; log them in real code
You can do it easily with Selenium:
link = webpage.find_element_by_xpath('<xpath to the element containing the link>')
url = link.get_attribute('href')
Can you share the value of xpath_url? The XPath in the first line seems to be interpreted correctly, but the XPath statements below it are probably not valid.
@jeedo Your comment helped me realize my XPath ends with "div/h2/a", so per jeremija's answer, appending /@href
is enough. Thanks!
Thanks, @href
works. Now I need to understand why text is text()
while href is @href
. I believe it's because @
refers to an element's attribute, while text()
returns the content of the selected node. bs4 seems to be a popular approach; in this case I'd like to stick with python requests, but it's definitely useful for future reference. Thank you very much.
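That reading is correct: in XPath, `@name` selects an attribute node (any attribute, not just href), while `text()` selects an element's direct text children. A small sketch with made-up element and attribute values:

```python
from lxml import html

doc = html.fromstring(
    '<html><body>'
    '<p id="intro" class="lead">Hello <b>world</b></p>'
    '</body></html>'
)

print(doc.xpath('//p/@id'))     # attribute node selected with @  -> ['intro']
print(doc.xpath('//p/@class'))  # @ works for any attribute       -> ['lead']
print(doc.xpath('//p/text()'))  # direct text children only; the text
                                # inside <b> belongs to <b>, not <p>
```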