Extracting an href URL with Python Requests

Tags: python, python-3.x, xpath, python-requests, lxml


I want to extract a URL from an XPath using the requests package in Python. I can get the text, but nothing I have tried returns the URL. Can anyone help?

ipdb> webpage.xpath(xpath_url + '/text()')
['Text of the URL']
ipdb> webpage.xpath(xpath_url + '/a()')
*** lxml.etree.XPathEvalError: Invalid expression
ipdb> webpage.xpath(xpath_url + '/href()')
*** lxml.etree.XPathEvalError: Invalid expression
ipdb> webpage.xpath(xpath_url + '/url()')
*** lxml.etree.XPathEvalError: Invalid expression

I used this tutorial to get started:

It seems like it should be easy, but nothing came up in my searches.

Thanks.

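You would be better off using BeautifulSoup (bs4):

You can print each link, add it to a list, and so on. To iterate over the links, use:

# Assumes `soup` is a BeautifulSoup object built from the page's HTML.
links = soup.find_all('a', href=True)  # only <a> tags that have an href
for link in links:
    print(link['href'])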
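
For reference, a self-contained sketch of the BeautifulSoup approach is shown below. The HTML snippet is made up for illustration and stands in for the downloaded page; with a real page you would pass page.content from requests instead. The parser choice ('html.parser') is also an assumption, picked because it ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page.
SAMPLE = """
<div>
  <h2><a href="http://example.com/page1">First link</a></h2>
  <h2><a href="http://example.com/page2">Second link</a></h2>
  <a name="anchor-without-href">Not a link</a>
</div>
"""

soup = BeautifulSoup(SAMPLE, 'html.parser')

# href=True keeps only <a> tags that actually carry an href attribute.
hrefs = [a['href'] for a in soup.find_all('a', href=True)]

for href in hrefs:
    print(href)
```

Filtering with href=True avoids a KeyError on anchors that have no href attribute.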

Have you tried

webpage.xpath(xpath_url + '/@href')

Here is the complete code:

from lxml import html
import requests

page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
webpage = html.fromstring(page.content)

webpage.xpath('//a/@href')

The result should be:

[
  'http://econpy.pythonanywhere.com/ex/002.html',
  'http://econpy.pythonanywhere.com/ex/003.html', 
  'http://econpy.pythonanywhere.com/ex/004.html',
  'http://econpy.pythonanywhere.com/ex/005.html'
]
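
To try the same idea without a network request, here is a small sketch that runs the two XPath queries against an inline HTML snippet. The markup and the value of xpath_url are assumptions made for illustration; the only thing known about the real XPath is that it ended in div/h2/a.

```python
from lxml import html

# Hypothetical markup standing in for the real page.
SAMPLE = """
<html><body>
  <div><h2><a href="http://example.com/page1">Text of the URL</a></h2></div>
</body></html>
"""

webpage = html.fromstring(SAMPLE)

# Hypothetical XPath that ends at the <a> element.
xpath_url = '//div/h2/a'

texts = webpage.xpath(xpath_url + '/text()')  # text nodes inside <a>
hrefs = webpage.xpath(xpath_url + '/@href')   # value of the href attribute

print(texts)  # ['Text of the URL']
print(hrefs)  # ['http://example.com/page1']
```

Expressions such as /a() or /href() fail because XPath has no functions or node tests with those names; attributes are addressed with the @ prefix.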

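An alternative with requests_html, which has the benefit of using a context manager:

import requests
import requests_html

with requests_html.HTMLSession() as s:
    try:
        r = s.get('http://econpy.pythonanywhere.com/ex/001.html')
        links = r.html.links  # set of all link URLs found on the page
        for link in links:
            print(link)
    except requests.exceptions.RequestException as exc:
        # Report network errors instead of silently swallowing them.
        print(f'Request failed: {exc}')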

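You can do this easily with Selenium:

from selenium.webdriver.common.by import By
# `webpage` must be a Selenium WebDriver here; xpath_url is the XPath of the <a> element.
link = webpage.find_element(By.XPATH, xpath_url)
url = link.get_attribute('href')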

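Can you provide the value of xpath_url? The XPath in the first line seems to be interpreted correctly, but the XPath statements after it may be invalid.

@jeedo Your comment helped me realize that my XPath ended with "div/h2/a", so appending /@href, as in jeremija's answer, was enough. Thanks.

Thanks, @href works. Now I need to go and understand why the text is text() and the href is @href.

I believe this is because @ refers to an attribute of an element, whereas text() selects the text content of the selected node.

bs4 seems to be a popular approach. In this case I wanted to keep using Python requests, but this will certainly be useful for future reference. Thanks a lot.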
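
The difference between text() and @href discussed in these comments can also be seen by selecting the element itself and inspecting it from Python. This is a minimal sketch with made-up markup: an lxml element exposes its leading text as .text and its attributes through .get() or .attrib, which mirrors text() and @name in XPath.

```python
from lxml import html

# Hypothetical fragment with one link that has two attributes.
SAMPLE = '<div><h2><a href="/ex/002.html" title="next">Next page</a></h2></div>'
root = html.fromstring(SAMPLE)

pairs = []
for a in root.xpath('.//a'):
    # .text is the element's text content; .get() reads a single attribute.
    pairs.append((a.text, a.get('href')))
    print(a.text, a.get('href'), dict(a.attrib))
```

Selecting the element first is convenient when both the link text and the URL are needed together, since each pair comes from the same <a> node.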