
Python BeautifulSoup: how do I get the full URL from an img src that contains ../..?


Suppose I'm trying to get the link to an image, like this:

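from bs4 import BeautifulSoup
import os
from urllib.parse import urlparse
from urllib.request import urlopen

page_url = "http://examplesite.com"
# BeautifulSoup parses markup, not URLs, so the page must be fetched first
soup = BeautifulSoup(urlopen(page_url).read(), "html.parser")
for image in soup.find_all("img", src=True):
    src = image["src"]
    srcd = urlparse(src)
    path = srcd.path  # gets the path
    fn = os.path.basename(path)  # gets filename

# let's say the webpage I was scraping had its images like this:
# <img src="../../someimage.jpg" />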

Is there an easy way to get the full URL from it, or do I have to use a regular expression?

Use urljoin (urllib.parse.urljoin in Python 3; in Python 2 it was urlparse.urljoin):

>>> from urllib.parse import urljoin
>>> base_url = "http://example.com/foo/"
>>> urljoin(base_url, "../bar")
'http://example.com/bar'
>>> urljoin(base_url, "/baz")
'http://example.com/baz'

The full URL depends on the base URI, and the base URI depends on context (usually the URL the page was retrieved from, but watch out for iframes and manual…)
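One context that changes the base URI is a <base href="..."> element in the page itself. A sketch of how that could be handled with BeautifulSoup, assuming the HTML has already been fetched and that page_url is the URL it came from (the function name image_urls and the sample markup are hypothetical). It uses the first <base> that has an href, resolved against the page URL, and otherwise falls back to the page URL. Documents loaded in iframes have their own base URI and are not handled here:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def image_urls(html, page_url):
    """Hypothetical helper: absolute URLs for every <img src> in html.

    page_url is the URL the HTML was retrieved from. If the document
    declares <base href="...">, that href (itself resolved against
    page_url) is used as the base URI instead.
    """
    soup = BeautifulSoup(html, "html.parser")
    base_url = page_url
    # Only the first <base> element with an href attribute counts.
    base_tag = soup.find("base", href=True)
    if base_tag is not None:
        base_url = urljoin(page_url, base_tag["href"])
    # src=True skips <img> tags that have no src attribute at all.
    return [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]


# Hypothetical markup and page URL, for illustration only.
html = '<html><body><img src="../../someimage.jpg"><img src="logo.png"></body></html>'
print(image_urls(html, "http://examplesite.com/a/b/page.html"))
```

Resolving the <base> href against the page URL first matters because the href may itself be relative (for example "/cdn/img/").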