Python: How to separate image links in beautifulsoup4 according to what they contain (Python, Web Scraping, Beautifulsoup)

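I'm new to BeautifulSoup4 and I'm trying to get all the image links from a website, for example Unsplash, but I only want the URLs that contain the word "photo".

I don't want URLs that contain the word "profile".

I'm using Python 3.6 and urllib3.

You can use this script as an example of how to filter the links: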

import requests
from bs4 import BeautifulSoup


url = 'https://unsplash.com'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for img in soup.find_all('img'):
    src = img.get('src', '')  # some <img> tags have no src; avoid a KeyError
    if 'photo' in src:  # print only links with `photo` inside them
        print(src)
Prints:

https://images.unsplash.com/photo-1597649260558-e2bd7d35f043?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format%2Ccompress&fit=crop&w=1000&h=1000
https://images.unsplash.com/photo-1598929214025-d6bb6167d43b?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
https://images.unsplash.com/photo-1599567513879-604247ea2bd3?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
https://images.unsplash.com/photo-1599366611308-719895c34512?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
https://images.unsplash.com/photo-1598929214025-d6bb6167d43b?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
https://images.unsplash.com/photo-1599366611308-719895c34512?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
https://images.unsplash.com/photo-1599567513879-604247ea2bd3?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
https://images.unsplash.com/photo-1598929214025-d6bb6167d43b?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
https://images.unsplash.com/photo-1599567513879-604247ea2bd3?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
https://images.unsplash.com/photo-1599366611308-719895c34512?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&w=1000&q=80
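
The output above lists some links more than once, because the same image can appear in several `<img>` tags on the page. If you only need each link once, a minimal sketch is to drop repeats while keeping the original order. The helper name `unique_in_order` is made up for this illustration, and the sample list reuses photo IDs from the output above with the query strings trimmed for brevity:

```python
def unique_in_order(links):
    """Drop repeated links, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (guaranteed from
    # Python 3.7; in CPython 3.6 this holds as an implementation detail).
    return list(dict.fromkeys(links))


# Sample data: trimmed versions of links from the printed output.
printed_links = [
    "https://images.unsplash.com/photo-1597649260558-e2bd7d35f043",
    "https://images.unsplash.com/photo-1598929214025-d6bb6167d43b",
    "https://images.unsplash.com/photo-1599567513879-604247ea2bd3",
    "https://images.unsplash.com/photo-1599366611308-719895c34512",
    "https://images.unsplash.com/photo-1598929214025-d6bb6167d43b",
    "https://images.unsplash.com/photo-1599366611308-719895c34512",
]

unique_links = unique_in_order(printed_links)
for link in unique_links:
    print(link)
```

A `set` would also remove the repeats, but it does not keep the links in page order.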

Using urllib:

import urllib.request
from bs4 import BeautifulSoup


url = 'https://unsplash.com'
soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html.parser')

for img in soup.find_all('img'):
    src = img.get('src', '')  # some <img> tags have no src; avoid a KeyError
    if 'photo' in src:  # print only links with `photo` inside them
        print(src)

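You can simply fetch them all and then filter out the ones you don't need in code.

Can you add an example with code? I'm not familiar with it.

Pretty much what Andrej Kesely said.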
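
As a rough sketch of the "fetch everything, then filter" idea from the comment above, the snippet below keeps only `src` values that contain "photo" and skips any that contain "profile". It runs on a small inline HTML string so it works without network access. The `SAMPLE_HTML` content, its `images.example.com` URLs, and the function name `filter_image_links` are all assumptions made up for this illustration; in practice you would pass in the page content you downloaded.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; these URLs are invented.
SAMPLE_HTML = """
<img src="https://images.example.com/photo-111?w=1000">
<img src="https://images.example.com/profile-222?w=32">
<img alt="this tag has no src attribute">
<img src="https://images.example.com/photo-333?w=1000">
"""


def filter_image_links(html, include="photo", exclude="profile"):
    """Return <img> src values containing `include` but not `exclude`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for img in soup.find_all("img"):
        src = img.get("src")  # None when the tag has no src attribute
        if src and include in src and exclude not in src:
            links.append(src)
    return links


photo_links = filter_image_links(SAMPLE_HTML)
for link in photo_links:
    print(link)
```

Using `img.get("src")` instead of `img["src"]` means tags without a `src` attribute are skipped rather than raising a `KeyError`.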