Python bs4: only get URLs that contain a specific string


I am making an image scraper and want to be able to grab some of the pictures from this link, then save them in a folder named dribblephotos:

Here are the links I retrieved:

https://static.dribbble.com/users/458522/screenshots/6040912/nike_air_huarache_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/105681/screenshots/3944640/hype_1x.png
https://static.dribbble.com/users/105681/avatars/mini/avatar-01-01.png?1377980605
https://static.dribbble.com/users/923409/screenshots/7179093/basketball_marly_gallardo_1x.jpg
https://static.dribbble.com/users/923409/avatars/mini/bc17b2db165c31804e1cbb1d4159462a.jpg?1596192494
https://static.dribbble.com/users/458522/screenshots/6034458/nike_air_jordan_i_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1237425/screenshots/5071294/customize_air_jordan_web_2x.png
https://static.dribbble.com/users/1237425/avatars/mini/87ae45ac7a07dd69fe59985dc51c7f0f.jpeg?1524130139
https://static.dribbble.com/users/1174720/screenshots/6187664/adidas_2x.png
https://static.dribbble.com/users/1174720/avatars/mini/9de08da40078e869f1a680d2e43cdb73.png?1588733495
https://static.dribbble.com/users/179617/screenshots/4426819/ultraboost_1x.png
https://static.dribbble.com/users/179617/avatars/mini/2d545dc6c0dffc930a2b20ca3be88802.jpg?1596735027
https://static.dribbble.com/users/458522/screenshots/6126041/nike_air_max_270_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/60266/screenshots/6698826/nike_shoe_2x.jpg
https://static.dribbble.com/users/60266/avatars/mini/64826d925db1d4178258d17d8826842b.png?1549028805
https://static.dribbble.com/users/78464/screenshots/4950025/8x600_1x.jpg
https://static.dribbble.com/users/78464/avatars/mini/a9ae6a559ab479d179e8bd22591e4028.jpg?1465908886
https://static.dribbble.com/users/458522/screenshots/6118702/adidas_nmd_r1_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/458522/screenshots/6098953/nike_lebron_10_je_icon_qs_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/879147/screenshots/7152093/img_0966_2x.png
https://static.dribbble.com/users/879147/avatars/mini/e095f3837f221bb2ef652dcc966b99f7.jpg?1568473177
https://static.dribbble.com/users/458522/screenshots/6128979/nerd_x_adidas_pharrell_hu_nmd_trail_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/879147/screenshots/11064235/26fa4a2d-9033-4953-b48f-4c0e8a93fc9d_2x.png
https://static.dribbble.com/users/879147/avatars/mini/e095f3837f221bb2ef652dcc966b99f7.jpg?1568473177
https://static.dribbble.com/users/458522/screenshots/6132938/nike_moon_racer_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1823684/screenshots/5973495/jordannn1_2x.png
https://static.dribbble.com/users/1823684/avatars/mini/f6041c082aec67302d4b78b8d203f02b.png?1509719582
https://static.dribbble.com/users/552027/screenshots/4666241/airmax270_1x.jpg
https://static.dribbble.com/users/552027/avatars/mini/35bb0dcb5a6619f68816290898bff6cc.jpg?1535884243
https://static.dribbble.com/users/458522/screenshots/6044426/adidas_pharrell_hu_nmd_trail_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/220914/screenshots/11295053/woman_shoe_tree_floating2_2x.png
https://static.dribbble.com/users/220914/avatars/mini/d364a9c166edb6d96cc059a836219a7d.jpg?1590773568
https://static.dribbble.com/users/4040486/screenshots/7079508/___2x.png
https://static.dribbble.com/users/4040486/avatars/mini/f31e9b50df877df815177e2015135ff7.png?1582521697
https://static.dribbble.com/users/57602/screenshots/12909636/d2_2x.png
https://static.dribbble.com/users/57602/avatars/mini/b4c27f3be2c61d82fbc821433d058b04.jpg?1575089000
https://static.dribbble.com/users/458522/screenshots/6049522/nike_x_john_elliott_lebron_10_soldier_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1025917/screenshots/9738550/vans-2020-pixelwolfie-dribbble_2x.png
https://static.dribbble.com/users/1025917/avatars/mini/87fdcb145eab0b47eda29fc873f25f8c.png?1594466719
https://static.dribbble.com/assets/icon-backtotop-1b04df73090f6b0f3192a3b71874ca3b3cc19dff16adc6cf365cd0c75897f6c0.png
https://static.dribbble.com/assets/dribbble-ball-icon-e94956d5f010d19607348176b0ae90def55d61871a43cb4bcb6d771d8d235471.svg
https://static.dribbble.com/assets/icon-shot-x-light-40c073cd65443c99d4ac129b69bf578c8cf97d69b78990c00c4f8c5873b0d601.png
https://static.dribbble.com/assets/icon-shot-prev-light-ca583c76838d54eca11832ebbcaba09ba8b2bf347de2335341d244ecb9734593.png
https://static.dribbble.com/assets/icon-shot-next-light-871a18220c4c5a0325d1353f8e4cc204c3b49beacc63500644556faf25ded617.png
https://static.dribbble.com/assets/dribbble-square-c8c7a278e96146ee5a9b60c3fa9eeba58d2e5063793e2fc5d32366e1b34559d3.png
https://static.dribbble.com/assets/dribbble-ball-192-ec064e49e6f63d9a5fa911518781bee0c90688d052a038f8876ef0824f65eaf2.png
https://static.dribbble.com/assets/icon-overlay-x-2x-b7df2526b4c26d4e8410a7c437c433908be0c7c8c3c3402c3e578af5c50cf5a5.png
However, I only want to grab the URLs that contain the string "screenshots". So I tried to write a function that grabs only the images whose URL contains "screenshots". For example:

https://static.dribbble.com/users/923409/screenshots/7179093/basketball_marly_gallardo_1x.jpg

At first, I wrote a function that simply prints the specific links I want, to see whether it works. However, it did not work. Here is the code for my function:

def art_links():
    images = []
    for img in x:
        images.append(img['src'])
    images = soup2.find_all("screenshots")
    print(images)
And here is my full code:

from bs4 import BeautifulSoup
import requests as rq 
import os 

r2 = rq.get("https://dribbble.com/search/shots/popular/illustration?q=sneaker%20")
soup2 = BeautifulSoup(r2.text, "html.parser")

links = []

x = soup2.select('img[src^="https://static.dribbble.com"]')

for img in x: 
    links.append(img['src'])

def art_links():
    images = []
    for img in x:
        images.append(img['src'])
    images = soup2.find_all("screenshots")
    print(images)
    

os.mkdir('dribblephotos') 


for index, img_link in enumerate(links):
    if "screenshots" in images:
    img_data = r.get(img_link).content
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)
        
    else:
        break
art_links()

I noticed a small syntax problem with the if statement near the end of your code (specifically, the indentation of the statement under the if), so I reformatted it slightly to match what you intended. I think what is happening is that you put an else clause in the final for loop, so as soon as one entry's link does not contain "screenshots", the loop stops entirely instead of continuing. There is a continue keyword you could use, but simply dropping the else branch is enough. You are also checking for "screenshots" in images, while the link you are trying to check is the variable declared in the for loop as img_link. Try this in your final for loop and see what you get:

for index, img_link in enumerate(links):
    if "screenshots" in img_link:
        img_data = rq.get(img_link).content
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)
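For reference, the continue keyword mentioned above would be used like this; this is just a sketch and behaves the same as dropping the else branch:

for index, img_link in enumerate(links):
    # Skip any link that is not a screenshot instead of breaking out of the loop
    if "screenshots" not in img_link:
        continue
    img_data = rq.get(img_link).content
    with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
        f.write(img_data)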
If you still need the links rather than just the downloaded files, you should be able to collect them while iterating over the images in the for loop and store the ones that are screenshot links in a new list.

Update: this latest version works for me. I removed the function that filtered the links after the loop, because that second pass is unnecessary once you have already iterated over them. The first for loop is all you need; iterating twice is redundant, so I check each URL on the first pass and only save it to the links list when it matches.

from bs4 import BeautifulSoup
import requests as rq
import os

r2 = rq.get("https://dribbble.com/search/shots/popular/illustration?q=sneaker%20")
soup2 = BeautifulSoup(r2.text, "html.parser")

links = []

x = soup2.select('img[src^="https://static.dribbble.com"]')

os.mkdir('dribblephotos')

# Only one for loop required, shouldn't iterate twice if not required
for index, img in enumerate(x):
    # Store the current url from the image result
    url = img["src"]
    # Check the url for screenshot before putting in the links
    if "screenshot" in url:
        links.append(img['src'])
        # Download the image
        img_data = rq.get(url).content
        # Put the image into the file
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)

print(links)
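As a possible refinement (not part of the original answer), BeautifulSoup's select supports CSS substring matching on attributes with *=, so the screenshot URLs can be filtered directly in the selector instead of with an if check. A minimal sketch against the same page:

from bs4 import BeautifulSoup
import requests as rq

r2 = rq.get("https://dribbble.com/search/shots/popular/illustration?q=sneaker%20")
soup2 = BeautifulSoup(r2.text, "html.parser")

# *= matches any src attribute that contains the substring "screenshots"
shots = soup2.select('img[src^="https://static.dribbble.com"][src*="screenshots"]')
links = [img["src"] for img in shots]
print(links)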

@BrandonLodwick Thanks for your help, I tried what you recommended. However, it prints an empty list [], so how would I change the if statement to check the img_link declared in my for loop? The files did not download either.

It downloads into my files. I removed the function call after the for loop, let me take a look again.

I also removed the function call, and it did not download any files, it only created the folder @BrandonLodwick

Ah litty!! Thank you for sticking with me and helping me. I hope you have a great rest of your day @BrandonLodwick
,因此我如何更改if语句以检查在我的for a循环中声明的img_链接?文件也没有下载@brandonlodwick,它是在我的文件上下载的。我删除了for循环之后的函数调用,让我看一看,againI也删除了函数调用,它没有下载任何文件,它只创建了文件夹@BrandonLodwickah litty!!谢谢你一直支持我,帮助我。我希望你能好好休息一天@布兰登利德威克