I have the list of strings below, but I want to apply a filter so that I can remove certain items from the list. How can I do this?

Tags: list, web-scraping, filter, beautifulsoup, python-3.6

I'm trying to get the image data from an IKEA product page (the URL is in the code below).

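However, the list I get back includes links I don't want. I want to apply a filter so that I only get the entries that start with /PIAimages. How can I apply a filter to achieve this?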


    import requests
    from bs4 import BeautifulSoup
    import csv 

    result = []
    response = requests.get("https://www.ikea.com/sa/en/catalog/products/00361049/")
    assert response.ok
    page = BeautifulSoup(response.text, "html.parser")

    for des in page.find_all('img'):
        image = des.get('src')
        print(image)


Expected output:

    /PIAimages/0531313_PE647261_S1.JPG
    /PIAimages/0513228_PE638849_S1.JPG
    /PIAimages/0618875_PE688687_S1.JPG
    /PIAimages/0325432_PE517964_S1.JPG
    /PIAimages/0690287_PE723209_S1.JPG
    /PIAimages/0513996_PE639275_S1.JPG
    /PIAimages/0325450_PE517970_S1.JPG

Actual output:

    /ms/img/header/ikea-logo.svg
    /ms/en_SA/img/header/ikea-store.png
    /ms/img/header/main_menu_shadow.gif
    /sa/en/images/products/strandmon-wing-chair-beige__0513996_PE639275_S4.JPG
    /PIAimages/0531313_PE647261_S1.JPG
    /PIAimages/0513228_PE638849_S1.JPG
    /PIAimages/0618875_PE688687_S1.JPG
    /PIAimages/0325432_PE517964_S1.JPG
    /PIAimages/0690287_PE723209_S1.JPG
    /PIAimages/0513996_PE639275_S1.JPG
    /PIAimages/0325450_PE517970_S1.JPG
    /ms/img/static/loading.gif
    /ms/img/static/stock_check_green.gif
    /ms/img/ads/services/ways_to_shop/20172_otav20a_assembly_20x20.jpg
    /ms/en_SA/img/icons/picking-with-delivery.jpg
    /ms/img/ads/services/ways_to_shop/20172_otav24a_pickingdelivery_20x20.jpg
    /sa/en/images/products/strandmon-wing-chair-beige__0739100_PH147003_S4.JPG
    https://smetrics.ikea.com/b/ss/ikeaallnojavascriptprod/5/?c8=sa&pageName=nojavascript
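Stripped of the scraping, the requirement is a prefix filter over a list of strings: keep only the entries that begin with /PIAimages. Here is a minimal sketch on plain strings. It uses a few paths from the actual output above as sample data, plus a `None` entry standing in for an `<img>` without a `src`. The function name `keep_pia_images` is hypothetical.

```python
def keep_pia_images(paths):
    """Return only the paths that start with /PIAimages, preserving order."""
    # `p and ...` drops None (what tag.get('src') returns when src is missing) and ""
    return [p for p in paths if p and p.startswith("/PIAimages")]


# Sample taken from the actual output above; None is an added, hypothetical entry
sample = [
    "/ms/img/header/ikea-logo.svg",
    "/sa/en/images/products/strandmon-wing-chair-beige__0513996_PE639275_S4.JPG",
    "/PIAimages/0531313_PE647261_S1.JPG",
    None,
    "/PIAimages/0513228_PE638849_S1.JPG",
    "/ms/img/static/loading.gif",
    "https://smetrics.ikea.com/b/ss/ikeaallnojavascriptprod/5/?c8=sa&pageName=nojavascript",
]

print(keep_pia_images(sample))
# ['/PIAimages/0531313_PE647261_S1.JPG', '/PIAimages/0513228_PE638849_S1.JPG']
```

The same check also works on a `result` list that has already been filled with every `src` value.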

Use an `if` clause, then append the matching entries to the list:

    import requests
    from bs4 import BeautifulSoup

    result = []
    response = requests.get("https://www.ikea.com/sa/en/catalog/products/00361049/")
    assert response.ok
    page = BeautifulSoup(response.text, "html.parser")
    for des in page.find_all('img'):
        image = des.get('src')
        # get() returns None for <img> tags without a src, so check it before using `in`
        if image and 'PIAimages' in image:
            result.append(image)

    print(result)
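The `in` test matches `PIAimages` anywhere in the path, but the question asks for entries that *start* with `/PIAimages`. A stricter variant of the same loop uses `str.startswith`. To keep it runnable offline, this sketch parses a small hypothetical HTML snippet instead of the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the product page; the real page has many more tags
HTML = """
<img src="/ms/img/header/ikea-logo.svg">
<img src="/PIAimages/0531313_PE647261_S1.JPG">
<img alt="no src attribute">
<img src="/sa/en/images/products/strandmon-wing-chair-beige__0513996_PE639275_S4.JPG">
<img src="/PIAimages/0513228_PE638849_S1.JPG">
"""

page = BeautifulSoup(HTML, "html.parser")

result = []
for des in page.find_all("img"):
    image = des.get("src")  # None when the tag has no src attribute
    if image and image.startswith("/PIAimages"):
        result.append(image)

print(result)
# ['/PIAimages/0531313_PE647261_S1.JPG', '/PIAimages/0513228_PE638849_S1.JPG']
```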
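Or use a regular expression, so that `find_all` itself returns only the matching tags. This is more concise. Any speed gain is small next to the time spent on the HTTP request.

    import re

    import requests
    from bs4 import BeautifulSoup

    result = []
    response = requests.get("https://www.ikea.com/sa/en/catalog/products/00361049/")
    assert response.ok
    page = BeautifulSoup(response.text, "html.parser")
    # find_all tests the compiled pattern against each tag's src attribute
    for des in page.find_all('img', src=re.compile("PIAimages")):
        image = des.get('src')
        result.append(image)

    print(result)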
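BeautifulSoup applies a compiled pattern using its `search()` method, so `re.compile("PIAimages")` matches the substring anywhere in `src`. Anchoring the pattern with `^` restricts it to the prefix the question asks for. The sketch below compares both patterns on a hypothetical snippet. The third `<img>` is an invented look-alike path added only to show the difference.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical snippet; the "PIAimages-lookalike" path is invented for the comparison
HTML = """
<img src="/ms/img/header/ikea-logo.svg">
<img src="/PIAimages/0531313_PE647261_S1.JPG">
<img src="/sa/en/images/products/PIAimages-lookalike.JPG">
<img alt="no src attribute">
<img src="/PIAimages/0513228_PE638849_S1.JPG">
"""

page = BeautifulSoup(HTML, "html.parser")

# "^" anchors the pattern at the start of the src value
anchored = [tag["src"] for tag in page.find_all("img", src=re.compile(r"^/PIAimages"))]

# The unanchored pattern from the answer also picks up the look-alike path
unanchored = [tag["src"] for tag in page.find_all("img", src=re.compile("PIAimages"))]

print(anchored)
print(unanchored)
```

Tags without a `src` attribute are skipped by both patterns, so the attribute filter needs no separate `None` check.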

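I think a CSS attribute selector with the "starts with" operator (`^=`) is faster and more concise. Put the starting substring for `src` in the selector, so that only the matching elements are returned: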

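    import requests
    from bs4 import BeautifulSoup

    response = requests.get("https://www.ikea.com/sa/en/catalog/products/00361049/")
    # "lxml" requires the lxml package; "html.parser" works without extra installs
    page = BeautifulSoup(response.text, "lxml")
    # Quoting the value avoids escaping "/" (and Python's invalid "\/" escape warning)
    images = [item['src'] for item in page.select('img[src^="/PIAimages"]')]
    print(images)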
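An unquoted CSS attribute value must be a valid identifier, so a leading `/` has to be CSS-escaped as `\/`. Writing that escape in a raw string keeps the backslash without triggering Python's invalid-escape warning. Quoting the value avoids the escape altogether. This minimal offline check runs on a hypothetical snippet with the built-in `html.parser`, so lxml is not needed. It assumes both selector spellings select the same elements.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the product page
HTML = """
<img src="/ms/img/header/ikea-logo.svg">
<img src="/PIAimages/0531313_PE647261_S1.JPG">
<img alt="no src attribute">
<img src="/PIAimages/0513228_PE638849_S1.JPG">
"""

page = BeautifulSoup(HTML, "html.parser")

# Quoted attribute value: "/" needs no CSS escaping
quoted = [item["src"] for item in page.select('img[src^="/PIAimages"]')]

# Unquoted value: "/" is CSS-escaped; the raw string keeps the backslash intact
escaped = [item["src"] for item in page.select(r"img[src^=\/PIAimages]")]

print(quoted)
print(quoted == escaped)
```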