Python: How to scrape an image src with BeautifulSoup
I am trying to get the image src from the following tag:
<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>
I tried this code, but it does not work:
images = soup.find('img').get('image-src')
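Usually I use
get('src')
and that works, but here I need the image-src attribute, and that does not work.

You can access a tag's attributes by treating the tag like a dictionary, or reach that dictionary directly through .attrs: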
soup.find('img').attrs['image-src']
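To make the difference between these access styles concrete, here is a minimal sketch. The shortened markup, the example.com URL, and the variable names are assumptions made for illustration and are not taken from the question. Subscript access and .attrs raise KeyError when the attribute is absent, while .get() returns None.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup modeled on the tag in the question
html = '<img id="limage_1" image-src="//example.com/a.jpg_220x220.jpg"/>'
soup = BeautifulSoup(html, 'html.parser')
img = soup.find('img')

by_get = img.get('image-src')        # returns None if the attribute is missing
by_subscript = img['image-src']      # raises KeyError if the attribute is missing
by_attrs = img.attrs['image-src']    # same dictionary, accessed explicitly
missing = img.get('src')             # this tag has no plain src attribute

print(by_get)
print(missing)
```

If the attribute may be absent on some tags, .get() is usually the safer choice because it does not raise.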
Looking at the documentation, I found the find_all method, which fits this case. This works for me:
for link in soup.find_all('img'):
    print(link.get('image-src'))
Here is my full code:
from bs4 import BeautifulSoup
html_doc = """
<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
for link in soup.find_all('img'):
    print(link.get('image-src'))
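The question does not show the full page, so the following is only a guess at why find('img') alone might fail: on a real page the first img tag may not carry image-src at all. Under that assumption, a small helper (image_sources is a hypothetical name, as is the sample markup) can walk every img tag, prefer image-src, and fall back to src:

```python
from bs4 import BeautifulSoup

def image_sources(html):
    """Return image-src (or src as a fallback) for every <img> that has one."""
    soup = BeautifulSoup(html, 'html.parser')
    sources = []
    for img in soup.find_all('img'):
        # Prefer the custom attribute; fall back to the standard one
        value = img.get('image-src') or img.get('src')
        if value:
            sources.append(value)
    return sources

# Hypothetical page fragment: one tag with no source, one with src, one with image-src
sample = '''
<img alt="no source here"/>
<img src="/static/logo.png"/>
<img class="picCore" image-src="//example.com/a.jpg"/>
'''
print(image_sources(sample))
```

Tags that have neither attribute are skipped, so the result contains only usable values.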
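If the id is static, you can use a CSS id selector to select the element and then use subscripting to get the image-src attribute. You can also select by class together with an attribute selector: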
from bs4 import BeautifulSoup as bs
html = '''
<img alt='Original Xiaomi Redmi Note 5 4GB RAM 64GB ROM Snapdragon S636 Octa Core Mobile Phone MIUI9 5.99" 2160*1080 4000mAh 12.0+5.0MP(China)' class="picCore" id="limage_32856997152" image-src="//ae01.alicdn.com/kf/HTB1WDJZbE_rK1Rjy0Fcq6zEvVXaS/Original-Xiaomi-Redmi-Note-5-4GB-RAM-64GB-ROM-Snapdragon-S636-Octa-Core-Mobile-Phone-MIUI9.jpg_220x220xz.jpg" itemprop="image"/>
'''
soup = bs(html, 'lxml')
print(soup.select_one('#limage_32856997152')['image-src'])
srcs = [ img['image-src'] for img in soup.select('.picCore[image-src]')]
print(srcs)
To get image-src from any element that has it, just use an attribute selector:
srcs = [img['image-src'] for img in soup.select('[image-src]')]
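If you want to get the src, you can do it like this:
# "attribute" and "name_attr" are placeholders for a real attribute name and value
new_var = soup.find(attrs={"attribute" : "name_attr"})
imageItem = new_var.get('src')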
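As a concrete instance of the snippet above, the sketch below fills in the placeholders with the itemprop="image" attribute that appears on the tag in the question. The shortened markup and the example.com URL are assumptions. Note that for this tag the value lives in image-src, so get('src') would return None.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup modeled on the tag in the question
html = '<img class="picCore" id="limage_1" image-src="//example.com/a.jpg" itemprop="image"/>'
soup = BeautifulSoup(html, 'html.parser')

# Find the first tag whose itemprop attribute equals "image"
tag = soup.find(attrs={"itemprop": "image"})

# find() returns None when nothing matches, so guard before reading attributes
image_item = tag.get('image-src') if tag is not None else None
print(image_item)
```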