Python 3.x: Extracting a long attribute value with BeautifulSoup


Edited again

OK, I need to parse some websites. Can you help me parse this odd one?

    <div class="cloudzoom-gallery e-item-card-photos-small_item" data-cloudzoom="
 useZoom:"#item_card_zoom", image:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg", zoomImage:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg""> <img width="44" src="/upload/66/32x44/66ef9b3de11aeaba1bc50a42a1c8b880_32x44.jpg" title="Product1" alt="LGTV"></div>

All I need from this div is the information about the image and the link to the image. How can I get that?

from bs4 import BeautifulSoup 
x = '''
<div class="cloudzoom-gallery e-item-card-photos-small_item" data-cloudzoom="
 useZoom:"#item_card_zoom", image:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg", zoomImage:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg""> <img width="44" src="/upload/66/32x44/66ef9b3de11aeaba1bc50a42a1c8b880_32x44.jpg" title="Product1" alt="LGTV"></div> '''
soup = BeautifulSoup(x, 'html5lib')
div = soup.find('div', attrs = {'class':'cloudzoom-gallery e-item-card-photos-small_item'})
print(div.img['src'])
print(div.img['title'])
print(div.img['alt'])
Following your comment: since you want the larger image, there is a recognizable pattern that can be exploited to get it:

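1) The file name is the same:
66ef9b3de11aeaba1bc50a42a1c8b880
except that the smaller image has an underscore and the size appended to it

2) The folder name is the first two characters of the file name, in this case
66

3) The file path of the larger image is the same as the smaller one's, except for the size folder inserted in the middle, such as 32x44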

Based on these observations, we can easily build the path of the larger image, like this:

from bs4 import BeautifulSoup 
x = '''
<div class="cloudzoom-gallery e-item-card-photos-small_item" data-cloudzoom="
 useZoom:"#item_card_zoom", image:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg", zoomImage:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg""> <img width="44" src="/upload/66/32x44/66ef9b3de11aeaba1bc50a42a1c8b880_32x44.jpg" title="Product1" alt="LGTV"></div> '''
soup = BeautifulSoup(x, 'html5lib')
div = soup.find('div', attrs = {'class':'cloudzoom-gallery e-item-card-photos-small_item'})
file_name = div.img['src'].split("_")[0].split("/")[-1]
extension = div.img['src'].split(".")[-1]
folder_name = file_name[0:2]
final_file_path = "/upload/" + folder_name + "/" + file_name + "." + extension
print(final_file_path)
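
The three observations above can also be wrapped in a small helper that works on the src string alone. This is only a sketch: the function name full_size_path is made up for illustration, and it assumes every thumbnail follows the same /upload/<folder>/<size>/<name>_<size>.<ext> layout seen in this one example.

```python
import posixpath


def full_size_path(thumb_src: str) -> str:
    """Derive the large-image path from a thumbnail path (hypothetical helper).

    Assumes the layout seen in the question:
    /upload/<folder>/<size>/<name>_<size>.<ext>
    """
    directory, thumb_name = posixpath.split(thumb_src)
    stem, extension = posixpath.splitext(thumb_name)
    # Split off the "_32x44" style suffix; rpartition keeps any earlier underscores.
    file_name, _, size = stem.rpartition("_")
    if not file_name:
        raise ValueError(f"no size suffix found in {thumb_src!r}")
    # Drop the size folder only if it really matches the suffix.
    if posixpath.basename(directory) == size:
        directory = posixpath.dirname(directory)
    return posixpath.join(directory, file_name + extension)


src = "/upload/66/32x44/66ef9b3de11aeaba1bc50a42a1c8b880_32x44.jpg"
print(full_size_path(src))
# prints /upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg
```

Reusing the directory from the thumbnail path avoids hard-coding "/upload/", but the result is still a guess about the site's naming scheme and should be checked against real pages.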
Another, simpler option is to take the div string itself and split it appropriately, like this:

print(x.split("image:")[1].split(",")[0])
This prints the image URL (including the surrounding quotes):

"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg"
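
A slightly more robust variant of the same string-based idea is a regular expression that collects every key:"value" pair from the raw markup. This is a sketch under the assumption that the data-cloudzoom content always uses that key:"value" shape; the names PAIR and parse_cloudzoom are made up for illustration.

```python
import re

RAW = '''
<div class="cloudzoom-gallery e-item-card-photos-small_item" data-cloudzoom="
 useZoom:"#item_card_zoom", image:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg", zoomImage:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg""> <img width="44" src="/upload/66/32x44/66ef9b3de11aeaba1bc50a42a1c8b880_32x44.jpg" title="Product1" alt="LGTV"></div> '''

# A word, a colon, then a double-quoted value. Ordinary HTML attributes use
# '=' rather than ':', so class="..." and src="..." are not matched.
PAIR = re.compile(r'(\w+):"([^"]*)"')


def parse_cloudzoom(raw_html: str) -> dict:
    """Return the key:"value" pairs found in the raw markup (hypothetical helper)."""
    return dict(PAIR.findall(raw_html))


options = parse_cloudzoom(RAW)
print(options["image"])
# prints /upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg
print(options["zoomImage"])
# prints /upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg
```

Unlike the split version, the values come back without the surrounding quotes, and zoomImage is available as well.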
Beautiful Soup provides a way to read data attributes, like this:

div.attrs['data-cloudzoom']

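However, the data attribute here contains unescaped double quotes nested inside a double-quoted value, so Beautiful Soup cannot return the full value in this case. You can also see that, for the same reason, the HTML you posted in the question does not get proper syntax highlighting on Stack Overflow.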
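
To see why the attribute lookup falls short, the sketch below feeds the same markup to the standard library's html.parser instead of Beautiful Soup (a deliberate swap, so that the example has no third-party dependency). The class name DivAttrCollector is made up for illustration. The parser ends the attribute value at the first inner double quote, so only the fragment before it survives.

```python
from html.parser import HTMLParser

RAW = '''<div class="cloudzoom-gallery e-item-card-photos-small_item" data-cloudzoom="
 useZoom:"#item_card_zoom", image:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg", zoomImage:"/upload/66/66ef9b3de11aeaba1bc50a42a1c8b880.jpg""> <img width="44" src="/upload/66/32x44/66ef9b3de11aeaba1bc50a42a1c8b880_32x44.jpg" title="Product1" alt="LGTV"></div>'''


class DivAttrCollector(HTMLParser):
    """Record the attributes of the first <div> seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.div_attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "div" and self.div_attrs is None:
            self.div_attrs = attrs  # list of (name, value) tuples


collector = DivAttrCollector()
collector.feed(RAW)
collector.close()

# The value stops at the first inner quote; the rest of the intended value
# is split into meaningless extra attributes.
value = dict(collector.div_attrs).get("data-cloudzoom")
print(repr(value))
```

The image paths never make it into the attribute value, which is why the answer above falls back to string handling on the raw markup.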

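Sorry, my mistake. I have changed the code and "extended" the question.
The Python code indentation is still off. Please include the import statements and variable definitions.
Please check again.
What is the url?
Thanks a lot, but as you can see, the URL stored in the img tag is different from the one in the div; it is the URL of the small picture.
I have modified my answer.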