Python: scraping image links from a specific e-commerce website

Tags: python, web-scraping, beautifulsoup, imageurl


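I am scraping an e-commerce website for practice, and I am currently stuck on scraping a product's images. I have extracted the HTML for all of a product's images, but I cannot extract the links from that HTML.

The code I tried is: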

import requests
from bs4 import BeautifulSoup
import pandas as pd
baseurl='https://www.preispirat24.com/neu-im-september/'
baseforimages='https://www.preispirat24.com/'
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
   
}   

productlinks=[]
for x in range(0,1,1):    
    r=requests.get('https://www.preispirat24.com/neu-im-september/?page={}'.format(x))
    soup=BeautifulSoup(r.content, 'html.parser')

    productlist=soup.find_all('div',class_='title-description')
    item='title-description'

    for item in productlist:
        for link in item.find_all('a',href=True):
            productlinks.append(link['href'])
            a=(link['href'])
            
            

#testlink='https://www.preispirat24.com/Lufterfrischer/axe-air-fresher/axe-mini-vent-dark-temptation-air-freshener-lufterfrischer-6er-t-dsp.html'
insultlist=[]
images=[]
for link in productlinks:
    b=link
    try:
        r=requests.get(link,headers=headers)
        soup=BeautifulSoup(r.content, 'html.parser')
        title=soup.find('h1',class_="product-info-title-desktop hidden-xs hidden-sm").text.strip()
        description=soup.find(class_='tab-body active',itemprop="description").text.strip()
        itemnumber=soup.find('span',itemprop="model").text.strip()

        images=soup.find_all(class_='align-vertical')
        print(images)
        #print (images['src'])
    except:
        print('----')
    insult={
        'title':title,
        'description':description,
        'itemnumber':itemnumber,
        'images':images,
        'productlink':b
    }
   
    insultlist.append(insult)
df=pd.DataFrame(insultlist)
print('Saving :',title)
print(df.head)
df.to_csv('3veerapreispirat24.csv')
The output I get looks like this:

<img alt="Mobile Preview: 99671" data-magnifier-src="images/product_images/original_images/99671(1).jpg" src="images/product_images/gallery_images/99671(1).jpg" title="Mobile Preview: 99671"/>
</div>, <div class="align-vertical">
<img alt="Mobile Preview: 99671" data-magnifier-src="images/product_images/original_images/99671.jpg" src="images/product_images/gallery_images/99671.jpg" title="Mobile Preview: 99671"/>
</div>]
images/product_images/original_images/99671(1).jpg
images/product_images/gallery_images/99671(1).jpg
images/product_images/original_images/99671.jpg
images/product_images/gallery_images/99671.jpg"
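Note: I also tried:
print(images['src'])
This raised an exception, so only ---- was printed.

Sample product link from which the images should be extracted:


Thanks in advance for your help.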

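Your images variable is, as far as I can see, a list of HTML elements. You should iterate over each item in the list and find its img tag, for example:

for element in images:
    url = element.find("img").get("src")

To get the image URLs from a product link, you can use this example: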
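The iteration described above can be sketched offline against the markup shown in the question. This is a minimal sketch, not the asker's final code: the HTML string is copied from the question's output, the names `containers` and `image_urls` are illustrative, and `urljoin` is used here as one possible way to turn the relative `src` values into absolute URLs. It also shows why indexing the result of `find_all()` with `'src'` raises an exception: the result behaves like a list.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question.
html = """
<div class="align-vertical">
<img alt="Mobile Preview: 99671" data-magnifier-src="images/product_images/original_images/99671(1).jpg" src="images/product_images/gallery_images/99671(1).jpg" title="Mobile Preview: 99671"/>
</div>
<div class="align-vertical">
<img alt="Mobile Preview: 99671" data-magnifier-src="images/product_images/original_images/99671.jpg" src="images/product_images/gallery_images/99671.jpg" title="Mobile Preview: 99671"/>
</div>
"""

base = 'https://www.preispirat24.com/'
soup = BeautifulSoup(html, 'html.parser')
containers = soup.find_all(class_='align-vertical')

# find_all() returns a list-like ResultSet, so a string index fails.
try:
    containers['src']
except TypeError as exc:
    print('cannot index the result set by attribute name:', exc)

image_urls = []
for element in containers:
    img = element.find('img')
    # Skip containers that have no <img> child or no src attribute.
    if img is not None and img.get('src'):
        image_urls.append(urljoin(base, img['src']))

print(image_urls)
```

Checking for a missing `img` before reading `src` avoids an exception being swallowed by a bare `except`, which is what hid the original error behind the `----` output.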

import requests
from bs4 import BeautifulSoup


url = 'https://www.preispirat24.com/Verbrauchsartikel/Hygiene-Artikel-127/mund-nasen-maske-3-lagig-pink-mit-nasenbuegel-ohrschlaufen-einheitsgroesse-10-stuec.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for img in soup.select('#product_thumbnail_swiper [data-magnifier-src]'):
    print('https://www.preispirat24.com/' + img['data-magnifier-src'])
Prints:

https://www.preispirat24.com/images/product_images/original_images/99649mix.jpg
https://www.preispirat24.com/images/product_images/original_images/99649.jpg
https://www.preispirat24.com/images/product_images/original_images/99649_0.jpg
https://www.preispirat24.com/images/product_images/original_images/99649_1.jpg
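The example above builds absolute URLs by string concatenation, which works as long as every attribute value is a relative path without a leading slash. As a hedged alternative (not part of the original answer), `urllib.parse.urljoin` from the standard library handles that case as well as values that are already absolute. The names `relative_paths` and `absolute_urls` below are illustrative only.

```python
from urllib.parse import urljoin

base = 'https://www.preispirat24.com/'

# Relative paths as they appear in the data-magnifier-src attributes.
relative_paths = [
    'images/product_images/original_images/99649mix.jpg',
    'images/product_images/original_images/99649.jpg',
]

absolute_urls = [urljoin(base, path) for path in relative_paths]
for url in absolute_urls:
    print(url)

# A value that is already an absolute URL is returned unchanged.
already_absolute = urljoin(
    base,
    'https://www.preispirat24.com/images/product_images/original_images/99649.jpg',
)
print(already_absolute)
```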
EDIT: To save the product as CSV, you can do the following:

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = 'https://www.preispirat24.com/Verbrauchsartikel/Hygiene-Artikel-127/mund-nasen-maske-3-lagig-pink-mit-nasenbuegel-ohrschlaufen-einheitsgroesse-10-stuec.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')


title=soup.find('h1',class_="product-info-title-desktop hidden-xs hidden-sm").text.strip()
description=soup.find(class_='tab-body active',itemprop="description").text.strip()
itemnumber=soup.find('span',itemprop="model").text.strip()

images = []
for img in soup.select('#product_thumbnail_swiper [data-magnifier-src]'):
    images.append('https://www.preispirat24.com/' + img['data-magnifier-src'])
    # print('https://www.preispirat24.com/' + img['data-magnifier-src'])

df = pd.DataFrame({
        'title':title,
        'description':description,
        'itemnumber':itemnumber,
        'images':[images],
        'productlink':url
    })

df.to_csv('data.csv')
print(df)
Prints:

                                               title  ...                                        productlink
0  Mund Nasen Maske 3-lagig PINK mit Nasenbügel, ...  ...  https://www.preispirat24.com/Verbrauchsartikel...

[1 rows x 5 columns]

and saves data.csv.
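When looping over several products, one possible approach is to build one row per product inside the loop and join that product's image URLs into a single string, so the CSV cell holds plain text rather than a Python list. The sketch below makes several assumptions: the `products` list stands in for results that would come from scraping each product page, the titles are placeholders, and the " | " separator is an arbitrary choice.

```python
import pandas as pd

# Hypothetical per-product results; in a real run each dict would be
# filled in after successfully parsing one product page.
products = [
    {
        'title': 'Example product A',  # placeholder title
        'images': [
            'https://www.preispirat24.com/images/product_images/original_images/99649mix.jpg',
            'https://www.preispirat24.com/images/product_images/original_images/99649.jpg',
        ],
        'productlink': 'https://www.preispirat24.com/Verbrauchsartikel/Hygiene-Artikel-127/mund-nasen-maske-3-lagig-pink-mit-nasenbuegel-ohrschlaufen-einheitsgroesse-10-stuec.html',
    },
    {
        'title': 'Example product B',  # placeholder title
        'images': [
            'https://www.preispirat24.com/images/product_images/original_images/99671.jpg',
        ],
        'productlink': 'https://www.preispirat24.com/neu-im-september/',  # placeholder link
    },
]

rows = []
for product in products:
    row = dict(product)
    # Store the image URLs as one delimited string per product.
    row['images'] = ' | '.join(product['images'])
    rows.append(row)

df = pd.DataFrame(rows)

# With no path argument, to_csv() returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Appending the row only after a page has been parsed keeps a failed request from re-adding the previous product's values, which can happen when the dict is built after a `try`/`except` block that swallowed the error.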

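Comments:

Dear sir, I really appreciate your work; it helped me a lot. The problem is that I have to store all of these image links in a .csv file, and the code above does not save the image links to a CSV file. I would greatly appreciate your help and look forward to your reply.

Thank you so much, I love your work. Much appreciated. God bless you.

I am looping over the URL of every product, but it no longer saves the image links :( and it does not append the data. The source code can be found below. Please modify it; it would hardly take you 30 seconds. Looking for help.

@AndrejKesely Thanks for the help. I have been trying for 3 days :(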