Python BeautifulSoup writes only 1 row to CSV
I'm trying to get the product name, link, and price for every product shown on the page, with each product on its own row and the values separated by commas. I wrote this code for a similar site, but for some reason here it only writes the first result to the CSV:
import requests
from bs4 import BeautifulSoup
from csv import writer

response = requests.get('https://www.micoca-cola.cl/bebidas/coca-cola')
soup = BeautifulSoup(response.text, 'html.parser')
items = soup.find_all(class_='prateleira vitrine n12colunas')

with open('coca.csv', 'w', newline='') as csv_file:
    csv_writer = writer(csv_file)
    headers = ['Producto', 'Link', 'Precio']
    csv_writer.writerow(headers)
    for item in items:
        producto = item.find(class_='product-block-name').get_text()
        link = item.find('a')['href']
        price = item.find(class_='bestPrice').get_text().replace('\n', '').replace('"', '').replace(' ', '')
        csv_writer.writerow([producto, link, price])
This produces the following result:
Producto,Link,Precio
"Refill 8 Coca-Cola Sin Azúcar retornable 2,0 lt (no incluye envases)",https://www.micoca-cola.cl/refill-8-coca-cola-sin-azucar-retornable-20-lt-no-incluye-envases/p,"$9.520,00"
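But there are other products on that page, and I want each of them included on its own row.
What am I missing?
To load all product titles, links, and prices and save them to CSV, you can use the following example: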
import re
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.micoca-cola.cl/bebidas/coca-cola'
html_doc = requests.get(url).text
page_url = 'https://www.micoca-cola.cl' + re.search(r"\.load\('(.*?)'", html_doc).group(1)

data = []
page = 1
while True:
    soup = BeautifulSoup(requests.get(page_url + str(page)).content, 'html.parser')
    if not soup.body:
        break
    for product in soup.select('.product-group'):
        title = product.h4.text
        link = product.h4.a['href']
        print(title)
        print(link)
        price = product.find(class_="bestPrice")
        price = price.get_text(strip=True) if price else 'Out of Stock'
        print(price)
        print('-' * 80)
        data.append({
            'title': title,
            'link': link,
            'price': price
        })
    page += 1

df = pd.DataFrame(data)
print(df)
df.to_csv('data.csv', index=False)
Prints:
...
32 Coca-Cola Light 6 x 591 ml. ... $ 5.340,00
33 Coca-Cola Sin Azúcar 1,5 lt. ... $ 1.390,00
34 Coca-Cola Sin Azúcar 2,5 lt. ... $ 1.890,00
35 Starter Kit Coca-Cola Light retornable 9 x 1,2... ... $ 10.710,00
36 Starter Kit Coca-Cola Original retornable 8 x ... ... $ 11.920,00
37 Coca-Cola Original 6 x 3,0 lt. ... $ 13.140,00
38 Coca-Cola Energy Sin Azúcar 220 ml. ... $ 990,00
39 Starter Kit Coca-Cola Sin Azúcar retornable re... ... $ 1.490,00
40 Starter Kit Coca-Cola Sin Azúcar retornable 1,... ... $ 1.190,00
41 Coca-Cola Light 2,5 lt. ... $ 1.890,00
42 Coca-Cola Light 1,5 lt. ... $ 1.390,00
43 Coca-Cola Sin Azúcar 6 x 250 ml. ... $ 2.290,00
44 Coca-Cola Original 1,5 lt. ... $ 1.390,00
45 Coca-Cola Original 3,0 lt. ... $ 2.190,00
46 Coca-Cola Original 6 x 591 ml. ... $ 5.340,00
47 Starter Kit Coca-Cola Original retornable 9 x ... ... $ 10.710,00
48 Starter Kit Coca-Cola Light retornable 8 x 2,0... ... $ 11.920,00
49 Starter Kit Coca-Cola Original retornable 2,0 ... ... $ 1.490,00
50 Starter Kit Coca-Cola Light retornable retorna... ... $ 1.190,00
51 Coca-Cola Light 3,0 lt. ... $ 2.190,00
52 Coca-Cola Original 6 x 250 ml. ... $ 2.290,00
53 Starter Kit Coca-Cola Light retornable retorna... ... $ 1.490,00
54 Starter Kit Coca-Cola Original retornable 1,25... ... $ 1.190,00
55 Coca-Cola Original 2,5 lt. ... $ 1.890,00
56 Coca-Cola Original 1,0 lt. ... $ 990,00
57 Coca-Cola Light 1,0 lt. ... Out of Stock
[58 rows x 3 columns]
and saves data.csv (shown in the original answer as a screenshot from LibreOffice).
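The answer's code relies on two ideas that are easy to miss: it pulls the path of a paginated product endpoint out of a `.load('...')` call in the page source, then appends a page number to it and keeps requesting pages until one comes back without a body. The following is a minimal sketch of the first step only, run against an invented snippet of page source. The `SAMPLE_SOURCE` markup, the `/hypothetical/products` path, and the `extract_page_url` helper are assumptions for illustration and are not taken from the real site.

```python
import re

# Hypothetical page source: the path and query string below are invented
# for illustration and do not come from the real site.
SAMPLE_SOURCE = """
<script>
    $('#list').load('/hypothetical/products?PageNumber=' + pageNumber);
</script>
"""


def extract_page_url(html_doc, origin):
    """Return origin + the first argument of the first .load('...') call, or None."""
    match = re.search(r"\.load\('(.*?)'", html_doc)
    if match is None:
        # The page does not contain a .load('...') call, so there is nothing to build.
        return None
    return origin + match.group(1)


page_url = extract_page_url(SAMPLE_SOURCE, 'https://www.micoca-cola.cl')

# Appending a page number gives the URL to request for each page.
for page in range(1, 4):
    print(page_url + str(page))
```

Checking for `None` before calling `.group(1)` avoids an `AttributeError` if the site changes its markup and the pattern no longer matches.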
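Have you tried debugging it? How many items are in items? If the site returned only one row to the script, then nothing may be missing, because the data returned to the script can be completely different from what you see in the browser. You need to show that more data was returned than your script wrote to the CSV.
You have only one element with that class name in the page source. Try inspecting the page in Chrome DevTools to find a locator that is common to all entries. Replacing the current locator with that common one will result in all records being listed in items.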
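The point made in these comments can be checked on a small example: a selector that matches the outer container yields one item, so the loop runs once, while a selector that matches each product yields one item per product. This is a sketch under stated assumptions. The `SAMPLE_HTML` markup and the example.com links are invented, and only the class names (`prateleira vitrine n12colunas`, `product-group`, `bestPrice`) are borrowed from the code in this thread; the real page may be structured differently.

```python
import csv
import io

from bs4 import BeautifulSoup

# Invented markup for illustration; only the class names come from the thread.
SAMPLE_HTML = """
<div class="prateleira vitrine n12colunas">
  <ul>
    <li class="product-group">
      <h4><a href="https://example.com/a/p">Product A</a></h4>
      <span class="bestPrice">$ 1.000,00</span>
    </li>
    <li class="product-group">
      <h4><a href="https://example.com/b/p">Product B</a></h4>
      <span class="bestPrice">$ 2.000,00</span>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# The container selector matches the single wrapper element,
# while the per-product selector matches every product inside it.
containers = soup.find_all(class_='prateleira vitrine n12colunas')
products = soup.select('.product-group')
print(len(containers), len(products))

# Write one CSV row per product (to an in-memory buffer instead of a file).
buffer = io.StringIO()
csv_writer = csv.writer(buffer)
csv_writer.writerow(['Producto', 'Link', 'Precio'])
for product in products:
    name = product.h4.get_text(strip=True)
    link = product.h4.a['href']
    price_tag = product.find(class_='bestPrice')
    price = price_tag.get_text(strip=True) if price_tag else ''
    csv_writer.writerow([name, link, price])

rows = buffer.getvalue().splitlines()
print('\n'.join(rows))
```

Printing the length of the result list before the loop is a quick way to tell whether the problem is in the selector or in the CSV writing.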
This solved my problem completely. It works great. It's more advanced than my example, so I'll do my best to understand it.