Python: Fixing the missing price on lowes.com when scraping with Selenium and BeautifulSoup
Tags: python, selenium, web-scraping, beautifulsoup
I am trying to scrape product details from lowes.com. Below is the script I am trying to run:
from bs4 import BeautifulSoup
from webdriver_manager.chrome import ChromeDriverManager
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_experimental_option('prefs', {
    'geolocation': True
})
# The driver must be created before driver.get() is called below.
# (Newer Selenium 4 releases expect the driver path in a Service object instead.)
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
#driver.execute_cdp_cmd("Page.setGeolocationOverride", {
#    "latitude": 34.052235,
#    "longitude": -118.243683,
#    "accuracy": 98
#})
driver.get("https://www.lowes.com/pd/Therma-Tru-Benchmark-Doors-Craftsman-Simulated-Divided-Light-Right-Hand-Inswing-Ready-To-Paint-Fiberglass-Prehung-Entry-Door-with-Insulating-Core-Common-36-in-x-80-in-Actual-37-5-in-x-81-5-in/1000157897")
driver.execute_script("window.scrollTo(0,document.body.scrollHeight/5)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*2)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*3)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*4)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*5)")
time.sleep(1)
content = driver.page_source
soup = BeautifulSoup(content,'html.parser')
imgs = soup.findAll("img", attrs={"class":"met-epc-item"})
for img in imgs:
    print(img.get("src"))
print("Price: "+soup.find("span", attrs={"class":"aPrice large"}).text)
brand = soup.find("a", attrs={"class":"Link__LinkStyled-RC__sc-b3hjw8-0 bYfcYt"})
print("brand url: "+ brand.get("href"))
print("brand name: "+ brand.text)  # .get("text") would look up an HTML attribute and return None
print("brand desc: "+soup.find("h1", attrs={"class":"style__HeaderStyle-PDP__y7vp5g-12 iMECxW"}).text)
driver.close()
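When I run this script, the price lookup fails with an error saying the element does not exist. Looking at the page in the Chrome instance opened by Selenium, I can see that the price is not displayed; instead, a text box asks for a ZIP code, city, or state in order to show price and availability. When I enter any ZIP code, city, or state, nothing happens. When I refresh the page or open any other URL on the Lowe's site, it says Access Denied, and to open another lowes.com URL I have to launch a new Chrome instance with Selenium. Any suggestions on how to fix this and scrape the product correctly? Note that when I open the site in my regular Chrome browser, it loads correctly, shows the price, and does not deny access.
The data you are looking for is embedded in the page in JSON format: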
import re
import json
import requests
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
url = "https://www.lowes.com/pd/Therma-Tru-Benchmark-Doors-Craftsman-Simulated-Divided-Light-Right-Hand-Inswing-Ready-To-Paint-Fiberglass-Prehung-Entry-Door-with-Insulating-Core-Common-36-in-x-80-in-Actual-37-5-in-x-81-5-in/1000157897"
t = requests.get(url, headers=headers).text
data = re.search(r"window\['__PRELOADED_STATE__'\] = (\{.*?\})<", t)
data = json.loads(data.group(1))
# uncomment to print all data:
# print(json.dumps(data, indent=4))
item_id = url.split("/")[-1]
print("Name:", data["productDetails"][item_id]["product"]["brand"])
print("Desc:", data["productDetails"][item_id]["product"]["description"])
print("Price:", data["productDetails"][item_id]["price"]["itemPrice"])
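The script above assumes that the page still assigns its state to window['__PRELOADED_STATE__'] and that every lookup key exists. If the site returns a blocked or changed page, re.search returns None and the script stops with an AttributeError. A minimal sketch of a more defensive version follows; the helper names extract_preloaded_state and get_price are hypothetical, and the sample HTML is a made-up stand-in for the real page.

```python
import json
import re

# Same pattern as in the answer: capture the JSON object assigned to
# window['__PRELOADED_STATE__'], up to the "<" of the closing tag.
STATE_RE = re.compile(r"window\['__PRELOADED_STATE__'\] = (\{.*?\})<")


def extract_preloaded_state(html):
    """Return the embedded state as a dict, or None if it is not found."""
    match = STATE_RE.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def get_price(state, item_id):
    """Look up the item price, returning None when any key is missing."""
    try:
        return state["productDetails"][item_id]["price"]["itemPrice"]
    except (KeyError, TypeError):
        return None


# Made-up stand-in for the real page, used only to exercise the helpers.
sample_html = (
    "<script>window['__PRELOADED_STATE__'] = "
    '{"productDetails": {"1000157897": {"price": {"itemPrice": 370}}}}'
    "</script>"
)

state = extract_preloaded_state(sample_html)
print(get_price(state, "1000157897"))
print(extract_preloaded_state("<html>Access Denied</html>"))
```

Returning None instead of raising lets a loop over many products skip a failed page and continue.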
The data is available by sending a GET request to:
https://www.lowes.com/pd/1000157897/productdetail/1674/Guest
You can try this solution to get the data without using Selenium. (It is similar to @Andrej Kesely's answer, but the URL here is different.)
Output:
Product price: $ 370
Product price: Therma-Tru Benchmark Doors
Product description: 36-in x 80-in Fiberglass Craftsman Right-Hand Inswing Ready to paint Unfinished Prehung Single Front Door with Brickmould
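The code for this answer is not shown in the thread, so the following is only a sketch of how such a request might be made. It assumes the endpoint follows the pattern of the URL quoted above, that 1674 is a store number and Guest a user type (the thread does not say), and that the response body is JSON. The names build_detail_url and fetch_product_detail are hypothetical, and the example product page URL is shortened for illustration. The layout of the returned JSON is not documented here, so the sketch returns it as-is for inspection.

```python
def build_detail_url(product_url, store_id="1674", user_type="Guest"):
    """Build the product-detail URL from a product page URL.

    The item id is taken from the last path segment of the page URL.
    store_id and user_type are copied from the URL quoted in the answer;
    their meaning is an assumption.
    """
    item_id = product_url.rstrip("/").split("/")[-1]
    return f"https://www.lowes.com/pd/{item_id}/productdetail/{store_id}/{user_type}"


def fetch_product_detail(product_url):
    """Request the product detail and return the decoded JSON (assumed format)."""
    import requests  # imported here so build_detail_url has no dependencies

    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
    }
    response = requests.get(build_detail_url(product_url), headers=headers, timeout=30)
    response.raise_for_status()  # raise on HTTP error statuses such as 403
    return response.json()


# Shortened, illustrative product page URL ending in the item id.
page_url = "https://www.lowes.com/pd/Some-Product-Name/1000157897"
detail_url = build_detail_url(page_url)
print(detail_url)
```

Printing json.dumps(fetch_product_detail(page_url), indent=4) once is a quick way to find where the price and brand sit in the response.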
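OK, thank you, it worked. One question, though: is there a limit on the number of requests I can make before my IP address gets blocked? I want to scrape hundreds or even thousands of products.
@SarahEldawody I'm not sure. But since we added the headers, that should help to avoid being blocked.
Thank you very much, it works.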
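The thread does not say what request limits Lowe's enforces, so none can be stated here. A common way to lower the risk of being blocked is to space requests out and to back off when one fails. A minimal sketch, assuming a caller-supplied fetch function (for example a wrapper around requests.get with the headers used above); fetch_all, delay, and max_retries are hypothetical names, and the default values are arbitrary.

```python
import time


def fetch_all(urls, fetch, delay=2.0, max_retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    fetch(url) should return a result or raise an exception on failure.
    A failed URL is retried after a pause that doubles each time; once
    max_retries attempts have failed, its result is recorded as None.
    """
    results = {}
    for url in urls:
        pause = delay
        for _ in range(max_retries):
            try:
                results[url] = fetch(url)
                break
            except Exception:
                # Back off: wait longer before each retry.
                sleep(pause)
                pause *= 2
        else:
            results[url] = None  # every attempt failed
        sleep(delay)  # fixed pause between consecutive URLs
    return results


# Demonstration with a fake fetch function, so no network access is needed.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if url == "b" and calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return url.upper()


waits = []  # records the pauses instead of actually sleeping
results = fetch_all(["a", "b"], flaky_fetch, delay=1.0, sleep=waits.append)
print(results)
print(waits)
```

Passing sleep as a parameter keeps the pacing logic testable without real delays; in real use the default time.sleep applies.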