Python: Solving the lowes.com price problem with Selenium and BeautifulSoup

Tags: python, selenium, web-scraping, beautifulsoup

I am trying to scrape product details from lowes.com. Here is the script I am trying to run:

from bs4 import BeautifulSoup
from webdriver_manager.chrome import ChromeDriverManager
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_experimental_option('prefs', {
    'geolocation': True
})

# The driver must be created before driver.get() is called below
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
#driver.execute_cdp_cmd("Page.setGeolocationOverride", {
#    "latitude": 34.052235,
#    "longitude": -118.243683,
#    "accuracy": 98
#})
driver.get("https://www.lowes.com/pd/Therma-Tru-Benchmark-Doors-Craftsman-Simulated-Divided-Light-Right-Hand-Inswing-Ready-To-Paint-Fiberglass-Prehung-Entry-Door-with-Insulating-Core-Common-36-in-x-80-in-Actual-37-5-in-x-81-5-in/1000157897")
driver.execute_script("window.scrollTo(0,document.body.scrollHeight/5)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*2)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*3)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*4)")
time.sleep(1)
driver.execute_script("window.scrollTo(0,(document.body.scrollHeight/5)*5)")
time.sleep(1)
content = driver.page_source
soup = BeautifulSoup(content,'html.parser')
imgs = soup.findAll("img", attrs={"class":"met-epc-item"})
for img in imgs:
    print(img.get("src"))
print("Price: "+soup.find("span", attrs={"class":"aPrice large"}).text)
brand = soup.find("a", attrs={"class":"Link__LinkStyled-RC__sc-b3hjw8-0 bYfcYt"})
print("brand url: "+ brand.get("href"))
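# The link text is the .text attribute of the tag; .get("text") looks up an HTML attribute and returns None
print("brand name: " + brand.text)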
print("brand desc: "+soup.find("h1", attrs={"class":"style__HeaderStyle-PDP__y7vp5g-12 iMECxW"}).text)
driver.close()

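When I run this script, the price lookup raises an error saying the element does not exist. Looking at the page in the Chrome instance opened by Selenium, I see that the price does not appear. Instead, a text box asks for a zip code, city, or state in order to show price and availability, and entering any zip code, city, or state does nothing. When I refresh the page or go to any other URL on the Lowe's site, it says access denied, and to open any other Lowe's URL I have to start a new Chrome instance with Selenium. Are there any suggestions on how to fix this and scrape the product correctly? I should also mention that when I open the site in my regular Chrome browser, it opens correctly, shows the price, and does not deny access, because we…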

The data you are looking for is embedded in the page as JSON:

import re
import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}

url = "https://www.lowes.com/pd/Therma-Tru-Benchmark-Doors-Craftsman-Simulated-Divided-Light-Right-Hand-Inswing-Ready-To-Paint-Fiberglass-Prehung-Entry-Door-with-Insulating-Core-Common-36-in-x-80-in-Actual-37-5-in-x-81-5-in/1000157897"
t = requests.get(url, headers=headers).text

data = re.search(r"window\['__PRELOADED_STATE__'\] = (\{.*?\})<", t)
data = json.loads(data.group(1))

# uncomment to print all data:
# print(json.dumps(data, indent=4))

item_id = url.split("/")[-1]

print("Name:", data["productDetails"][item_id]["product"]["brand"])
print("Desc:", data["productDetails"][item_id]["product"]["description"])
print("Price:", data["productDetails"][item_id]["price"]["itemPrice"])
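
The regular expression above stops at the first `}<` it finds, so it can cut the JSON short if that sequence ever appears inside a string value. As a hedged alternative, the following sketch locates the same `window['__PRELOADED_STATE__']` marker and lets `json.JSONDecoder.raw_decode` find where the object ends. The function name `extract_preloaded_state` is hypothetical, and the sketch assumes the page still embeds the state after that marker.

```python
import json

# Marker taken from the regular expression in the answer above.
MARKER = "window['__PRELOADED_STATE__']"


def extract_preloaded_state(html: str, marker: str = MARKER) -> dict:
    """Return the JSON object that follows `marker` in `html`.

    Hypothetical helper: assumes the object literal starts at the first
    "{" after the marker.
    """
    marker_pos = html.find(marker)
    if marker_pos == -1:
        raise ValueError("marker not found; the page may have changed or the request was blocked")

    start = html.find("{", marker_pos + len(marker))
    if start == -1:
        raise ValueError("no JSON object found after the marker")

    # raw_decode parses exactly one JSON value starting at `start`
    # and ignores whatever text comes after it.
    state, _end = json.JSONDecoder().raw_decode(html, start)
    return state
```

With the `t` variable from the script above, `data = extract_preloaded_state(t)` would replace the `re.search` and `json.loads` lines.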

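The data can be obtained by sending a GET request to:

https://www.lowes.com/pd/1000157897/productdetail/1674/Guest

You can try this solution to get the data without using Selenium. (It is similar to @Andrej Kesely's answer, but the URL here is different.)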

Output:

Product price: $ 370
Product price: Therma-Tru Benchmark Doors
Product description: 36-in x 80-in Fiberglass Craftsman Right-Hand Inswing Ready to paint Unfinished Prehung Single Front Door with Brickmould
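
The code for this request did not survive in the thread, so here is a minimal sketch of what such a request could look like. It is not the answerer's original code. It assumes that the endpoint returns JSON with the same `productDetails` layout as the embedded state in the first answer, and that `1674` in the URL is a store number. Both are guesses, and the helper names are hypothetical. It uses only the standard library; `requests.get(url, headers=headers).json()` as in the first answer would serve the same purpose.

```python
import json
import urllib.request

# Same User-Agent header as in the first answer.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}


def product_detail_url(item_id: str, store_id: str = "1674") -> str:
    """Build the product-detail URL quoted above (store_id meaning is an assumption)."""
    return f"https://www.lowes.com/pd/{item_id}/productdetail/{store_id}/Guest"


def summarize(data: dict, item_id: str) -> dict:
    """Pick brand, description, and price out of the response.

    ASSUMPTION: the response mirrors the `productDetails` structure
    used in the first answer; adjust the keys if it does not.
    """
    details = data["productDetails"][item_id]
    return {
        "brand": details["product"]["brand"],
        "description": details["product"]["description"],
        "price": details["price"]["itemPrice"],
    }


def fetch_product(item_id: str, store_id: str = "1674", timeout: float = 10.0) -> dict:
    """Send the GET request and return the summarized fields."""
    request = urllib.request.Request(product_detail_url(item_id, store_id), headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize(json.load(response), item_id)
```

Calling `fetch_product("1000157897")` would perform the request for the product in the question. Printing the full response first is a good way to check the assumed keys.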

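OK, thank you, it worked, but I have a question. Is there a limit on the number of requests I can make so that my IP address does not get blocked? I want to scrape hundreds or even thousands of products.

@SarahEldawody I'm not sure. However, since we added the headers, that should help with not getting blocked.

Thank you very much, it works.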
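
Since nobody in the thread knows the site's actual limits, one cautious option when scraping many products is to pause between requests and back off after a failure. The sketch below illustrates that idea only. The function `fetch_politely`, its parameters, and the default delays are hypothetical and not based on any published limit. The `fetch` argument is whatever function downloads one URL, for example a small wrapper around `requests.get`.

```python
import random
import time
from typing import Callable, Dict, Iterable, Optional


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay: float = 2.0,
    jitter: float = 1.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Fetch each URL in turn, pausing between requests.

    A failed request (OSError, which also covers the exceptions raised by
    urllib and requests) is retried with an exponentially growing pause.
    A URL that still fails after max_retries attempts is stored as None.
    """
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        for attempt in range(max_retries):
            try:
                results[url] = fetch(url)
                break
            except OSError:
                if attempt == max_retries - 1:
                    results[url] = None
                else:
                    # Back off: 2x, 4x, ... the base delay before retrying.
                    sleep(delay * 2 ** (attempt + 1))
        # Base pause plus random jitter so requests are not evenly spaced.
        sleep(delay + random.uniform(0, jitter))
    return results
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. In normal use the default `time.sleep` applies.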