Python: web scraping an item's price from Target (Python, Selenium, BeautifulSoup)


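I'm trying to get the price of any item on Target's website. I have already made some examples for this site using selenium and the Redsky API, but now I'm trying to write bs4 code, shown below: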

import requests
from bs4 import BeautifulSoup

url = "https://www.target.com/p/ruffles-cheddar-38-sour-cream-potato-chips-2-5oz/-/A-14930847#lnk=sametab"
r= requests.get(url)
soup = BeautifulSoup(r.content, "lxml")

price = soup.find("div",class_= "web-migration-tof__PriceFontSize-sc-14z8sos-14 elGGzp")
print(price)

But it returns

None

I also tried

soup.find("div", {"class": "web-migration-tof__PriceFontSize-sc-14z8sos-14 elGGzp"})

What am I missing?

I can accept any Selenium or Redsky API code, but my priority is bs4.

You are just using the wrong locator.
Try locating the element by its data-test attribute, for example in XPath style:

price_xpath_locator = '//div[@data-test="product-price"]'

For bs4, it should look like this:

soup.select('div[data-test="product-price"]')

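To get the element's text, just add

.text

to a single matched element (select_one returns the first match, whereas select returns a list):

price = soup.select_one('div[data-test="product-price"]').text
print(price)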

price = soup.find("div",class_= "web-migration-tof__PriceFontSize-sc-14z8sos-14 elGGzp")
print(price.text)
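
The difference between select and select_one is easy to trip over, so here is a minimal sketch of both. It runs against a hand-written HTML snippet (SAMPLE_HTML is an assumption standing in for the rendered page, not Target's real markup) and uses the built-in "html.parser" so it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the rendered product page (an assumption,
# not Target's real markup).
SAMPLE_HTML = """
<div data-test="product-price"><span>$1.99</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() always returns a list of matches (possibly empty),
# so .text has to be read from each element, not from the list.
matches = soup.select('div[data-test="product-price"]')
prices = [tag.get_text(strip=True) for tag in matches]

# select_one() returns the first match, or None when nothing matches.
first = soup.select_one('div[data-test="product-price"]')
first_price = first.get_text(strip=True) if first is not None else None

print(prices)
print(first_price)
```

Calling .text directly on the result of select fails because a list of tags has no such attribute; either index into the list or use select_one.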


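The page is dynamic: the data is rendered after the initial request is made. You can use Selenium to load the page and, once it has rendered, pull out the relevant tag. That said, if an API is available, it is always the preferred approach.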

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')

# If you don't want to open a browser, comment out the line above and uncomment below
#options = webdriver.ChromeOptions()
#options.add_argument('headless')
#driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe', options=options)

url = "https://www.target.com/p/ruffles-cheddar-38-sour-cream-potato-chips-2-5oz/-/A-14930847#lnk=sametab"
driver.get(url)
r = driver.page_source
soup = BeautifulSoup(r, "lxml")

price = soup.find("div",class_= "web-migration-tof__PriceFontSize-sc-14z8sos-14 elGGzp")
print(price.text)

Output:

$1.99
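
Reading page_source right after get can race with the JavaScript that renders the price, and the generated class name may change between site builds. A hedged variant, assuming Selenium 4 (a recent release that can locate chromedriver on its own, otherwise the driver has to be on PATH) and assuming the data-test="product-price" attribute from the other answer is still present, waits for the element before parsing. The function names fetch_rendered_html and extract_price are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumed locator, taken from the data-test attribute suggested above.
PRICE_SELECTOR = 'div[data-test="product-price"]'


def fetch_rendered_html(url, timeout=10):
    """Load url in headless Chrome and return the HTML once the price exists."""
    # Imported here so extract_price() below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Raises TimeoutException if the element never appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, PRICE_SELECTOR))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_price(html):
    """Return the price text from rendered HTML, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(PRICE_SELECTOR)
    return tag.get_text(strip=True) if tag is not None else None
```

It would be used as extract_price(fetch_rendered_html(url)) with the url from the question. Keeping the parsing step separate from the browser step also makes it possible to check the selector against saved HTML without launching Chrome.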

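Thanks for your answer, but how can I add it to my code? Can you help me? As far as I know, we can't use XPath in bs4, because XPath isn't supported there. @Prophet

@cruisepandey I see. I provided that only for Selenium; the bs4 version is the last line.

@Wicaledon Does the updated answer work for you?

@Prophet If I use the last line, it returns me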

[]

Thanks for your answer, but I get

AttributeError: 'NoneType' object has no attribute 'text'
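
That error means find returned None, so there was nothing to read .text from. A minimal sketch of a guard follows, using a hand-written snippet (STATIC_HTML is an assumption that imitates the HTML a plain requests call receives, where the price element has not been rendered yet).

```python
from bs4 import BeautifulSoup

# Stand-in for the unrendered HTML returned by a plain HTTP request
# (hand-written, an assumption; the price element is absent).
STATIC_HTML = "<html><body><div id='root'></div></body></html>"

soup = BeautifulSoup(STATIC_HTML, "html.parser")
price = soup.find("div", attrs={"data-test": "product-price"})

# find() returns None when nothing matches, so check before using .text.
if price is None:
    message = "price element not found in the static HTML"
else:
    message = price.text

print(message)
```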

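@Wicaledon: Just saw it, yes. .text should be used on price.

When you make a plain request with requests, this data is not in the source HTML. It is rendered via JS (and fetched from an API), which is why it works with Selenium, and why the API works too. If you like, you can use a combination of Selenium and bs4. Would you like to see that solution?

@chitown88 Sure.

What about looking for the data in the XHR requests? Have you tried that?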