Python web scraping with Beautiful Soup not working
I am trying to scrape some data from the Walmart website for research. I want to scrape all the product categories. Each product category has this HTML container:
<div class="TempoCategoryTileV2-tile"><img alt="" aria-hidden="true" tabindex="-1" itemprop="image" src="//i5.walmartimages.com/dfw/4ff9c6c9-deda/k2-_c3162a27-dbb6-46df-8b9f-b5b52ea657b2.v1.jpg?odnWidth=168&odnHeight=210&odnBg=ffffff" class="TempoCategoryTileV2-tile-img display-block">
<div class="TempoCategoryTileV2-tile-content-one text-center">
<div class="TempoCategoryTileV2-tile-linkText">
<div style="overflow: hidden;">
<div>Toyland</div>
</div>
</div>
</div><a class="TempoCategoryTileV2-tile-overlay" id="HomePage-contentZone12-FeaturedCategoriesCuratedV2-tileLink-1" aria-label="Toyland" href="/cp/toys/4171?povid=14503+%257C+contentZone12+%257C+2017-11-01+%257C+1+%257C+HP+FC+Toys" data-uid="zir3SFhh" tabindex="" data-tl-id="HomePage-contentZone12-FeaturedCategoriesCuratedV2-categoryTile-1-link" style="background-image: url(&quot;about:blank&quot;);"></a></div>
But when I run it, all I get is this:
"Relax we are getting the data..."
[]
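For some reason, it fails to get the content from the page. What am I doing wrong, and how can I fix it?

The items on that page are generated dynamically, so the raw HTML returned by the server does not contain them. You need a browser simulator such as Selenium to render the page before parsing it. Try this: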
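import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
Walmarthome = 'https://www.walmart.com/?povid=14503+%7C+contentZone1+%7C+2017-10-27+%7C+1+%7C+header+logo'
driver.get(Walmarthome)

# Selenium 4 removed find_element_by_tag_name; use find_element(By.TAG_NAME, ...).
page = driver.find_element(By.TAG_NAME, 'body')

# Scroll down a few times so the lazily loaded tiles get rendered.
for i in range(3):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)

# Parse the rendered DOM rather than the raw server response.
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

# Each tile carries its title in the overlay link and its picture in the img tag.
for item in soup.select(".TempoCategoryTileV2-tile"):
    title = item.select(".TempoCategoryTileV2-tile-overlay")[0]['aria-label']
    image = item.select("[itemprop='image']")[0]['src']
    print(title, image)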
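To check the selectors without launching a browser each time, one option is to move the parsing step into a small function and run it on a saved snippet. The sketch below assumes the tile markup shown in the question. The names `extract_tiles` and `SAMPLE` and the shortened image path are made up for illustration, and it uses the built-in "html.parser" backend instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup


def extract_tiles(html):
    """Return (title, image_src) pairs for every category tile found in html."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tile in soup.select(".TempoCategoryTileV2-tile"):
        link = tile.select_one(".TempoCategoryTileV2-tile-overlay")
        img = tile.select_one("[itemprop='image']")
        # Skip incomplete tiles instead of raising IndexError on [0].
        if link is None or img is None:
            continue
        results.append((link.get("aria-label"), img.get("src")))
    return results


# Trimmed stand-in for the tile markup from the question (image path is a placeholder).
SAMPLE = """
<div class="TempoCategoryTileV2-tile">
  <img itemprop="image" src="//i5.walmartimages.com/dfw/tile.jpg"
       class="TempoCategoryTileV2-tile-img display-block">
  <a class="TempoCategoryTileV2-tile-overlay" aria-label="Toyland"
     href="/cp/toys/4171"></a>
</div>
"""

if __name__ == "__main__":
    # Rendered markup: the tile is found.
    print(extract_tiles(SAMPLE))
    # Markup without tiles, as in a response fetched before JavaScript runs: empty list.
    print(extract_tiles("<html><body></body></html>"))
```

With the Selenium code above, the same function could be called as `extract_tiles(driver.page_source)`. An empty list then points to the page not having rendered the tiles yet, not to a wrong selector.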
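Thanks, this works, but what it gets is the computers. I want to get the product categories from the computers (video games, food, electronics).

Check out the edited code. If it serves your purpose, be sure to accept it as an answer. Thanks.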