Python web scraping using Beautiful Soup not working

python web-scraping

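I am trying to scrape some data from the Walmart website for research.

I want to scrape all the product categories. Each product category has this HTML container: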

  <div class="TempoCategoryTileV2-tile"><img alt="" aria-hidden="true" tabindex="-1" itemprop="image" src="//i5.walmartimages.com/dfw/4ff9c6c9-deda/k2-_c3162a27-dbb6-46df-8b9f-b5b52ea657b2.v1.jpg?odnWidth=168&amp;odnHeight=210&amp;odnBg=ffffff" class="TempoCategoryTileV2-tile-img display-block">
<div class="TempoCategoryTileV2-tile-content-one text-center">
    <div class="TempoCategoryTileV2-tile-linkText">
        <div style="overflow: hidden;">
            <div>Toyland</div>
        </div>
    </div>
</div><a class="TempoCategoryTileV2-tile-overlay" id="HomePage-contentZone12-FeaturedCategoriesCuratedV2-tileLink-1" aria-label="Toyland" href="/cp/toys/4171?povid=14503+%257C+contentZone12+%257C+2017-11-01+%257C+1+%257C+HP+FC+Toys" data-uid="zir3SFhh" tabindex="" data-tl-id="HomePage-contentZone12-FeaturedCategoriesCuratedV2-categoryTile-1-link" style="background-image: url(&quot;about:blank&quot;);"></a></div>

But when I run my script, all I get is this:

 "Relax we are getting the data..." 
 []

For some reason, it cannot get the content from the page. What am I doing wrong, and how can I fix it?

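The items on that page are generated dynamically, so HTML fetched without running the page's scripts does not contain them. You need a browser automation tool such as Selenium to render the page before parsing it. Try this: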

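import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Launch a real browser so the page's JavaScript actually runs
driver = webdriver.Chrome()
Walmarthome = 'https://www.walmart.com/?povid=14503+%7C+contentZone1+%7C+2017-10-27+%7C+1+%7C+header+logo'
driver.get(Walmarthome)

# Scroll down a few times so sections further down the page get rendered
# (find_element_by_tag_name was removed in Selenium 4; use find_element with By)
page = driver.find_element(By.TAG_NAME, 'body')
for _ in range(3):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)  # give the page time to load new content

# Parse the rendered DOM, then close the browser
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

# Each category tile carries its title on the overlay link and its image on the <img>
for item in soup.select(".TempoCategoryTileV2-tile"):
    title = item.select(".TempoCategoryTileV2-tile-overlay")[0]['aria-label']
    image = item.select("[itemprop='image']")[0]['src']
    print(title, image)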
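
To see the difference between the two situations without opening a browser, the same selectors can be run against two hard-coded strings. This is only a rough sketch: `STATIC_HTML` is a made-up stand-in for a page whose tiles have not been rendered yet, `RENDERED_HTML` is a shortened version of the container from the question (the image path and the trimmed `href` are placeholders), and `extract_tiles` and `BASE_URL` are hypothetical names. It also assumes that resolving the `src` and `href` values against the site root with `urljoin` is what you want, since the container in the question uses a protocol-relative `src` and a relative `href`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base for resolving relative links; not taken from the original answer.
BASE_URL = "https://www.walmart.com/"

# Hypothetical stand-in for markup in which the tiles have not been rendered yet.
STATIC_HTML = '<html><body><div id="app"></div></body></html>'

# Shortened version of the container from the question (placeholder image path).
RENDERED_HTML = """
<div class="TempoCategoryTileV2-tile">
  <img itemprop="image" src="//i5.walmartimages.com/example.jpg"
       class="TempoCategoryTileV2-tile-img display-block">
  <a class="TempoCategoryTileV2-tile-overlay" aria-label="Toyland"
     href="/cp/toys/4171"></a>
</div>
"""


def extract_tiles(html):
    """Return (title, image_url, link_url) for each category tile in html."""
    soup = BeautifulSoup(html, "html.parser")
    tiles = []
    for item in soup.select(".TempoCategoryTileV2-tile"):
        overlay = item.select_one(".TempoCategoryTileV2-tile-overlay")
        image = item.select_one("[itemprop='image']")
        if overlay is None or image is None:
            continue  # skip tiles that lack the expected children
        tiles.append((
            overlay.get("aria-label", ""),
            urljoin(BASE_URL, image.get("src", "")),    # "//host/..." gains a scheme
            urljoin(BASE_URL, overlay.get("href", "")),  # "/cp/..." gains the host
        ))
    return tiles


print(extract_tiles(STATIC_HTML))
print(extract_tiles(RENDERED_HTML))
```

The first call finds no tiles, which matches the empty list in the question. The second returns one tuple for "Toyland" with absolute URLs. Using `select_one` with a `None` check instead of `[0]` keeps one malformed tile from raising an `IndexError` and stopping the loop.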

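Thanks, this works, but it is getting the computers. I want to get the product categories from the computers (video games, food, electronics).

See the edited code. If it serves your purpose, please accept it as the answer. Thanks.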
