Web scraping: BeautifulSoup href returns an empty string
I'm sure this is simple, but somehow I've been unable to get the href link under the a tag that leads to each product detail page. I also don't see any JavaScript wrapping it. What am I missing?
import requests
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd
urls = [
    'https://undefeated.com/search?type=product&q=nike'
]

final = []
with requests.Session() as s:
    for url in urls:
        driver = webdriver.Chrome('/Users/Documents/python/Selenium/bin/chromedriver')
        driver.get(url)
        products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='product-grid-item ']")))]
        soup = bs(driver.page_source, 'lxml')
        time.sleep(1)
        href = soup.find_all['href']
        print(href)
Output:
[]
Then I tried soup.find_all('a'), and that did spit out a whole bunch of results, including the href I'm after, but I still couldn't extract the href specifically…

You just need to find all the a tags, then print each one's href attribute.
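As an aside on why soup.find_all['href'] fails: find_all is a method, so it must be called with parentheses, and the href attribute is then read from each returned tag. A minimal standalone sketch with static HTML (the markup and paths below are illustrative, not taken from the actual site):

```python
from bs4 import BeautifulSoup

# Illustrative static HTML standing in for the rendered product grid
html = """
<div class="product-grid-item ">
  <a href="/products/nike-air-max">Nike Air Max</a>
</div>
<div class="product-grid-item ">
  <a href="/products/nike-dunk-low">Nike Dunk Low</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all is a method, so it is called with parentheses;
# soup.find_all['href'] would try to subscript the method object instead.
links = [a.get("href") for a in soup.find_all("a")]
print(links)  # → ['/products/nike-air-max', '/products/nike-dunk-low']
```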
Your requests.Session code should look like this:
with requests.Session() as s:
    for url in urls:
        driver = webdriver.Firefox()
        driver.get(url)
        products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='product-grid-item ']")))]
        soup = bs(driver.page_source, 'lxml')
        time.sleep(1)
        a_links = soup.find_all('a')
        for a in a_links:
            print(a.get('href'))
Then all the links will be printed.
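The hrefs printed this way will typically be relative paths. If absolute product URLs are needed, they can be joined against the site's base URL with urllib.parse.urljoin; a small sketch (the example paths are illustrative, and None entries stand in for a tags with no href):

```python
from urllib.parse import urljoin

base_url = "https://undefeated.com"

# Illustrative relative hrefs like those the loop above prints
hrefs = ["/products/nike-air-max", "/search?type=product&q=nike", None]

# Skip tags without an href, and resolve the rest to absolute URLs
absolute = [urljoin(base_url, h) for h in hrefs if h]
print(absolute)
# → ['https://undefeated.com/products/nike-air-max',
#    'https://undefeated.com/search?type=product&q=nike']
```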