urllib.request doesn't work for me on Python 3 — how do I use BeautifulSoup?
I'm trying to learn how to scrape websites, but I keep running into urllib.request, which doesn't work for me:
import urllib.request
import bs4 as bs
sauce = urllib.request.urlopen('https://www.goat.com/collections/just-dropped').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)
Try this

You have to set a User-Agent header. Unfortunately the page content is rendered dynamically with JavaScript, so to get the actual product data you will have to use Selenium:
from urllib.request import Request, urlopen
import bs4 as bs
req = Request('https://www.goat.com/collections/just-dropped')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0')
sauce = urlopen(req).read()
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup)
Using Selenium. To use it you need to install Selenium, Chrome, and chromedriver:
pip install selenium
pip install chromedriver-binary
The code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
import chromedriver_binary  # adds the chromedriver binary to PATH

driver = webdriver.Chrome()
driver.get('https://www.goat.com/collections/just-dropped')

# wait until the products are rendered; find_elements (plural) is needed
# here so we get a list we can iterate over
products = WebDriverWait(driver, 15).until(
    lambda d: d.find_elements(By.CSS_SELECTOR, '.goat-clean-product-template')
)

for p in products:
    name = p.get_attribute('title')
    url = p.get_attribute('href')
    print('%s: %s' % (name, url))
As mentioned, you can use the requests library to fetch the page content. First you have to install requests and bs4 via pip, which will fix the ModuleNotFoundError you are getting:

pip install bs4
pip install requests

Then here is the code to fetch the data:
import requests
from bs4 import BeautifulSoup
sauce = requests.get('https://www.goat.com/collections/just-dropped')
soup = BeautifulSoup(sauce.text, 'lxml')
print(soup)
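Printing the whole soup is only a sanity check; to actually extract data you would use `select` or `find_all` on the parsed tree. A self-contained sketch on a static snippet (the markup below is made up for illustration, not GOAT's real structure, and it uses the stdlib `html.parser` so no lxml install is needed):

```python
from bs4 import BeautifulSoup

html = """
<div class="products">
  <a class="product" href="/sneakers/air-max" title="Air Max">Air Max</a>
  <a class="product" href="/sneakers/jordan-1" title="Jordan 1">Jordan 1</a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
for a in soup.select('a.product'):          # CSS selector, like in Selenium
    print('%s: %s' % (a['title'], a['href']))
# → Air Max: /sneakers/air-max
# → Jordan 1: /sneakers/jordan-1
```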
ModuleNotFoundError: No module named 'requests' — click the link I left in the solution; you need to install the module first. You are a lovely person, thank you! @TudorPopica, glad it helped.