Python Beautiful Soup: scraping movie titles and images
Tags: python, web-scraping, beautifulsoup

I am following along with a course and got stuck on one example because the website's content and tags have changed since the course was recorded. The markup shown in the course no longer matches the live page, and even after changing the class in my selector I cannot get anything back. I want to scrape the movie titles and images. A screenshot of the page's HTML was attached to the question.
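This website is dynamic: its content is rendered by JavaScript in the browser, so using bs4 on the raw HTTP response does not work here. I suggest using selenium to get the rendered page source and passing it into a soup object. Here is sample code that does this: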
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.empireonline.com/movies/features/best-movies-2'

chrome_options = Options()
chrome_options.add_argument('--headless')

# Selenium 4 style: webdriver_manager downloads a matching chromedriver,
# and the options are passed in so that --headless actually takes effect.
service = Service(ChromeDriverManager().install())

# Bind the instance to its own name so the webdriver module is not shadowed;
# the with block quits the browser on exit.
with webdriver.Chrome(service=service, options=chrome_options) as driver:
    # Retrieve url in headless browser
    driver.get(url)
    html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')

# The jsx-* class names are generated by the site and may change over time
titles = soup.find_all(name='h3', class_='jsx-2692754980')
titles = [i.text for i in titles if i.text is not None]
print(titles)

imgs = soup.find('div', class_='jsx-3821216435').find_all('img')
print(imgs)
The results for titles and imgs are as follows:
titles -- ['100) Stand By Me', '99) Raging Bull', '98) Amelie', '97) Titanic', '96) Good Will Hunting', '95) Arrival', '94) Lost In Translation' ... ]
imgs --- [<img alt="Stand By Me" class="jsx-952983560 loading" data-src="//cdn.onebauer.media/one/media/5e62/24d4/08ba/aa5a/8143/279c/stand-by-me.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit" src="" title=""/>, <img alt="Raging Bull" class="jsx-952983560 loading" data-src="//cdn.onebauer.media/one/media/5d2d/d990/853e/7cd6/60cc/fa2e/raging-bull.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit" src="" title=""/>, ... ]
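The scraped values shown above still need a little cleanup before they are convenient to use: each heading carries a rank prefix such as "100)", and each data-src value is protocol-relative (it starts with "//"). The following is a minimal sketch of that cleanup using only the standard library. The helper names split_rank and absolute_url are hypothetical and not part of the original answer, and the regular expression assumes every heading follows the "N) Title" pattern seen in the output.

```python
import re
from urllib.parse import urljoin

# Page URL taken from the answer above; used to resolve protocol-relative links.
PAGE_URL = "https://www.empireonline.com/movies/features/best-movies-2"


def split_rank(heading):
    """Split '100) Stand By Me' into (100, 'Stand By Me').

    Returns (None, heading) when no leading 'N)' prefix is found.
    (Hypothetical helper, assuming the 'N) Title' format shown above.)
    """
    match = re.match(r"\s*(\d+)\)\s*(.+)", heading)
    if match is None:
        return None, heading.strip()
    return int(match.group(1)), match.group(2).strip()


def absolute_url(src, base=PAGE_URL):
    """Resolve a protocol-relative '//cdn...' value against the page URL."""
    return urljoin(base, src)


sample_titles = ["100) Stand By Me", "99) Raging Bull", "98) Amelie"]
for rank, title in map(split_rank, sample_titles):
    print(rank, title)
```

urljoin borrows the scheme from the page URL, so the result stays correct if the page is ever served over a different scheme; simply prepending "https:" is the shorter alternative.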
Note that you need to pip install selenium, then download chromedriver and place it in the same directory as the script. You may also want to explore selenium and BeautifulSoup further.
Here's how to do it:
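from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# The options.headless attribute is deprecated in recent Selenium 4 releases;
# passing the flag directly works across versions.
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

driver.get("https://www.empireonline.com/movies/features/best-movies-2")
images = BeautifulSoup(driver.page_source, "html.parser").find_all("img")
driver.quit()  # release the browser once the HTML has been captured

movies = []
for image in images:
    try:
        # Keep only images with non-empty alt text and a lazy-load data-src URL
        if image["alt"]:
            movies.append([image["alt"], f"https:{image['data-src']}"])
    except KeyError:
        continue

# Skip the first match, as in the original answer (presumably not a movie image)
for movie in movies[1:]:
    title, link = movie
    print(f"{title}\n{link}\n{'-' * 80}")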
Output:
Stand By Me
https://cdn.onebauer.media/one/media/5e62/24d4/08ba/aa5a/8143/279c/stand-by-me.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Raging Bull
https://cdn.onebauer.media/one/media/5d2d/d990/853e/7cd6/60cc/fa2e/raging-bull.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Amelie
https://cdn.onebauer.media/one/empire-images/features/59395a49f68e659c7aa3a1a8/Amelie.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Leonardo DiCaprio and Kate Winslet in Titanic
https://cdn.onebauer.media/one/lifestyle-images/celebrity/59d4ac2c07c78ace382c4735/kate-winslet-leonardo-dicaprio-titanic.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Good Will Hunting
https://cdn.onebauer.media/one/media/5e62/2a32/2cd5/547b/bf0f/6416/good-will-hunting.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Arrival
https://cdn.onebauer.media/one/media/5e62/2ac7/2eea/4450/3534/4b45/Arrival.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Lost In Translation
https://cdn.onebauer.media/one/media/5e62/2b5f/232f/f064/694b/c738/lost-in-translation.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
The Princess Bride
https://cdn.onebauer.media/one/media/5e62/2bf3/08ba/aa7b/8f43/27e0/the-princess-bride.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
The Terminator
https://cdn.onebauer.media/one/empire-images/features/59395a49f68e659c7aa3a1a8/The%2520Terminator.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
and so on ...
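If the goal is to save the images locally, a file name has to be derived from each link. Some of the links above are percent-encoded twice (for example "The%2520Terminator.jpg", where "%25" is itself an encoded "%"). The following is a rough sketch under that assumption, using only the standard library; filename_from_url is a hypothetical helper, not something from the original answers.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return a decoded file name taken from the last path segment of url.

    The query string (?format=jpg&...) is ignored. (Hypothetical helper.)
    """
    name = PurePosixPath(urlsplit(url).path).name
    # Decode repeatedly until the name stops changing, which handles
    # double-encoded segments such as '%2520' -> '%20' -> ' '.
    previous = None
    while previous != name:
        previous, name = name, unquote(name)
    # Guard against a decoded '/' sneaking a directory into the file name.
    return name.replace("/", "_")


link = (
    "https://cdn.onebauer.media/one/empire-images/features/"
    "59395a49f68e659c7aa3a1a8/The%2520Terminator.jpg"
    "?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit"
)
print(filename_from_url(link))
```

Decoding until the name is stable is a deliberate simplification: it would also decode a file name that legitimately contains a literal percent sequence, which seems unlikely for this site but is worth keeping in mind.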