Python Beautiful Soup: scraping movie titles and images

Tags: python, web-scraping, beautifulsoup

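I was trying to follow this course and got stuck on one example, because the website's content and tags have changed. In the course, the markup looked like:

The markup is different now, but even after I change the class, nothing is returned. I want to scrape the movie titles and images. A picture of the HTML is here.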


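This website is dynamic, so bs4 on its own will not work here (see). I suggest using selenium to fetch the rendered page source and pass it to a soup object. Here is sample code that does this: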

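from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.empireonline.com/movies/features/best-movies-2'

chrome_options = Options()
chrome_options.add_argument('--headless')

# Selenium 4 style: the driver path goes through a Service object, and the
# options must be passed in explicitly or headless mode is never applied.
service = Service(ChromeDriverManager().install())

# The context manager quits the browser on exit, so no explicit close() is needed.
with webdriver.Chrome(service=service, options=chrome_options) as driver:
    # Retrieve url in headless browser
    driver.get(url)

    # Wait up to 10 seconds for the headings to be present in the page
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, 'h3'))
    )

    html = driver.page_source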


soup = BeautifulSoup(html, 'html.parser')

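# The jsx-* class names have changed before (see the question), so they may need updating.
titles = soup.find_all(name='h3', class_='jsx-2692754980')
# .text is always a string, so filter out empty strings rather than None
titles = [i.text for i in titles if i.text]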
print(titles)

imgs = soup.find('div', class_='jsx-3821216435').find_all('img')
print(imgs)
The results for titles and imgs look like this:

titles -- ['100) Stand By Me', '99) Raging Bull', '98) Amelie', '97) Titanic', '96) Good Will Hunting', '95) Arrival', '94) Lost In Translation' ... ]

imgs --- [<img alt="Stand By Me" class="jsx-952983560 loading" data-src="//cdn.onebauer.media/one/media/5e62/24d4/08ba/aa5a/8143/279c/stand-by-me.jpg?format=jpg&amp;quality=80&amp;width=500&amp;ratio=1-1&amp;resize=aspectfit" src="" title=""/>, <img alt="Raging Bull" class="jsx-952983560 loading" data-src="//cdn.onebauer.media/one/media/5d2d/d990/853e/7cd6/60cc/fa2e/raging-bull.jpg?format=jpg&amp;quality=80&amp;width=500&amp;ratio=1-1&amp;resize=aspectfit" src="" title=""/>, ... ]
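
The scraped titles carry their ranking as a prefix, such as '100) Stand By Me'. If the rank and the title are wanted separately, a small regular expression can split them. This is only a sketch: the helper name split_rank is hypothetical, and it assumes every entry follows the "number) title" pattern seen in the output above.

```python
import re

# Matches "<digits>) <title>", e.g. "100) Stand By Me".
RANKED_TITLE = re.compile(r"^\s*(\d+)\)\s*(.+?)\s*$")


def split_rank(entry):
    """Return (rank, title); rank is None if the entry has no numeric prefix."""
    match = RANKED_TITLE.match(entry)
    if match is None:
        return None, entry.strip()
    return int(match.group(1)), match.group(2)


titles = ['100) Stand By Me', '99) Raging Bull', '98) Amelie']
for rank, title in map(split_rank, titles):
    print(rank, title)
```

Entries without a numeric prefix are returned unchanged with a rank of None, so the helper can be mapped over the whole list without raising.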

Note that you need to pip install selenium, then download chromedriver and place it in the same directory as the script.
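
Before running the scraper, it may help to confirm that the required packages are actually importable. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the list of required modules is an assumption based on the imports used in this answer.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by the code in this answer (the beautifulsoup4
# package is imported as "bs4").
required = ["bs4", "selenium"]
missing = missing_modules(required)
print("Missing modules:", missing or "none")

# shutil.which looks the executable up on PATH; a chromedriver placed
# next to the script may not be found this way.
print("chromedriver on PATH:", shutil.which("chromedriver") is not None)
```

Any name reported as missing can then be installed with pip before the scraper is run.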

You may want to explore selenium as well as beautifulsoup.

Here's how:

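from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# add_argument works across Selenium versions; newer Selenium 4 releases
# no longer support the options.headless attribute.
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://www.empireonline.com/movies/features/best-movies-2")
    images = BeautifulSoup(driver.page_source, "html.parser").find_all("img")
finally:
    # Always shut the browser down, even if loading the page fails
    driver.quit()

movies = []
for image in images:
    try:
        # Keep only images with a non-empty alt text and a lazy-load data-src
        if image["alt"]:
            movies.append([image["alt"], f"https:{image['data-src']}"])
    except KeyError:
        continue

# Skip the first match
for movie in movies[1:]:
    title, link = movie
    print(f"{title}\n{link}\n{'-' * 80}")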

Output:

Stand By Me
https://cdn.onebauer.media/one/media/5e62/24d4/08ba/aa5a/8143/279c/stand-by-me.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Raging Bull
https://cdn.onebauer.media/one/media/5d2d/d990/853e/7cd6/60cc/fa2e/raging-bull.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Amelie
https://cdn.onebauer.media/one/empire-images/features/59395a49f68e659c7aa3a1a8/Amelie.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Leonardo DiCaprio and Kate Winslet in Titanic
https://cdn.onebauer.media/one/lifestyle-images/celebrity/59d4ac2c07c78ace382c4735/kate-winslet-leonardo-dicaprio-titanic.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Good Will Hunting
https://cdn.onebauer.media/one/media/5e62/2a32/2cd5/547b/bf0f/6416/good-will-hunting.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Arrival
https://cdn.onebauer.media/one/media/5e62/2ac7/2eea/4450/3534/4b45/Arrival.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
Lost In Translation
https://cdn.onebauer.media/one/media/5e62/2b5f/232f/f064/694b/c738/lost-in-translation.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
The Princess Bride
https://cdn.onebauer.media/one/media/5e62/2bf3/08ba/aa7b/8f43/27e0/the-princess-bride.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------
The Terminator
https://cdn.onebauer.media/one/empire-images/features/59395a49f68e659c7aa3a1a8/The%2520Terminator.jpg?format=jpg&quality=80&width=500&ratio=1-1&resize=aspectfit
--------------------------------------------------------------------------------

and so on ...
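
The data-src values on the page are protocol-relative (they start with //), which is why the code above prepends https: by hand. As an alternative sketch, urljoin from the standard library resolves such values against the page URL and also copes with absolute and root-relative paths. The helper name absolute_url is hypothetical, and the shortened image path in the example is for illustration only.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.empireonline.com/movies/features/best-movies-2"


def absolute_url(src, page_url=PAGE_URL):
    """Resolve a protocol-relative, relative, or absolute src against the page URL."""
    return urljoin(page_url, src)


# Protocol-relative value, shaped like the data-src attributes on the page
print(absolute_url("//cdn.onebauer.media/one/media/stand-by-me.jpg"))
# Already absolute URLs pass through unchanged
print(absolute_url("https://cdn.onebauer.media/one/media/stand-by-me.jpg"))
```

Using urljoin means the scheme is taken from the page itself rather than hard-coded, which may be preferable if some images use full URLs.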