Python AttributeError: 'list' object has no attribute 'h3' (BeautifulSoup)

python html web-scraping

I'm a beginner at web scraping and I'm following a tutorial to extract movie data. I think I have defined `first_movie` incorrectly.

Here is the code:

  from requests import get
  from bs4 import BeautifulSoup

  first_movie =[]

  url = 'http://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1'
  response = get(url)
  html_soup = BeautifulSoup(response.text, 'html.parser')
  type(html_soup)

  movie_containers = html_soup.find_all('div', class_ = 'lister-item mode-advanced')

  first_name = first_movie.h3.a.text

I get this error:

Traceback (most recent call last):
  File "mov1.py", line 13, in <module>
    first_name = first_movie.h3.a.text
AttributeError: 'list' object has no attribute 'h3'

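`find_all` always returns a list.

Replace this line in your code:

first_name = first_movie.h3.a.text

with one that indexes into the list returned by `find_all`, for example `movie_containers[0].h3.a.text`.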
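The point about `find_all` returning a list can be checked offline. Here is a minimal sketch, assuming a small hand-written HTML snippet (`SAMPLE_HTML` is hypothetical and only borrows the class names from the question; it is not the real IMDb page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the class names used in the question.
SAMPLE_HTML = """
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header">
    <span class="lister-item-index">1.</span>
    <a href="/title/a/">Logan</a>
  </h3>
</div>
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header">
    <span class="lister-item-index">2.</span>
    <a href="/title/b/">Wonder Woman</a>
  </h3>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
movie_containers = soup.find_all("div", class_="lister-item mode-advanced")

# find_all returns a ResultSet, which is a subclass of list.
is_list = isinstance(movie_containers, list)

# Index into the list to get a single Tag, then navigate with .h3.a.text.
first_name = movie_containers[0].h3.a.text

# A plain list has no .h3 attribute, which is the error from the question.
try:
    [].h3
except AttributeError as exc:
    error_message = str(exc)
```

In the question, `first_movie` is an empty list that is never reassigned, so the last case is the one that triggers the traceback.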

Try the following code:

import requests
from bs4 import BeautifulSoup
url = 'https://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1'
r = requests.get(url, headers = {'User-Agent' : 'Mozilla/5.0'})
soup = BeautifulSoup(r.content, 'html.parser')
items=soup.find_all('h3',class_='lister-item-header')
for item in items:
    print(item.find('a').text)

Output:

Logan
Wonder Woman
Guardians of the Galaxy: Vol. 2
Thor: Ragnarok
Dunkirk
Star Wars: Episode VIII - The Last Jedi
Spider-Man: Homecoming
Get Out
Blade Runner 2049
Baby Driver
It
Three Billboards Outside Ebbing, Missouri
Justice League
The Shape of Water
John Wick: Chapter 2
Coco
Jumanji: Welcome to the Jungle
Beauty and the Beast
Kong: Skull Island
Kingsman: The Golden Circle
Pirates of the Caribbean: Salazar's Revenge
Alien: Covenant
13 Reasons Why
War for the Planet of the Apes
The Greatest Showman
Life
Fast & Furious 8
Murder on the Orient Express
Lady Bird
Ghost in the Shell
King Arthur: Legend of the Sword
Wind River
The Hitman's Bodyguard
Mother!
The Mummy
Call Me by Your Name
Atomic Blonde
The Punisher
Bright
I, Tonya
Valerian and the City of a Thousand Planets
Baywatch
Darkest Hour
American Made
La Casa de Papel
Mindhunter
Transformers: The Last Knight
The Handmaid's Tale
The Lego Batman Movie
The Disaster Artist

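`first_movie` is never assigned a container; assign the result to it instead of to `movie_containers`. Use `find()` to select the first element: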

first_movie = html_soup.find('div', class_ = 'lister-item mode-advanced')
first_name = first_movie.h3.a.text

Or use `find_all()` with an index:

first_movie = html_soup.find_all('div', class_ = 'lister-item mode-advanced')[0]
first_name = first_movie.h3.a.text
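Both variants assume the page actually contains a matching container. If the markup changes or the request is blocked, `find()` returns `None` and `find_all()[0]` raises `IndexError`. A defensive sketch, where `first_title` is a hypothetical helper name and the HTML strings in the usage note are made up for illustration:

```python
from bs4 import BeautifulSoup


def first_title(html):
    """Return the first movie title in html, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    first_movie = soup.find("div", class_="lister-item mode-advanced")
    # find() returns None when no element matches, so check each step
    # before chaining attribute access.
    if first_movie is None or first_movie.h3 is None or first_movie.h3.a is None:
        return None
    return first_movie.h3.a.text.strip()
```

Called on `'<div class="lister-item mode-advanced"><h3><a>Logan</a></h3></div>'` this returns the title text, and on a page without that container it returns `None` rather than raising.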

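A nice short selector that uses the adjacent sibling combinator to get the `a` tag next to the element with class `lister-item-index`:
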
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1')
soup = bs(r.content, 'lxml')
titles = [item.text for item in soup.select('.lister-item-index + a')]
print(titles)
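To see what the `+` combinator matches without hitting the network, here is a small sketch on hand-written markup (`SAMPLE_HTML` is a hypothetical snippet, and `html.parser` is used instead of `lxml` only to avoid the extra dependency):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each title link directly follows its index span.
SAMPLE_HTML = """
<h3 class="lister-item-header">
  <span class="lister-item-index">1.</span>
  <a href="/title/a/">Logan</a>
  <span class="lister-item-year">(2017)</span>
</h3>
<h3 class="lister-item-header">
  <span class="lister-item-index">2.</span>
  <a href="/title/b/">Wonder Woman</a>
  <span class="lister-item-year">(2017)</span>
</h3>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# ".lister-item-index + a" selects every <a> whose immediately preceding
# sibling element has the class lister-item-index.
titles = [item.text for item in soup.select(".lister-item-index + a")]
print(titles)
```

The year spans are not selected because they are not `a` elements, and whitespace between tags does not affect the match since the combinator only considers element siblings.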

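What are you trying to do with h3?
@Jeppe That won't work, because `first_movie` has no elements; it is an empty list.
@Matiascero Sorry, I misread. `html_soup.find_all` returns a list. Each element may contain an h3. For example, `movie_containers[0].h3.a.text`.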