Python: I am trying to scrape a website using beautifulsoup4 and the requests library
Tags: python, beautifulsoup

I want to extract the movie names, years, and lengths from this website. My code is as follows:
import requests
from bs4 import BeautifulSoup
URL = 'https://www4.f2movies.to'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
#Trending Movies
Movies = []
Year = []
Length = []
for a in soup.findAll('a', href=True, attrs={'class':"film-detail film-detail-fix"}):
    name=data.find('div', href=True, attrs={'class':'film-name'})
    year=data.find('span', href=True, attrs={'class':'fdi-item'})
    length=data.find('span', href=True, attrs={'class':'fdi-item fdi-duration'})
    Movies.append(name.text)
    Year.append(year.text)
    Length.append(length.text)
print(Movies)
print(Year)
print(Length)
The result I get is as follows:
(Projects) anildhage@xxx-MacBook-Air WebScrape % python scrape.py
[]
[]
[]
(Projects) anildhage@xxx-MacBook-Air WebScrape %
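Can anyone tell me what is going wrong? TIA

Some of your selectors are incorrect when using find(). The film blocks are div elements, not a elements, and the title is in an h3, not a div. Passing href=True also restricts the match to tags that have an href attribute, and the loop variable is named a while the loop body uses data. To get all the data, use the following example: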
import requests
from bs4 import BeautifulSoup
URL = "https://www4.f2movies.to"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
# Trending Movies
Movies = []
Year = []
Length = []
for data in soup.findAll("div", attrs={"class": "film-detail film-detail-fix"}):
    name = data.find("h3", attrs={"class": "film-name"})
    year = data.find("span", attrs={"class": "fdi-item"})
    length = data.find("span", attrs={"class": "fdi-item fdi-duration"})
    if not length:
        continue
    Movies.append(name.text.strip())
    Year.append(year.text)
    Length.append(length.text)
print(Movies)
print(Year)
print(Length)
Output:
["Tom Clancy's Without Remorse", 'The Mitchells vs. The Machines', 'Mortal Kombat', 'Things Heard & Seen', 'Demon Slayer the Movie: Mugen Train', 'Voyagers', 'Tom & Jerry', 'Godzilla vs. Kong', 'Justice Society: World War II', 'Nomadland', 'The Virtuoso', 'Shadow in the Cloud', 'Nobody', 'Skylines', "Zack Snyder's Justice League", 'Stowaway', '22 vs. Earth', 'The Marksman', 'The Little Things', 'Wonder Woman 1984', 'Raya and the Last Dragon', 'The Father', 'SAS: Red Notice', 'Come True', 'The Lockdown Hauntings', 'The Bike Thief', 'Generation Por Que', 'Adolescents of Chymera', 'The Darkness', 'The Rise of Sir Longbottom', 'Mexican Moon', "She was the Deputy's Wife", '100m Criminal Conviction', 'Percy', 'The Mitchells vs. The Machines', 'Zombie with a Shotgun', 'Things Heard & Seen', 'Golden Arm', 'Bang! Bang!', 'Colors of Love', 'Three Pints and a Rabbi', 'Eat Wheaties!', "Before I'm Dead", '22 vs. Earth', 'The Outside Story', 'Voyagers', 'Ape vs. Monster', 'Pipeline']
['2021', '2021', '2021', '2021', '2020', '2021', '2021', '2021', '2021', '2020', '2021', '2020', '2021', '2020', '2021', '2021', '2021', '2021', '2021', '2020', '2021', '2020', '2021', '2021', '2021', '2020', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2021', '2019', '2021', '2021', '2020', '2021', '2021', '2020', '0000', '2021', '2021', '2021', '2021', '2021']
['109m', '113m', '110m', '121m', '117m', '108m', '90m', '113m', 'N/A', '108m', '105m', '83m', '92m', '110m', '242m', '116m', '5m', '108m', '127m', '151m', '112m', '97m', '120m', '105m', '101m', '79m', 'N/A', '81m', 'N/A', '73m', '84m', '95m', '92m', '109m', '113m', '79m', '121m', '90m', '71m', '110m', '85m', 'N/A', '83m', '5m', '85m', '108m', '90m', '85m']
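Because the loop skips any block without a length, the three lists stay the same size and line up by position. A minimal sketch of pairing them into one record per movie, using only the first three values from the output above (the names `movies`, `years`, `lengths`, and `rows` are chosen here for illustration):

```python
# First three entries copied from the output above.
movies = ["Tom Clancy's Without Remorse", "The Mitchells vs. The Machines", "Mortal Kombat"]
years = ["2021", "2021", "2021"]
lengths = ["109m", "113m", "110m"]

# zip() walks the lists in parallel, so each tuple describes one movie.
rows = list(zip(movies, years, lengths))

for title, year, length in rows:
    print(f"{title} ({year}), {length}")
```

This only works as intended while the lists are appended to together; if one append were skipped on its own, the positions would drift apart.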
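What is the expected output?
Great, thank you... though could you mention why the if statement and the .strip() method are needed?
@AnilDhage 1. Sometimes length is None, so we need to check for that before reading its text. 2. Using .strip() removes the extra whitespace around the name (try removing it to see the difference).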
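Both points can be reproduced offline. The sketch below parses a small, made-up HTML fragment (the markup in `SAMPLE_HTML` is an assumption written for illustration, not copied from the real site) in which the second block has no duration span:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; the real page may differ.
SAMPLE_HTML = """
<div class="film-detail film-detail-fix">
  <h3 class="film-name">
    <a href="/movie/1">Nobody</a>
  </h3>
  <span class="fdi-item">2021</span>
  <span class="fdi-item fdi-duration">92m</span>
</div>
<div class="film-detail film-detail-fix">
  <h3 class="film-name">
    <a href="/tv/2">Some Series</a>
  </h3>
  <span class="fdi-item">SS 1</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
blocks = soup.find_all("div", attrs={"class": "film-detail film-detail-fix"})

# 1. .text keeps the newlines and indentation inside the <h3>; .strip() drops them.
raw_name = blocks[0].find("h3", attrs={"class": "film-name"}).text
clean_name = raw_name.strip()
print(repr(raw_name))
print(repr(clean_name))

# 2. find() returns None when nothing matches, so calling .text on the
#    result would raise AttributeError without the "if not length" check.
missing_length = blocks[1].find("span", attrs={"class": "fdi-item fdi-duration"})
print(missing_length)
```

Here the first block yields a name padded with whitespace until it is stripped, and the second block yields None for the length, which is the case the `continue` guards against.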