Internal Server Error when scraping with BeautifulSoup in Python
I am trying to loop over some movie URLs and scrape each film's genre with the code below:
import csv
import random
import time

import requests
from bs4 import BeautifulSoup

values = []
with open('mubi_movie_data.csv', 'r') as read_obj:
    # Pass the file object to csv.reader() to get a reader object.
    csv_reader = csv.reader(read_obj)
    # Iterate over each row in the CSV; each row is a list of fields.
    for row in csv_reader:
        url = row[3]
        # Skip the header row.
        if url == 'movie_url':
            continue
        print(url)
        r = requests.get(url)
        soup = BeautifulSoup(r.content, "html.parser")
        for div in soup.find_all("div", {"class": "css-1wuve65 eyplj6810"}):
            print(div)
            print(type(div))
            # Parse the genre text.
            name = div.text
            values.append(name)

# Open in text mode ('wb' raises a TypeError with csv in Python 3),
# and wrap each value in a list so it lands in one column.
with open("your.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    for value in values:
        writer.writerow([value])
The first 10 to 15 scrapes worked fine. Then the website blocked my access, stopping the scraper and returning an Internal Server Error even when I tried to reload the page in a browser. From some searching I gather that I need to change the User-Agent to get around this, but I don't know how to do that in Python. I am completely new to web scraping. Can anyone help? Which website are the URLs from, please? They are the URLs of individual movies, not a single URL. The base URL is , followed by many other movie pages, e.g. .
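To address the User-Agent question: requests sends `python-requests/x.y` by default, which some sites reject, and you can override it via the `headers` argument of `requests.get`. Below is a minimal sketch; the User-Agent string is just a typical browser-like example, not a required value, and the delay bounds are arbitrary choices to space requests out (the `random` and `time` imports in the question's code suggest this was the intent anyway).

```python
import random
import time

import requests

# Example browser-like User-Agent string (any realistic value works);
# sites often block the default "python-requests" agent.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    )
}

def polite_get(url, min_delay=1.0, max_delay=3.0):
    """Fetch a URL with a browser-like User-Agent after a random pause,
    to reduce the chance of being rate-limited or blocked."""
    time.sleep(random.uniform(min_delay, max_delay))
    return requests.get(url, headers=HEADERS, timeout=30)
```

In the loop, replace `requests.get(url)` with `polite_get(url)`. Note that a custom User-Agent alone may not be enough if the site rate-limits by IP; slowing down the requests, as above, usually matters just as much.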