Html: Beautiful Soup for web scraping returns None

html web-scraping beautifulsoup python-requests web-crawler

I am trying to extract the IMDb rating of a movie from Google search results. It returns None every time, even though the id is correct.

import requests
from bs4 import BeautifulSoup

# Finds the IMDb rating of a given movie or TV series
search_term1 = "What is the imdb rating of "
search_term2 = input("Enter the name of the movie or TV Series : ")
search_term = search_term1 + search_term2
response = requests.get("https://www.google.co.in/search?q=" + search_term)

soup = BeautifulSoup(response.text, 'html5lib')
match = soup.find('div.slp.f')
# I tried 'div',_class="slp.f"
print(match)  # this line is returning None

If you try looking for before-appbar in the markup that requests actually receives:

import requests
from bs4 import BeautifulSoup

# Finds the IMDb rating of a given movie or TV series
search_term1 = "What is the imdb rating of "
search_term2 = input("Enter the name of the movie or TV Series : ")
search_term = search_term1 + search_term2
response = requests.get("https://www.google.co.in/search?q=" + search_term)
print("before-appbar" in response.text)

The output is False.

Clearly, "before-appbar" is not the id of any element in this response. I guess you are determining it by inspecting the DOM elements in your browser. In most cases, however, JavaScript changes the DOM considerably, so what the browser shows does not match what you get with requests in Python.

I can suggest two possible solutions. The first is to save the response to an HTML file, open it in a browser, and inspect the element you need to find.


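Alternatively, use a browser automation tool with a headless browser, so that the page's JavaScript has run before you parse it.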
The problem lies in the way you search for the id. Instead of
print(soup.find(id="before-appbar"))
use
print(soup.find(attrs={"id": "before-appbar"}))

Hope this solves the problem.
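
For comparison, the different lookups can be tried on a small hard-coded document. This is only a sketch: the SAMPLE_HTML markup is made up for illustration and is not what Google returns, and it uses the built-in html.parser instead of html5lib to avoid an extra dependency. It suggests that find() treats its first positional argument as a tag name, so a CSS-style string such as 'div.slp.f' matches nothing, while select_one() accepts CSS selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real response will differ.
SAMPLE_HTML = """
<html><body>
  <div id="before-appbar">marker</div>
  <div class="slp f">Rating: 9.0/10</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() interprets the string as a tag name, so this looks for a tag
# literally called "div.slp.f" and finds nothing.
by_tag_name = soup.find("div.slp.f")

# Two keyword-based ways to look an element up by its id.
by_keyword = soup.find(id="before-appbar")
by_attrs = soup.find(attrs={"id": "before-appbar"})

# select_one() takes a CSS selector: a div carrying both classes slp and f.
by_css = soup.select_one("div.slp.f")

print(by_tag_name)
print(by_keyword.get_text())
print(by_attrs.get_text())
print(by_css.get_text())
```

If an element is missing from the HTML that requests received, all of these return None regardless of which lookup style is used.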
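This happens because your script is being redirected to a CAPTCHA page. Try print(response.url) to check; I suspect the returned URL differs from the one you requested. In addition, your search term should be properly URL-encoded, and I do not see any attempt at that in your script. If your request has not been redirected, try quote_plus(search_term), after importing it with from urllib.parse import quote_plus.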
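
The encoding step can be sketched with the standard library alone. The helper names build_search_url and was_redirected are hypothetical, introduced here only for illustration; the base URL and query prefix are the ones from the question. No network request is made in this sketch.

```python
from urllib.parse import quote_plus

BASE_URL = "https://www.google.co.in/search?q="  # same endpoint as in the question


def build_search_url(title: str) -> str:
    """Return the search URL with the whole query percent-encoded."""
    query = "What is the imdb rating of " + title
    # quote_plus turns spaces into '+' and escapes characters such as '&'.
    return BASE_URL + quote_plus(query)


def was_redirected(requested_url: str, final_url: str) -> bool:
    """Hypothetical helper: True when the final URL differs from the requested one."""
    return requested_url != final_url


url = build_search_url("Fast & Furious")
print(url)
```

After response = requests.get(url), calling was_redirected(url, response.url) would perform the check described above; response.history also lists any redirects that requests followed.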
# Save the response to an HTML file so it can be opened and inspected in a browser
with open("response.html", "w", encoding="utf-8") as f:
    f.write(response.text)