How can I scrape websites based on their page title using Python?
I am scraping websites that have a particular title.
Tags: python, http, screen-scraping
For example, if x in example.com/xxxxxxxxx is a random number, how would I set it up to check whether the page's title is 404 or not?
This finds the page's title:
import requests
from lxml.html import fromstring

def Get_PageTitle(url):
    # Download the page and parse its HTML
    req = requests.get(url)
    tree = fromstring(req.content)
    # findtext returns None when the page has no <title> element
    title = tree.findtext('.//title')
    return title

url = "http://www.google.com"
title = Get_PageTitle(url)
if title is not None and "404" in title:
    # title has 404
    print("Title has 404 in it")
else:
    # no 404 in title
    pass
Edit:
The code above checks whether the title contains 404. If you want to know whether the title is exactly 404, use the following code:
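import requests
from lxml.html import fromstring

def Get_PageTitle(url):
    # Download the page and parse its HTML
    req = requests.get(url)
    tree = fromstring(req.content)
    # findtext returns None when the page has no <title> element
    title = tree.findtext('.//title')
    return title

url = "http://www.google.com"
title = Get_PageTitle(url)
# Compare with ==, not "is": "is" tests object identity, not string equality
if title is not None and title.strip() == "404":
    # title is 404
    print("Title is 404")
    print(title)
else:
    # title is not 404
    pass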
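The question also asks about trying URLs where the trailing number is random. A minimal sketch of that loop follows, under several assumptions that are not in the original: the number is taken to be nine digits long, the base URL http://example.com/ is only a placeholder, and the helper names (TitleParser, extract_title, fetch_html, random_url, scan) are hypothetical. To keep the sketch free of third-party packages it uses the standard library's html.parser and urllib.request rather than lxml and requests; the title check itself is the same "404" in title test as above.

```python
import random
import urllib.error
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the stripped <title> text, or "" if the page has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def fetch_html(url):
    """Download a page; an HTTP error page (such as a 404) still has a body to read."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.read().decode("utf-8", errors="replace")


def random_url(base="http://example.com/", digits=9):
    """Build a URL ending in a zero-padded random number (length is an assumption)."""
    number = random.randrange(10 ** digits)
    return f"{base}{number:0{digits}d}"


def scan(count, fetch=fetch_html, make_url=random_url):
    """Return (url, title) pairs for pages whose title does not contain "404"."""
    found = []
    for _ in range(count):
        url = make_url()
        title = extract_title(fetch(url))
        if "404" not in title:
            found.append((url, title))
    return found
```

Calling scan(20) would try twenty random URLs and return the ones whose title does not mention 404. The fetch and make_url parameters exist so the loop can be tried against canned HTML before pointing it at a real site.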
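Where is your code?
I'm not looking for the website's code, I'm looking for the page title.
OK... this checks the title to see whether it contains 404. Isn't that what you want?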