How do I scrape websites by page title using Python?

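I am scraping websites that have a particular title.

For example, given example.com/xxxxxxxxx, where each x is a random digit, how would I check whether the page's title is 404 or not?

This looks up the title of a page: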

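import requests
from lxml.html import fromstring

def Get_PageTitle(url):
    # Download the page and parse the HTML
    req = requests.get(url)
    tree = fromstring(req.content)
    # findtext returns None when the page has no <title> element
    title = tree.findtext('.//title')
    return title


url = "http://www.google.com"
title = Get_PageTitle(url)

# Guard against None so the "in" test cannot raise a TypeError
if title is not None and "404" in title:
    # the title contains 404
    print("Title has 404 in it")

else:
    # no 404 in the title (or no title at all)
    pass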
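
To apply this check to URLs of the form example.com/xxxxxxxxx from the question, the title lookup could be wrapped in a loop over randomly generated addresses. The following is only a sketch under stated assumptions: `random_url`, `find_urls_by_title`, the nine-digit suffix and the `http://example.com/` base are hypothetical names and values chosen for illustration. The title lookup is passed in as a parameter, so a function such as `Get_PageTitle` can be plugged in.

```python
import random


def random_url(base="http://example.com/", digits=9, rng=random):
    # Build a URL whose path is `digits` random decimal digits
    # (hypothetical URL scheme, assumed from the question)
    suffix = "".join(rng.choice("0123456789") for _ in range(digits))
    return base + suffix


def find_urls_by_title(get_title, wanted, attempts=10, make_url=random_url):
    # Try `attempts` generated URLs and keep those whose title contains `wanted`
    matches = []
    for _ in range(attempts):
        url = make_url()
        try:
            title = get_title(url)
        except Exception:
            # Network or parsing failure: skip this URL and move on
            continue
        if title is not None and wanted in title:
            matches.append(url)
    return matches
```

For example, `find_urls_by_title(Get_PageTitle, "404", attempts=20)` would return the generated URLs whose title contains 404.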

Edit:

The code above checks whether the title contains 404. If you want to know whether the title is exactly 404, use the following code:

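import requests
from lxml.html import fromstring

def Get_PageTitle(url):
    # Download the page and parse the HTML
    req = requests.get(url)
    tree = fromstring(req.content)
    # findtext returns None when the page has no <title> element
    title = tree.findtext('.//title')
    return title


url = "http://www.google.com"
title = Get_PageTitle(url)

# Compare with ==; "is" tests object identity, not string equality
if title == "404":
    # title is exactly 404
    print("Title is 404")
    print(title)

else:
    # title is not 404
    pass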
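
The difference between "contains 404" and "is 404" can be seen without any network access. The sketch below uses two hypothetical helpers, `title_contains_404` and `title_is_404`, and a few made-up sample titles. Stripping surrounding whitespace before the equality test is an assumption about what counts as "exactly 404". String equality in Python is tested with `==`, whereas `is` compares object identity.

```python
def title_contains_404(title):
    # True when "404" appears anywhere in the title
    return title is not None and "404" in title


def title_is_404(title):
    # True only when the whole title, ignoring surrounding whitespace, is "404"
    return title is not None and title.strip() == "404"


# Made-up sample titles, including a page with no title at all (None)
for sample in ["404", " 404 ", "404 Not Found", "Room 4045", None]:
    print(repr(sample), title_contains_404(sample), title_is_404(sample))
```

A title such as "Room 4045" passes the containment test but not the equality test, which is why the two checks can give different answers for the same page.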

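Where is your code?

I'm not looking for the website's code, I'm looking for the page title.

OK... this checks the title to see whether it contains 404... isn't that what you want?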