
How do I get a website's server information with Python requests?

python server web-crawler


I want to build a web crawler that tallies the most popular server software (Apache, nginx, etc.) among Bulgarian websites. Here is my idea:

import requests
r = requests.get('http://start.bg')
print(r.headers)

This returns:

{'Debug': 'unk', 
'Content-Type': 'text/html; charset=utf-8', 
'X-Powered-By': 'PHP/5.3.3', 
'Content-Length': '29761', 
'Connection': 'close', 
'Set-Cookie': 'fbnr=1; expires=Sat, 13-Feb-2016 22:00:01 GMT; path=/; domain=.start.bg', 
'Date': 'Sat, 13 Feb 2016 13:43:50 GMT', 
'Vary': 'Accept-Encoding', 
'Server': 'Apache/2.2.15 (CentOS)', 
'Content-Encoding': 'gzip'}

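Here you can easily see that the site runs on Apache/2.2.15; simply evaluating

r.headers['Server']

gives you that value. I tried several Bulgarian websites, and they all have a Server key.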
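Scaling the one-site check above to many sites mostly means looping and counting. The sketch below is a minimal illustration, assuming you already have a list of site URLs (not shown in the question). The names count_servers and fetch_headers are hypothetical; the fetch function is passed in as a parameter so the counting logic can be tried without network access. Using requests.head instead of requests.get is an assumption to avoid downloading page bodies; some servers answer HEAD differently, so fall back to get if results look odd.

```python
from collections import Counter
from typing import Callable, Iterable, Mapping


def fetch_headers(url: str) -> Mapping[str, str]:
    """Fetch only the response headers for `url` (hypothetical helper)."""
    # Imported here so the counting logic below runs even without requests installed.
    import requests

    # HEAD skips the body; redirects are followed so we see the final server.
    response = requests.head(url, allow_redirects=True, timeout=10)
    return response.headers


def count_servers(
    urls: Iterable[str],
    fetch: Callable[[str], Mapping[str, str]] = fetch_headers,
) -> Counter:
    """Tally the raw Server header value across `urls` (hypothetical helper)."""
    counts: Counter = Counter()
    for url in urls:
        try:
            headers = fetch(url)
        except OSError:
            # requests' exceptions subclass OSError (IOError), so network
            # failures are counted instead of aborting the whole crawl.
            counts["<error>"] += 1
            continue
        # .get avoids a KeyError for sites that omit the Server header.
        counts[headers.get("Server", "<no Server header>")] += 1
    return counts
```

For a real crawl you would call count_servers(urls) with your own URL list and read the result with counts.most_common().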
However, when I request the headers of a more complex site such as www.teslamotors.com, I get the following:

{'Content-Type': 'text/html; charset=utf-8',
'X-Cache-Hits': '9',
'Cache-Control': 'max-age=0, no-cache, no-store',
'X-Content-Type-Options': 'nosniff',
'Connection': 'keep-alive',
'X-Varnish-Server': 'sjc04p1wwwvr11.sjc05.teslamotors.com',
'Content-Language': 'en',
'Pragma': 'no-cache',
'Last-Modified': 'Sat, 13 Feb 2016 13:07:50 GMT',
'X-Server': 'web03a',
'Expires': 'Sat, 13 Feb 2016 13:37:55 GMT',
'Content-Length': '10290',
'Date': 'Sat, 13 Feb 2016 13:37:55 GMT',
'Vary': 'Accept-Encoding',
'ETag': '"1455368870-1"',
'X-Frame-Options': 'SAMEORIGIN',
'Accept-Ranges': 'bytes',
'Content-Encoding': 'gzip'}

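As you can see, this dictionary has no ['Server'] key. There are X-Server and X-Varnish-Server keys, but I'm not sure what they mean, and their values are not server-software names like Apache.
So I think I either need to send a different request to get the server information, or they run their own custom server software (which seems plausible for Facebook). I also tried other .com websites, and at least one of them does have a ['Server'] key.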
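Because the Server header is optional, r.headers['Server'] raises a KeyError on a site like the one above. A minimal sketch of a safer lookup follows; describe_server and HINT_HEADERS are hypothetical names, and the choice of hint headers (X-Powered-By, as seen on start.bg, and the standard Via proxy header) is an assumption for illustration, not a reliable way to identify server software.

```python
from typing import Mapping

# Headers that sometimes hint at the stack when Server is missing.
# This list is an illustrative assumption, not an exhaustive standard.
HINT_HEADERS = ("X-Powered-By", "Via")


def describe_server(headers: Mapping[str, str]) -> dict:
    """Return the Server value (or None) plus any hint headers present."""
    # Lower-case names so plain dicts behave like requests' case-insensitive headers.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "server": lowered.get("server"),
        "hints": {h: lowered[h.lower()] for h in HINT_HEADERS if h.lower() in lowered},
    }
```

With the start.bg headers this yields the Apache string plus the PHP hint; with the Tesla headers it yields None and no hints, which is the honest answer.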

So, is there a way to find out which server software Facebook and Tesla Motors use?
This has nothing to do with Python. For security reasons, most well-configured web servers do not reveal this information in the Server HTTP header.

No sane developer wants you to know that they are running an unpatched version of product xxx.
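A web server may or may not return a Server header. Don't rely on it. See the following question:
OK, that makes sense. :) However, Spotify does provide this kind of information, just without a version number. It only says 'nginx'.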
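Since some sites send a full string like "Apache/2.2.15 (CentOS)" and others only "nginx", a tally is more useful if it counts by product name alone. A minimal sketch, assuming only the first product token matters: HTTP allows the Server value to contain several product tokens and parenthesised comments, and this hypothetical parse_server helper ignores everything after the first token.

```python
import re
from typing import Optional, Tuple

# "product/version" at the start; the "/version" part is optional,
# so a bare "nginx" matches as well as "Apache/2.2.15 (CentOS)".
_SERVER_RE = re.compile(r"^\s*([^/\s]+)(?:/(\S+))?")


def parse_server(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a Server header value into (product, version)."""
    match = _SERVER_RE.match(value)
    if not match:
        # Empty or whitespace-only values carry no product name.
        return None, None
    return match.group(1), match.group(2)
```

Counting parse_server(value)[0] instead of the raw header puts Spotify's "nginx" and a versioned "nginx/..." into the same bucket.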