Python: checking whether a website exists with requests isn't working

python web-scraping

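So, a few days ago I learned how web scraping works, and today I was messing around with it. I wanted to know how to test whether a page exists, so I looked it up and found an answer. I am using the requests module, and I got the following code from that answer: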

import requests
request = requests.get('http://www.example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')

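I tried it, and because example.com exists, it printed "Web site exists". However, when I tried something that I am sure does not exist, such as exampleww.com, it gave me an error. Why does it do that? How can I stop it from printing the error (and have it say that the website does not exist instead)?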

You have to wrap the requests.get call in try/except and handle the exceptions that can occur, one of which is ConnectionError.

This happens because a response whose status_code is not 200 and a failure to connect to the requested HTTP address at all are two different things. ConnectionError is one of the exceptions you can run into when making a request with the requests library.

You can use try/except like this:
import requests
from requests.exceptions import ConnectionError

try:
    request = requests.get('http://www.example.com')
except ConnectionError:
    print('Web site does not exist')
else:
    print('Web site exists')
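
The two cases described above (no connection at all versus a response with an error status) can be folded into a single check. What follows is only a minimal sketch, not a definitive implementation: the function name site_exists, the fetch parameter, and the 5-second timeout are assumptions introduced for illustration and do not come from the original answer.

```python
import requests


def site_exists(url, fetch=requests.get, timeout=5):
    """Return True if `url` answers with a non-error status, else False.

    `fetch` is a hypothetical hook (it defaults to requests.get) so the
    function can be exercised without real network access.
    """
    try:
        response = fetch(url, timeout=timeout)
    except requests.exceptions.ConnectionError:
        # No server answered at all (for example, the DNS lookup failed).
        return False
    except requests.exceptions.Timeout:
        # A server may exist, but it did not answer within `timeout` seconds.
        return False
    # response.ok is True when the status code is below 400.
    return response.ok
```

Calling site_exists('http://www.example.com') would then return a boolean instead of raising when the host cannot be reached. Whether a 404 should count as "does not exist" is a design choice; this sketch treats any 4xx or 5xx status as a failure.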

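You are getting the error because the URL you are trying to fetch is invalid (in the example below it has no scheme such as http://), but you can easily check for this error with a try-except block: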
import requests
from requests.exceptions import MissingSchema

try:
    request = requests.get('examplewwwwwww.com')
except MissingSchema:
    print('The provided URL is invalid.')
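
MissingSchema is raised when the URL string has no scheme, which is the case for 'examplewwwwwww.com'. A name that does have a scheme but does not resolve would typically raise ConnectionError instead. A sketch that tells the two situations apart is shown here; the function name classify_url and its three return labels are hypothetical and chosen only for illustration.

```python
import requests


def classify_url(url, timeout=5):
    """Return 'invalid', 'unreachable', or 'reachable' (illustrative labels)."""
    try:
        requests.get(url, timeout=timeout)
    except (
        requests.exceptions.MissingSchema,
        requests.exceptions.InvalidSchema,
        requests.exceptions.InvalidURL,
    ):
        # The string was rejected as a malformed or unsupported URL.
        return "invalid"
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        # The URL is well formed, but no server could be reached in time.
        return "unreachable"
    return "reachable"
```

The invalid-URL exceptions are listed first so that a malformed string is reported as such rather than being lumped together with network failures.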

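Just listing how I do it, in case it is useful to someone:
import requests

ready = False
try:
    response = requests.get('https://github.com')
    if response.ok:
        # When polling in a loop, break out of it here.
        ready = True
except requests.exceptions.RequestException:
    print("Website not available...")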
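
One way to use a check like this is to poll until the site responds. The following is a sketch under stated assumptions rather than the answerer's actual code: the function name wait_until_ready and the attempts, delay, and fetch parameters are all hypothetical.

```python
import time

import requests


def wait_until_ready(url, attempts=3, delay=1.0, fetch=requests.get):
    """Poll `url` up to `attempts` times; return True once it answers OK.

    `fetch` is a hypothetical hook (defaults to requests.get) that makes
    the loop testable without network access.
    """
    for attempt in range(attempts):
        try:
            response = fetch(url, timeout=5)
            if response.ok:
                return True
        except requests.exceptions.RequestException:
            # Base class of the exceptions raised by requests.
            print("Website not available...")
        if attempt < attempts - 1:
            time.sleep(delay)  # pause before the next try
    return False
```

Catching RequestException covers connection errors, timeouts, and invalid URLs in one clause, which is convenient for a readiness check but hides which of them occurred.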

As that page shows, it throws a ConnectionError because there is no server there to give you a status. Read the comments on the link you posted and use something like try ... except ConnectionError. Some websites will also block you because they take the request for a scraping attempt: they can tell you are not a real browser from your User-Agent and other characteristics. That explains why some URLs that are rejected with a 404 work fine in a browser.
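
By default requests identifies itself with a User-Agent of the form python-requests/x.y.z, which some sites reject. A browser-like header can be passed through the headers argument of requests.get. This is a sketch only: the header value below is just an example string, and the function name status_with_browser_headers and its fetch parameter are assumptions for illustration. A site may still block the request on other grounds.

```python
import requests

# Example browser-like User-Agent; an assumption, not a required value.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) "
        "Gecko/20100101 Firefox/115.0"
    )
}


def status_with_browser_headers(url, fetch=requests.get):
    """Return the HTTP status code for `url`, or None if the request failed.

    `fetch` is a hypothetical hook (defaults to requests.get) so the
    function can be tested without network access.
    """
    try:
        response = fetch(url, headers=BROWSER_HEADERS, timeout=5)
    except requests.exceptions.RequestException:
        return None
    return response.status_code
```

Comparing the status code returned with and without the custom header is a quick way to see whether a site is filtering on the User-Agent.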