Python 3.x: Checking whether a website exists using Python 3


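Sorry if this is a duplicate; I have been looking for an answer for about an hour and can't seem to find one. Anyway, I have a text file full of URLs, and I want to check each URL to see whether it exists. I need some help understanding the error message, and I'd like to know whether there is a way to fix it or a different approach I could use.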

Here is my code

import requests

filepath = 'url.txt'  
with open(filepath) as fp:  
   url = fp.readline()
   count = 1
   while count != 677: #Runs through each line of my txt file
      print(url)
      request = requests.get(url) #Here is where im getting the error
      if request.status_code == 200:
          print('Web site exists')
      else:
        print('Web site does not exist')
      url = url.strip()
      count += 1

This is the output

http://www.pastaia.co

Traceback (most recent call last):
File "python", line 9, in <module>
requests.exceptions.ConnectionError: 
HTTPConnectionPool(host='www.pastaia.co%0a', port=80): Max retries exceeded 
with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection 
object at 0x7fca82769e10>: Failed to establish a new connection: [Errno -2] 
Name or service not known',))


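I'll give you some ideas to get you started; whole careers are built around crawling :) By the way, it looks like you are just getting started. A big part of the trick is how to handle the unexpected when crawling the web. Ready? Here we go.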

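import requests

filepath = 'url.txt'
with open(filepath) as fp:
    for line in fp:
        url = line.strip()  # drop the trailing newline read from the file
        if not url:
            continue  # skip blank lines
        print(url)
        try:
            request = requests.get(url, timeout=10)
            if request.status_code == 200:
                print('Web site exists')
            else:
                print('Web site returned status', request.status_code)
        except requests.exceptions.RequestException:
            # DNS failure, refused connection, timeout, malformed URL, ...
            print('Web site does not exist')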

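Make it a `for` loop: you just want to loop over the whole file, right?

Use `try` and `except` so that if something goes wrong for whatever reason (bad DNS, a non-200 response, maybe a .pdf page; the web is the wild west), the code doesn't crash and you can move on to the next site in the list and log the error however you like.

You can add other kinds of conditions in there too; maybe the page needs to be a certain length? A 200 response code does not always mean the page is valid, only that the site returned success, but it is a good place to start.

Consider adding a User-Agent to your request; you may want to mimic a browser, or have your program identify itself as super-bot 9000.

If you want to go further into crawling and parsing text, look at using beautifulsoup.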
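As a rough sketch of the last few suggestions, the function below sends a custom User-Agent and treats a page as valid only if it returns 200 with a body of some minimum length. The names `page_looks_valid`, `USER_AGENT`, and `MIN_BODY_LENGTH`, as well as the threshold and timeout values, are assumptions made up for illustration; they are not part of the original answer.

```python
import requests

# Hypothetical values chosen for illustration; tune them for your own crawl.
USER_AGENT = "super-bot-9000/0.1"
MIN_BODY_LENGTH = 200  # characters; an arbitrary threshold, not a standard


def page_looks_valid(url, session=None, min_length=MIN_BODY_LENGTH):
    """Return True if the URL answers 200 with a reasonably sized body."""
    if session is None:
        session = requests.Session()
    try:
        response = session.get(
            url.strip(),  # remove the newline left over from reading a file
            headers={"User-Agent": USER_AGENT},
            timeout=10,
        )
    except requests.exceptions.RequestException:
        # DNS failure, refused connection, timeout, invalid URL, ...
        return False
    return response.status_code == 200 and len(response.text) >= min_length
```

Passing the session in as a parameter is a design choice for this sketch: it lets one session be reused for every URL in the file and makes the function easy to exercise without touching the network.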

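That website does not appear to be serving web traffic.
Most likely, the get() function of the requests module tries to connect to the URL, exhausts its internal retry limit, and then raises a ConnectionError exception.
I would wrap that line in a try/except block to catch the error (which then indicates that the website does not exist):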
# This goes inside the loop over the URLs.
try:
    request = requests.get(url)
    if request.status_code == 200:
        print('Web site exists')
    else:
        print("Website returned response code: {code}".format(code=request.status_code))
except requests.exceptions.ConnectionError:  # not the built-in ConnectionError
    print('Web site does not exist')
    continue
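
For a version of this idea that can be run over the whole file, here is a minimal sketch that records one result per URL and tells timeouts apart from other connection failures. The function name `check_urls`, the `get` parameter, and the result strings are assumptions invented for this example; the exception classes are the ones defined in `requests.exceptions`.

```python
import requests


def check_urls(urls, get=requests.get, timeout=5):
    """Map each non-empty URL to a short status string."""
    results = {}
    for raw in urls:
        url = raw.strip()  # lines read from a file end with a newline
        if not url:
            continue
        try:
            response = get(url, timeout=timeout)
        except requests.exceptions.Timeout:
            results[url] = "timed out"
        except requests.exceptions.ConnectionError:
            results[url] = "unreachable"
        except requests.exceptions.RequestException as exc:
            # Anything else requests can raise, such as an invalid URL.
            results[url] = "error: " + type(exc).__name__
        else:
            results[url] = "status " + str(response.status_code)
    return results
```

Because a file object yields its lines when iterated, it could be called as `check_urls(fp)` inside `with open('url.txt') as fp:`.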

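Thanks! I fixed the error, but for some reason requests.get(url) doesn't seem to work. When I replace url with a literal URL it runs fine, but it doesn't work when the URL is read with url = fp.readline(). Do you know why? Never mind, I fixed it using request = requests.get(url.strip())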
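
The reason strip() helps is that readline() keeps the trailing newline, and the `%0a` in the host name shown in the traceback is a percent-encoded newline. The sketch below shows this with an in-memory stand-in for url.txt; the second URL in the fake file is an assumption added only to make the example self-contained.

```python
import io
from urllib.parse import quote

# Stand-in for url.txt so the sketch runs without a real file.
fake_file = io.StringIO("http://www.pastaia.co\nhttp://example.com\n")

line = fake_file.readline()
print(repr(line))    # the repr shows the trailing newline is part of the string
print(quote("\n"))   # how a newline looks once it is percent-encoded

clean = line.strip()  # removes leading and trailing whitespace, newline included
print(repr(clean))
```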