Python 3.x: how do I ignore HTTP errors when making requests in a for loop?
Here is my code. It checks a list of URLs for a specific keyword and writes to an output file whether or not the keyword was found:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []

for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    res = requests.get(url_1)
    finalresult = print(keyword in res.text)
    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")

df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
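As an aside, `df.T.values.tolist()[2]` transposes the whole frame just to pull out the third column; `df.iloc[:, 2].tolist()` reads that column directly. A quick sketch with made-up data (the column names are illustrative, not from the original input.csv) to show the two match:

```python
import pandas as pd

# Made-up three-column frame standing in for input.csv.
df = pd.DataFrame({"name": ["a", "b"], "id": [1, 2], "url": ["http://x", "http://y"]})

urls_via_transpose = df.T.values.tolist()[2]  # third row of the transpose
urls_direct = df.iloc[:, 2].tolist()          # third column, read directly
print(urls_via_transpose)  # ['http://x', 'http://y']
```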
However, as soon as any one of the URLs is down and raises an HTTP error, the script stops with the following error:
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='argos-yoga.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x122582d90>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
How can I ignore these errors and let the script continue scanning? Can anyone help? Thanks.

You can simply use try..except for this. For example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []

for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:
        res = requests.get(url_1)
        finalresult = keyword in res.text
        print(finalresult)
        if finalresult == False:
            myList.append("NOT OK")
        else:
            myList.append("OK")
    except Exception as e:
        print(f"There was an error, error = {e}")
        pass

df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
Try putting the try..except only around requests.get() and res.text. For example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []

for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:  # <-- put try..except here
        res = requests.get(url_1)
        finalresult = keyword in res.text  # <-- remove print()
    except:
        finalresult = False
    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")

df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
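One refinement worth noting (my own suggestion, not from the answers above): a bare `except:` also silences interrupts like KeyboardInterrupt. requests groups all of its network errors under requests.exceptions.RequestException, so catching that keeps the handler scoped to request failures. A minimal sketch with a hypothetical check_url helper; the ".invalid" hostname is reserved and never resolves, so it reliably exercises the error path:

```python
import requests

def check_url(url, keyword, timeout=10):
    # Hypothetical helper: returns "OK", "NOT OK", or "Down" for one URL.
    try:
        res = requests.get(url, timeout=timeout)
        return "OK" if keyword in res.text else "NOT OK"
    except requests.exceptions.RequestException:
        # Covers ConnectionError, Timeout, HTTPError, etc.,
        # but lets KeyboardInterrupt and friends propagate.
        return "Down"

print(check_url("https://no-such-host.invalid/", "myKeyword"))  # Down
```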
Thank you, Ahmed! I tried the code above, but it doesn't append "NOT OK" when finalresult == False; I get "OK" for every URL. Any idea how I can fix this?

I'm not sure I follow: so when finalresult equals False, it says OK?

OK, yes, I see it: you need to add finalresult = keyword in res.text, because you had only assigned it to a print statement. I've edited the code. Try it and let me know if it works.

That works! Thank you, André. I was just wondering how I could add a flag for HTTP errors. For example, if "argos-yoga.com" is one of the URLs in my input file, I'd want it flagged as "Down" rather than "OK", since the page isn't working. Could I add something like this somewhere in the code: except Exception as e: print(f"There was an error, error = {e}") myList.append("Down") pass
The reason I ask is that when I run this script from the terminal, it would be good to know which URLs throw HTTP errors too, and to save them in my errorLog.txt. Thanks a lot.
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:  # <-- put try..except here
        res = requests.get(url_1)
        if keyword in res.text:
            myList.append("OK")
        else:
            myList.append("NOT OK")
    except:
        myList.append("Down")
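To also record the failing URLs in errorLog.txt, as asked in the comments, the loop above can be extended to write one line per failure. A sketch under the same assumptions (the scan_urls name and the tab-separated log format are mine, not from the thread):

```python
import requests

def scan_urls(urls, keyword, log_path="errorLog.txt", timeout=10):
    # Returns one flag per URL and logs each failure as "url<TAB>error".
    flags = []
    with open(log_path, "w") as log:
        for url in urls:
            try:
                res = requests.get(url, timeout=timeout)
                flags.append("OK" if keyword in res.text else "NOT OK")
            except requests.exceptions.RequestException as e:
                flags.append("Down")
                log.write(f"{url}\t{e}\n")  # keep the URL and the reason it failed
    return flags
```

The returned list stays the same length as urls, so it can still be assigned to a DataFrame column as in the earlier snippets.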