Python: web scraping in a for loop raises TimeoutError, NewConnectionError and requests.exceptions.ConnectionError

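Sorry, I'm a beginner at Python and web scraping.

I am scraping a website to extract the readings of input characters. I made a list of 10273 characters, format each one into a URL, and open the page that holds its readings. I use the Requests module to fetch the page source, then Beautiful Soup to return all the audio IDs (their strings contain the readings of the input character; I can't use the text that appears in the table because it is rendered as SVG). I then try to write each character and its readings to out.txt.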

# -*- coding: utf-8 -*-
import requests, time
from bs4 import BeautifulSoup
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

characters = [
#characters go here
]

output = open("out.txt", "a", encoding="utf-8")

tic = time.perf_counter()

for char in characters:
    # Characters from the list are formatted into the url 
    url = "https://wugniu.com/search?char=%s&table=wenzhou" % char

    page = requests.get(url, verify=False)
    soup = BeautifulSoup(page.text, 'html.parser')

    for audio_tag in soup.find_all('audio'):
        audio_id = audio_tag.get('id').replace("0-","")
        #output.write(char)
        #output.write("  ")
        #output.write(audio_id)
        #output.write("\n")
        print(char, audio_id)
        time.sleep(60)

output.close()
toc = time.perf_counter()
duration = int(toc) - int(tic)
print("Took %d seconds" % duration)
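out.txt is the output file I am trying to write the results to. I also time the whole run to measure performance.

However, after about 50 iterations of the loop, I get the following in cmd: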

Traceback (most recent call last):
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\connection.py", line 169, in _new_conn
    conn = connection.create_connection(
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\util\connection.py", line 96, in create_connection
    raise err
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\util\connection.py", line 86, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\connection.py", line 353, in connect
    conn = self._new_conn()
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\connection.py", line 181, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000002035D5F9040>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\urllib3\util\retry.py", line 573, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='wugniu.com', port=443): Max retries exceeded with url: /search?char=%E8%87%B4&table=wenzhou (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002035D5F9040>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\[user]\Documents\wenzhou-ime\test.py", line 3282, in <module>
    page = requests.get(url, verify=False)
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\[user]\Documents\wenzhou-ime\env\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='wugniu.com', port=443): Max retries exceeded with url: /search?char=%E8%87%B4&table=wenzhou (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002035D5F9040>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
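Rather than opening the file and closing it by hand:

output = open("out.txt", "a", encoding="utf-8")
output.close()

you can use a with block, which closes the file for you even if an exception is raised:

with open('out.txt', 'w', newline='', encoding='utf-8') as output:
    # here you can do your operation.
    pass

Likewise, instead of %-formatting the URL:

url = "https://wugniu.com/search?char=%s&table=wenzhou" % char

you can use str.format():

"https://wugniu.com/search?char={}&table=wenzhou".format(char)
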
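
The traceback shows the character arriving percent-encoded in the request path (char=%E8%87%B4). Requests does this for you when the raw character is formatted into the string. As a minimal sketch, the same URL could also be built explicitly with the standard library's urllib.parse.urlencode. The names BASE_URL and build_url are hypothetical and not part of the original code; the host, path and query parameters are taken from the question.

```python
from urllib.parse import urlencode

# Host and path as they appear in the question's URL.
BASE_URL = "https://wugniu.com/search"


def build_url(char, table="wenzhou"):
    """Return the search URL for one character, percent-encoding it as UTF-8."""
    query = urlencode({"char": char, "table": table})
    return f"{BASE_URL}?{query}"


print(build_url("致"))
# prints https://wugniu.com/search?char=%E8%87%B4&table=wenzhou
```

Building the query from a dict keeps the encoding step visible and avoids surprises if a character or parameter ever contains "&" or "=".
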
import requests
from bs4 import BeautifulSoup
import urllib3

urllib3.disable_warnings()


def main(url, chars):
    with open('result.txt', 'w', newline='', encoding='utf-8') as f, requests.Session() as req:
        req.verify = False
        for char in chars:
            print(f"Extracting {char}")
            r = req.get(url.format(char))
            soup = BeautifulSoup(r.text, 'lxml')
            target = [x['id'][2:] for x in soup.select('audio[id^="0-"]')]
            print(target)
            f.write(f'{char}\n{str(target)}\n')


if __name__ == "__main__":
    chars = ['核']
    main('https://wugniu.com/search?char={}&table=wenzhou', chars)
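
The traceback in the question ends in a ConnectionError that stops the whole run at the first page that fails to connect. The sketch below is one possible way to make a long run more tolerant of that, under assumptions the original post does not confirm: that the failures are transient (for example the server throttling after many requests), and that retrying with a pause is acceptable to the site. It mounts urllib3's Retry on a Session, passes an explicit timeout, and records characters that still fail so the loop can continue. The function names build_session and fetch_all, and all numeric settings, are hypothetical.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries=5, backoff=2.0):
    """Create a Session that retries failed requests with increasing pauses."""
    retry = Retry(
        total=total_retries,
        connect=total_retries,
        backoff_factor=backoff,  # assumed value; tune for the target site
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def fetch_all(session, url_template, chars, timeout=(5, 30)):
    """Fetch one page per character.

    Returns (pages, failed): pages maps each character to its HTML,
    failed lists the characters whose request raised an error.
    """
    pages, failed = {}, []
    for char in chars:
        try:
            # timeout is (connect seconds, read seconds)
            response = session.get(url_template.format(char), timeout=timeout)
            response.raise_for_status()
        except requests.exceptions.RequestException as exc:
            print(f"Skipping {char}: {exc}")
            failed.append(char)
            continue
        pages[char] = response.text
    return pages, failed
```

A run would call fetch_all(build_session(), 'https://wugniu.com/search?char={}&table=wenzhou', chars) and could retry the returned failed list later. The sketch leaves certificate verification at its default; the question and the answer both disable it with verify=False.
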