
Python 3.x: RecursionError after 499-500 runs of the code... why? Thanks for the help. Novel web scraping

Tags: python-3.x, recursion, web-scraping, beautifulsoup, selenium-chromedriver

Using Chrome 90 and Python 3.9. All imports are fully up to date, as I only just installed them.

Since I have a bad ISP, I made this script to copy novels from the internet into text files so I can read them offline whenever my connection drops. The script basically works until the recursion error appears, and then I have to go in manually and change the starting chapter in the script's setup. My intended result is for the code to run until the novel is completely copied (from chapter 1 to the final chapter) into text files, no matter how many chapters there are.

The recursion error always appears after I have copied 499 or 500 chapters. I don't know why the number is so low, or how it is hitting this error at all. I have read that recursion errors usually appear after 999 iterations.
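That number lines up with CPython's default recursion limit of 1000 stack frames: each chapter copied here pushes at least two frames that never return (CopyChapter calls NextChapter, which calls CopyChapter again), so the limit lands near 500 chapters. A minimal sketch of the same pattern, with stand-in function names rather than the script's real ones:

import sys

print(sys.getrecursionlimit())  # 1000 on a stock CPython install

def copy_chapter(n):
    next_chapter(n)        # never returns before making the next call

def next_chapter(n):
    copy_chapter(n + 1)    # every "chapter" adds two stack frames

try:
    copy_chapter(0)
except RecursionError as err:
    print(err)             # raised after roughly 500 chapters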

Error:: (the first two lines repeat for a very long time)

  File "C:\Users\james\Documents\Novels\PEERLESS MARTIAL GOD\novel.py", line 42, in CopyChapter
    NextChapter()
  File "C:\Users\james\Documents\Novels\PEERLESS MARTIAL GOD\novel.py", line 49, in NextChapter
    link = driver.find_element_by_link_text(cLink)
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 428, in find_element_by_link_text
    return self.find_element(by=By.LINK_TEXT, value=link_text)
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 319, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 374, in execute
    return self._request(command_info[0], url, body=data)
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 397, in _request
    resp = self._conn.request(method, url, body=body, headers=headers)
  File "C:\Program Files\Python39\lib\site-packages\urllib3\request.py", line 78, in request
    return self.request_encode_body(
  File "C:\Program Files\Python39\lib\site-packages\urllib3\request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "C:\Program Files\Python39\lib\site-packages\urllib3\poolmanager.py", line 375, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "C:\Program Files\Python39\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "C:\Program Files\Python39\lib\site-packages\urllib3\connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "C:\Program Files\Python39\lib\site-packages\urllib3\connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Program Files\Python39\lib\http\client.py", line 1347, in getresponse
    response.begin()
  File "C:\Program Files\Python39\lib\http\client.py", line 331, in begin
    self.headers = self.msg = parse_headers(self.fp)
  File "C:\Program Files\Python39\lib\http\client.py", line 225, in parse_headers
    return email.parser.Parser(_class=_class).parsestr(hstring)
  File "C:\Program Files\Python39\lib\email\parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "C:\Program Files\Python39\lib\email\parser.py", line 56, in parse
    feedparser.feed(data)
  File "C:\Program Files\Python39\lib\email\feedparser.py", line 176, in feed
    self._call_parse()
  File "C:\Program Files\Python39\lib\email\feedparser.py", line 180, in _call_parse
    self._parse()
  File "C:\Program Files\Python39\lib\email\feedparser.py", line 295, in _parsegen
    if self._cur.get_content_maintype() == 'message':
  File "C:\Program Files\Python39\lib\email\message.py", line 594, in get_content_maintype
    ctype = self.get_content_type()
  File "C:\Program Files\Python39\lib\email\message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "C:\Program Files\Python39\lib\email\message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "C:\Program Files\Python39\lib\email\_policybase.py", line 316, in header_fetch_parse
    return self._sanitize_header(name, value)
  File "C:\Program Files\Python39\lib\email\_policybase.py", line 287, in _sanitize_header
    if _has_surrogates(value):
  File "C:\Program Files\Python39\lib\email\utils.py", line 57, in _has_surrogates
    s.encode()
RecursionError: maximum recursion depth exceeded while calling a Python object
Code::

#! python3
import requests
import bs4 as BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
from unidecode import unidecode

CHROMEDRIVER_PATH = 'C:\Program Files\Python39\chromedriver.exe'

NovelChapter = 'peerless-martial-god/chapter-1-spirit-awakening.html'
BaseURL = 'https://novelfull.com'
url = '%(U)s/%(N)s' % {'U': BaseURL, "N": NovelChapter}

options = Options()
options.add_argument("--headless") # Runs Chrome in headless mode.
driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)
driver.get(url)

def Close():
    driver.stop_client()
    driver.close()
    driver.quit()

# start copy of chapter and add to a file
def CopyChapter():
    global soup
    soup = BeautifulSoup.BeautifulSoup(driver.page_source, 'html.parser')
    readables = soup.find(id='chapter-content')
    name = driver.title
    filename = name.replace('<',' ').replace('"',' ').replace('>',' ').replace('/',' ').replace("|",' ').replace("?",' ').replace("*",' ').replace(":", ' -').replace('Read ',"").replace(' online free from your Mobile, Table, PC... Novel Updates Daily ',"").replace(' online free - Novel Full',"")
    file_name = (filename + '.txt')
    print(file_name)
    data = ''
    for data in soup.find_all("p"):
        myfile = open(file_name, 'a+')
        myfile.write(unidecode(data.get_text())+'\n'+'\n')
        myfile.close()
    global lastURL
    lastURL = driver.current_url
    print('**********Chapter Copied!**********')
    NextChapter()
# end copy of chapter and add to a file

# start goto next chapter if exists then return to copy chapter else Close()
def NextChapter():
    bLink = soup.find(id = "next_chap")
    cLink = 'Next Chapter'
    link = driver.find_element_by_link_text(cLink)
    link.click()
    global currentURL
    currentURL = driver.current_url
    if currentURL != lastURL:
        CopyChapter()
    else:
        print('Finished!!!')
        Close()
# end goto next chapter if exists then return to copy chapter else Close()

CopyChapter()
#EOF
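As an aside, the long chain of .replace() calls that scrubs the page title into a Windows-safe file name can be condensed with the standard-library re module. A sketch only: sanitize_title is a name introduced here, and the boilerplate patterns are assumed from the titles shown in the code above, not confirmed against the site.

import re

def sanitize_title(title):
    # Drop the site boilerplate around the chapter name (assumed patterns).
    title = title.replace('Read ', '')
    title = re.sub(r' online free.*$', '', title)
    # Replace characters that Windows forbids in file names.
    title = title.replace(':', ' -')
    title = re.sub(r'[<>"/\\|?*]', ' ', title)
    return title + '.txt'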
Answer::

Doesn't look as nice as the defs, but it works perfectly for my needs: the recursive CopyChapter()/NextChapter() hand-off is replaced by a plain while loop, so the call stack stays flat. Added some things, such as creating folders for the text files and starting from the chapter list page. There are probably lots of things that could be optimized, but it works, and that's all I needed.
#! python3
import os
import requests
import bs4 as BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
from unidecode import unidecode

CHROMEDRIVER_PATH = 'C:\Program Files\Python39\chromedriver.exe'

def Close():
    driver.stop_client()
    driver.close()
    driver.quit()

global NovelName
NovelName = ['']
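# NovelName is the work list of novel slugs to copy; the empty string above
# is a placeholder. Judging by the URL this loop builds, the expected entry
# format is e.g. 'peerless-martial-god.html' (an assumption, not from the post).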
global DIR
global baseDIR
baseDIR = "C:/Users/james/Documents/Novels"
    
while NovelName:
    NN = NovelName.pop(-1)
    NNx = NN.replace('.html', '').replace('-', ' ').upper()
    DIR = '%(B)s/%(N)s' % {'B': baseDIR, "N": NNx}
    os.mkdir(DIR)

    BaseURL = 'https://novelfull.com'
    url = '%(U)s/%(N)s' % {'U': BaseURL, "N": NN}
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)
    driver.get(url)
    print(url)
    global currentURL
    currentURL = driver.current_url
    global lastURL
    lastURL = ''
    
    soupx = BeautifulSoup.BeautifulSoup(driver.page_source, 'html.parser')
    ChapterList = soupx.find(id='list-chapter')
    CL = []
    for i in ChapterList.find_all("li"):
        CL.append(i)
    NovelChapter1Raw = CL[0]
    xx=[]
    for i in NovelChapter1Raw.find_all("a"):
        for x in i.find_all("span"):
            xx.append(x)
            ChapterTextX = ' '.join(map(str, xx))
    ChapterText = ChapterTextX.replace('<span class="chapter-text">','').replace('</span>','')
    BaseURL = 'https://novelfull.com'
    link = driver.find_element_by_link_text(ChapterText)
    url = '%(U)s/%(N)s' % {'U': BaseURL, "N": link}
    link.click()
    currentURL = driver.current_url

    while currentURL != lastURL:
        global soup
        soup = BeautifulSoup.BeautifulSoup(driver.page_source, 'html.parser')
        readables = soup.find(id='chapter-content')
        name = driver.title
        filename = name.replace('<',' ').replace('"',' ').replace('>',' ').replace('/',' ').replace("|",' ').replace("?",' ').replace("*",' ').replace(":", ' -').replace('Read ',"").replace(' online free from your Mobile, Table, PC... Novel Updates Daily ',"").replace(' online free - Novel Full',"")
        file_name = (filename + '.txt')
        print(file_name)
        data = ''
        for data in soup.find_all("p"):
            myfile = open(DIR +'/'+ file_name, 'a+')
            myfile.write(unidecode(data.get_text())+'\n'+'\n')
            myfile.close()
        lastURL = driver.current_url
        print('**********Chapter Copied!**********')
        bLink = soup.find(id = "next_chap")
        cLink = 'Next Chapter'
        link = driver.find_element_by_link_text(cLink)
        link.click()
        currentURL = driver.current_url
        
    print('Finished!!!')
    Close()
print('Finished!!!')
Close() #<- throws a bunch of errors but makes sure everything closes.

#EOF
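The teardown noise mentioned in the comment above can be silenced by wrapping each shutdown call so that one failure does not abort the rest. A sketch of one way to do it (CloseQuietly is a name introduced here, not part of the original post):

def CloseQuietly():
    # Try each WebDriver shutdown step and swallow any teardown error,
    # so the session is always cleaned up as far as possible.
    for step in (driver.stop_client, driver.close, driver.quit):
        try:
            step()
        except Exception:
            pass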