
Python 3.x: RecursionError after 499-500 runs of the code... why? Thanks for the help. Novel web scraping

Tags: python-3.x, recursion, web-scraping, beautifulsoup, selenium-chromedriver

Using Chrome 90 and Python 3.9. All imports are fully up to date, as I only just installed them.

Since I have a bad ISP, I made this script to copy novels from the internet into text files so I can read them offline whenever my connection drops. The script basically works until the recursion error appears, and then I have to go in manually and change the starting chapter in the script's setup. My intended result is for the code to run until the novel is completely copied (from chapter 1 to the final chapter) into text files, no matter how many chapters there are.

The recursion error always appears after I have copied 499 or 500 chapters. I don't know why the number is so low, or how it is hitting this error at all. I have read that recursion errors usually appear after 999 iterations.
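That number lines up with CPython's default recursion limit of 1000 stack frames: each chapter copied here pushes at least two frames that never return (CopyChapter calls NextChapter, which calls CopyChapter again), so the limit lands near 500 chapters. A minimal sketch of the same pattern, with stand-in function names rather than the script's real ones:

import sys

print(sys.getrecursionlimit())  # 1000 on a stock CPython install

def copy_chapter(n):
    next_chapter(n)        # never returns before making the next call

def next_chapter(n):
    copy_chapter(n + 1)    # every "chapter" adds two stack frames

try:
    copy_chapter(0)
except RecursionError as err:
    print(err)             # raised after roughly 500 chapters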

Error:: (the first two lines repeat for a very long time)

  File "C:\Users\james\Documents\Novels\PEERLESS MARTIAL GOD\novel.py", line 42, in CopyChapter
    NextChapter()
  File "C:\Users\james\Documents\Novels\PEERLESS MARTIAL GOD\novel.py", line 49, in NextChapter
    link = driver.find_element_by_link_text(cLink)
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 428, in find_element_by_link_text
    return self.find_element(by=By.LINK_TEXT, value=link_text)
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 976, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 319, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 374, in execute
    return self._request(command_info[0], url, body=data)
  File "C:\Program Files\Python39\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 397, in _request
    resp = self._conn.request(method, url, body=body, headers=headers)
  File "C:\Program Files\Python39\lib\site-packages\urllib3\request.py", line 78, in request
    return self.request_encode_body(
  File "C:\Program Files\Python39\lib\site-packages\urllib3\request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "C:\Program Files\Python39\lib\site-packages\urllib3\poolmanager.py", line 375, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "C:\Program Files\Python39\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "C:\Program Files\Python39\lib\site-packages\urllib3\connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "C:\Program Files\Python39\lib\site-packages\urllib3\connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Program Files\Python39\lib\http\client.py", line 1347, in getresponse
    response.begin()
  File "C:\Program Files\Python39\lib\http\client.py", line 331, in begin
    self.headers = self.msg = parse_headers(self.fp)
  File "C:\Program Files\Python39\lib\http\client.py", line 225, in parse_headers
    return email.parser.Parser(_class=_class).parsestr(hstring)
  File "C:\Program Files\Python39\lib\email\parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "C:\Program Files\Python39\lib\email\parser.py", line 56, in parse
    feedparser.feed(data)
  File "C:\Program Files\Python39\lib\email\feedparser.py", line 176, in feed
    self._call_parse()
  File "C:\Program Files\Python39\lib\email\feedparser.py", line 180, in _call_parse
    self._parse()
  File "C:\Program Files\Python39\lib\email\feedparser.py", line 295, in _parsegen
    if self._cur.get_content_maintype() == 'message':
  File "C:\Program Files\Python39\lib\email\message.py", line 594, in get_content_maintype
    ctype = self.get_content_type()
  File "C:\Program Files\Python39\lib\email\message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "C:\Program Files\Python39\lib\email\message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "C:\Program Files\Python39\lib\email\_policybase.py", line 316, in header_fetch_parse
    return self._sanitize_header(name, value)
  File "C:\Program Files\Python39\lib\email\_policybase.py", line 287, in _sanitize_header
    if _has_surrogates(value):
  File "C:\Program Files\Python39\lib\email\utils.py", line 57, in _has_surrogates
    s.encode()
RecursionError: maximum recursion depth exceeded while calling a Python object
Code::

#! python3
import requests
import bs4 as BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
from unidecode import unidecode

CHROMEDRIVER_PATH = 'C:\Program Files\Python39\chromedriver.exe'

NovelChapter = 'peerless-martial-god/chapter-1-spirit-awakening.html'
BaseURL = 'https://novelfull.com'
url = '%(U)s/%(N)s' % {'U': BaseURL, "N": NovelChapter}

options = Options()
options.add_argument("--headless") # Runs Chrome in headless mode.
driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)
driver.get(url)

def Close():
    driver.stop_client()
    driver.close()
    driver.quit()

# start copy of chapter and add to a file
def CopyChapter():
    global soup
    soup = BeautifulSoup.BeautifulSoup(driver.page_source, 'html.parser')
    readables = soup.find(id='chapter-content')
    name = driver.title
    filename = name.replace('<',' ').replace('"',' ').replace('>',' ').replace('/',' ').replace("|",' ').replace("?",' ').replace("*",' ').replace(":", ' -').replace('Read ',"").replace(' online free from your Mobile, Table, PC... Novel Updates Daily ',"").replace(' online free - Novel Full',"")
    file_name = (filename + '.txt')
    print(file_name)
    data = ''
    for data in soup.find_all("p"):
        myfile = open(file_name, 'a+')
        myfile.write(unidecode(data.get_text())+'\n'+'\n')
        myfile.close()
    global lastURL
    lastURL = driver.current_url
    print('**********Chapter Copied!**********')
    NextChapter()
# end copy of chapter and add to a file

# start goto next chapter if exists then return to copy chapter else Close()
def NextChapter():
    bLink = soup.find(id = "next_chap")
    cLink = 'Next Chapter'
    link = driver.find_element_by_link_text(cLink)
    link.click()
    global currentURL
    currentURL = driver.current_url
    if currentURL != lastURL:
        CopyChapter()
    else:
        print('Finished!!!')
        Close()
# end goto next chapter if exists then return to copy chapter else Close()

CopyChapter()
#EOF
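As an aside, the long chain of .replace() calls that scrubs the page title into a Windows-safe file name can be condensed with the standard-library re module. A sketch only: sanitize_title is a name introduced here, and the boilerplate patterns are assumed from the titles shown in the code above, not confirmed against the site.

import re

def sanitize_title(title):
    # Drop the site boilerplate around the chapter name (assumed patterns).
    title = title.replace('Read ', '')
    title = re.sub(r' online free.*$', '', title)
    # Replace characters that Windows forbids in file names.
    title = title.replace(':', ' -')
    title = re.sub(r'[<>"/\\|?*]', ' ', title)
    return title + '.txt'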
Answer::

Doesn't look as nice as the defs, but it works perfectly for my needs: the recursive CopyChapter()/NextChapter() hand-off is replaced by a plain while loop, so the call stack stays flat. Added some things, such as creating folders for the text files and starting from the chapter list page. There are probably lots of things that could be optimized, but it works, and that's all I needed.
#! python3
import os
import requests
import bs4 as BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
from unidecode import unidecode

CHROMEDRIVER_PATH = 'C:\Program Files\Python39\chromedriver.exe'

def Close():
    driver.stop_client()
    driver.close()
    driver.quit()

global NovelName
NovelName = ['']
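# NovelName is the work list of novel slugs to copy; the empty string above
# is a placeholder. Judging by the URL this loop builds, the expected entry
# format is e.g. 'peerless-martial-god.html' (an assumption, not from the post).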
global DIR
global baseDIR
baseDIR = "C:/Users/james/Documents/Novels"
    
while NovelName:
    NN = NovelName.pop(-1)
    NNx = NN.replace('.html', '').replace('-', ' ').upper()
    DIR = '%(B)s/%(N)s' % {'B': baseDIR, "N": NNx}
    os.mkdir(DIR)

    BaseURL = 'https://novelfull.com'
    url = '%(U)s/%(N)s' % {'U': BaseURL, "N": NN}
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)
    driver.get(url)
    print(url)
    global currentURL
    currentURL = driver.current_url
    global lastURL
    lastURL = ''
    
    soupx = BeautifulSoup.BeautifulSoup(driver.page_source, 'html.parser')
    ChapterList = soupx.find(id='list-chapter')
    CL = []
    for i in ChapterList.find_all("li"):
        CL.append(i)
    NovelChapter1Raw = CL[0]
    xx=[]
    for i in NovelChapter1Raw.find_all("a"):
        for x in i.find_all("span"):
            xx.append(x)
            ChapterTextX = ' '.join(map(str, xx))
    ChapterText = ChapterTextX.replace('<span class="chapter-text">','').replace('</span>','')
    BaseURL = 'https://novelfull.com'
    link = driver.find_element_by_link_text(ChapterText)
    url = '%(U)s/%(N)s' % {'U': BaseURL, "N": link}
    link.click()
    currentURL = driver.current_url

    while currentURL != lastURL:
        global soup
        soup = BeautifulSoup.BeautifulSoup(driver.page_source, 'html.parser')
        readables = soup.find(id='chapter-content')
        name = driver.title
        filename = name.replace('<',' ').replace('"',' ').replace('>',' ').replace('/',' ').replace("|",' ').replace("?",' ').replace("*",' ').replace(":", ' -').replace('Read ',"").replace(' online free from your Mobile, Table, PC... Novel Updates Daily ',"").replace(' online free - Novel Full',"")
        file_name = (filename + '.txt')
        print(file_name)
        data = ''
        for data in soup.find_all("p"):
            myfile = open(DIR +'/'+ file_name, 'a+')
            myfile.write(unidecode(data.get_text())+'\n'+'\n')
            myfile.close()
        lastURL = driver.current_url
        print('**********Chapter Copied!**********')
        bLink = soup.find(id = "next_chap")
        cLink = 'Next Chapter'
        link = driver.find_element_by_link_text(cLink)
        link.click()
        currentURL = driver.current_url
        
    print('Finished!!!')
    Close()
print('Finished!!!')
Close() #<- throws a bunch of errors but makes sure everything closes.

#EOF
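The teardown noise mentioned in the comment above can be silenced by wrapping each shutdown call so that one failure does not abort the rest. A sketch of one way to do it (CloseQuietly is a name introduced here, not part of the original post):

def CloseQuietly():
    # Try each WebDriver shutdown step and swallow any teardown error,
    # so the session is always cleaned up as far as possible.
    for step in (driver.stop_client, driver.close, driver.quit):
        try:
            step()
        except Exception:
            pass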