Python (Selenium) script overwrites file

python selenium

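With the code below, the extraction itself "works", but each new page overwrites the main HTML output file. I'm new to this and I'm sure it's a silly coding mistake, but I just can't see it.

In other words, it processes the pages and extracts the information, but every time it finishes a page it overwrites what is already in the HTML file, so at any given time I only have p. 2 or p. 16, etc. I need it to either keep adding to the file or create one HTML file per page (I think the latter is preferable?).

Any help would be appreciated.

This is only one part of a larger script, but I want to make sure each part works before running the whole thing.

Thanks for your time.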

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep
import os

allpages = []
for i in range(2, 1575):  # the main page is a different url so starting on p. 2
    allpages.append("url here" + str(i))

completedlist = []

for eachpage in allpages[0:2]:  # just testing; will change to :1575
    options = Options()
    options.headless = True
    driver = webdriver.Chrome(options=options, executable_path='mypath')
    driver.get(eachpage)
    print('Headless Chrome Initialized: ' + eachpage)

    with open("./capture/filenamehere" + str(i) + ".html", "w") as f:
        f.write(driver.page_source)

    completedlist.append(eachpage)

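You are opening the file in write mode, so the output is overwritten every time. Change "w" to "a" in the open() call. "a" means append mode, so the file is no longer overwritten and new content is added to the end.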
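Since the question mentions that one file per page might be preferable, here is a rough sketch of both options side by side. One thing worth noting about the posted loop: the file name is built from i, which still holds its last value (1574) from the earlier loop, so every iteration opens the same path, and that is why "w" replaces the previous page. The helper names save_appended and save_per_page, the page{n}.html naming, and the sample strings standing in for driver.page_source are all made up for illustration and are not from the original script.

```python
import tempfile
from pathlib import Path


def save_appended(pages, out_file):
    """Append every page's HTML to a single file (mode "a")."""
    for html in pages:
        # "a" keeps existing content and writes at the end of the file
        with open(out_file, "a", encoding="utf-8") as f:
            f.write(html)
            f.write("\n")


def save_per_page(pages, out_dir, first_page=2):
    """Write each page's HTML to its own file, numbered from first_page."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # enumerate supplies a counter that changes on every iteration,
    # so each page gets a distinct file name
    for page_number, html in enumerate(pages, start=first_page):
        path = out_dir / f"page{page_number}.html"
        path.write_text(html, encoding="utf-8")
        written.append(path)
    return written


# Hypothetical stand-ins for driver.page_source
sample_pages = ["<html>page two</html>", "<html>page three</html>"]
demo_dir = Path(tempfile.mkdtemp())
save_appended(sample_pages, demo_dir / "all_pages.html")
per_page_files = save_per_page(sample_pages, demo_dir / "capture")
print([p.name for p in per_page_files])
```

In the real script, the same idea would probably mean looping with enumerate(allpages, start=2) and using that counter in the file name instead of i.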

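Thank you so much! I knew I was missing something silly. You made my day. I did upvote, but apparently it doesn't show because I don't have enough reputation points yet. Sorry about that. I'll work on earning some rep so that it shows up.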