Python: unable to append new results *first* to an existing csv file containing old data


python python-3.x csv web-scraping


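I've written a script in Python that fetches the titles of different posts from a webpage and writes them to a csv file. Because the site updates its content frequently, I'd like new results to go first in the csv file, ahead of the list of old titles that is already there.

I tried: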

import csv
import time
import requests
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/questions/tagged/python"

def get_information(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')
    for title in soup.select(".summary .question-hyperlink"):
        yield title.text

if __name__ == '__main__':
    while True:
        with open("output.csv","a",newline="") as f:
            writer = csv.writer(f)
            writer.writerow(['posts'])
            for items in get_information(url):
                writer.writerow([items])
                print(items)

        time.sleep(300)

When run twice, the script above appends the new results after the old ones.

The old data looks like this:

A
F
G
T

The new data are W, Q and U.

When the script is rerun, the csv file should look like this:

W
Q
U
A
F
G
T

How can I put the new results first in an existing csv file that contains old data?

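Because you want to change the position of every element in the table, you need to read the table into memory and rewrite the whole file, starting with the new elements.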
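
As a rough illustration of the read-into-memory approach, here is a minimal sketch. The helper name prepend_rows_in_memory and the demo file are hypothetical, not part of the original question, and the sketch assumes the file is small enough to hold in memory.

```python
import csv
import os
import tempfile


def prepend_rows_in_memory(csv_path, new_rows):
    """Rewrite csv_path so that new_rows come before the rows already in it.

    Hypothetical helper: the whole existing file is loaded into memory first.
    """
    old_rows = []
    if os.path.exists(csv_path):
        with open(csv_path, "r", newline="") as f:
            old_rows = list(csv.reader(f))

    # Opening in "w" mode truncates the file, so everything is written again.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(new_rows)  # new results first
        writer.writerows(old_rows)  # then the old data


# Demo in a throwaway folder, using the letters from the question.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "output.csv")
    prepend_rows_in_memory(demo_path, [[t] for t in "AFGT"])  # old data
    prepend_rows_in_memory(demo_path, [[t] for t in "WQU"])   # new data
    with open(demo_path, newline="") as f:
        result = [row[0] for row in csv.reader(f)]

print(result)
```

In the scraping script, the list passed as new_rows would be built from get_information(url), for example [[title] for title in get_information(url)].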

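You may find it easier to (1) write the new elements to a new file, (2) open the old file and append its contents to the new file, and (3) move the new file to the original (old) filename.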
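
The three steps could be sketched roughly as follows. The function name prepend_rows_via_new_file is hypothetical, and the sketch assumes the old file ends with a line terminator (as files written by csv.writer do), so its text can be streamed across unparsed with shutil.copyfileobj.

```python
import csv
import os
import shutil
import tempfile


def prepend_rows_via_new_file(csv_path, new_rows):
    """Put new_rows ahead of the existing contents without loading them all."""
    folder = os.path.dirname(os.path.abspath(csv_path))

    # (1) Write the new elements to a new file in the same folder.
    with tempfile.NamedTemporaryFile(mode="w", newline="", suffix=".csv",
                                     dir=folder, delete=False) as newf:
        csv.writer(newf).writerows(new_rows)

        # (2) Append the old file's contents, streamed in chunks.
        if os.path.exists(csv_path):
            with open(csv_path, "r", newline="") as oldf:
                shutil.copyfileobj(oldf, newf)

    # (3) Move the new file to the original name, overwriting the old file.
    os.replace(newf.name, csv_path)


# Demo in a throwaway folder, using the letters from the question.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "output.csv")
    prepend_rows_via_new_file(demo_path, [[t] for t in "AFGT"])  # old data
    prepend_rows_via_new_file(demo_path, [[t] for t in "WQU"])   # new data
    with open(demo_path, newline="") as f:
        streamed = [row[0] for row in csv.reader(f)]
    files_left = sorted(os.listdir(folder))

print(streamed)
```

Only one file is left in the folder after each call, because the temporary file takes over the original name.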

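Inserting data anywhere in a file other than at the end requires rewriting the whole thing. To do that without first reading its entire contents into memory, you can create a temporary csv file containing the new data, append the data from the existing file to it, delete the old file, and rename the new one.

Here's an example of what I mean (using a dummy get_information() function to simplify testing):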

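From your comment on my answer, it seems you have an update to your question. What code did you use for the insertion?

@Prune, if I let the script run with the 5-minute sleep, there will be a lot of new csv files.

How many new files would you have? At most two files can exist at any one time. Please look at the answer: the last step is to move the new file back to the old filename. After that there is no "second csv file" any more.

As suggested, you have to either hold the old data in memory or, if the data is too large, use a temporary second file. Another approach would be to write the data normally at the end of the file and then, when reading it, seek to where the last entry starts. In this case that is neither easy nor reliable, because each entry has an arbitrary size, so you would have to guess its size, seek to it from the end of the file, then read and seek until you find the marker. I know this is not what you want, but you will have to do what the answers suggest.

Your suggestion seems to work, except for the renaming part. Check the error the script throws when renaming the new file to the old name:

Traceback (most recent call last):
    File "C:\Users\WCS\Desktop\demo File\demo_script.py", line 36, in
        os.rename("new_output.csv","output.csv")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'new_output.csv' -> 'output.csv'

You failed to post the failing code. How did you try to replace the old file with the new one? You need to work this out with your operating system. It looks like you tried to do this with Python's os commands, simply calling rename without deleting the original file first.

Yes, please look at the documentation for rename. See the last two lines of martieau's answer. Also, note that off-site links are not acceptable as part of a Stack Overflow question.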

import csv
import os
from tempfile import NamedTemporaryFile

url = 'https://stackoverflow.com/questions/tagged/python'
csv_filepath = 'updated.csv'

# For testing, create an existing file.
if not os.path.exists(csv_filepath):
    with open(csv_filepath, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows([item] for item in 'AFGT')

# Dummy for testing.
def get_information(url):
    for item in 'WQU':
        yield item


if __name__ == '__main__':
    folder = os.path.abspath(os.path.dirname(csv_filepath))  # Get dir of existing file.

    with NamedTemporaryFile(mode='w', newline='', suffix='.csv',
                            dir=folder, delete=False) as newf:
        temp_filename = newf.name  # Save filename.
        # Put new data into the temporary file.
        writer = csv.writer(newf)
        for item in get_information(url):
            writer.writerow([item])
            print([item])

        # Append contents of existing file to new one.
        with open(csv_filepath, 'r', newline='') as oldf:
            reader = csv.reader(oldf)
            for row in reader:
                writer.writerow(row)
                print(row)

    os.remove(csv_filepath)  # Delete old file.
    os.rename(temp_filename, csv_filepath)  # Rename temporary file.
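
The FileExistsError quoted in the comments arises because, on Windows, os.rename refuses to overwrite an existing destination, which is why the code above deletes the old file first. As an alternative sketch (not what the answer itself uses), os.replace, available since Python 3.3, overwrites the destination in a single call on both Windows and POSIX systems. The helper name swap_in is hypothetical; the file names are the ones from the traceback.

```python
import os
import tempfile


def swap_in(new_path, old_path):
    """Move new_path over old_path, overwriting old_path if it exists."""
    os.replace(new_path, old_path)


# Demo in a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    old_path = os.path.join(folder, "output.csv")
    new_path = os.path.join(folder, "new_output.csv")

    with open(old_path, "w", newline="") as f:
        f.write("A\n")
    with open(new_path, "w", newline="") as f:
        f.write("W\nA\n")

    swap_in(new_path, old_path)  # no os.remove() needed beforehand

    with open(old_path, newline="") as f:
        replaced_contents = f.read()
    new_file_still_exists = os.path.exists(new_path)

print(repr(replaced_contents), new_file_still_exists)
```

With this, the final os.remove and os.rename pair could be collapsed into one os.replace(temp_filename, csv_filepath) call, which also avoids a window in which neither file exists under the original name.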