Python: exporting continuously scraped data to a JSON file
I wrote a web-scraping script, and it scrapes the data successfully. The only problem is exporting the data to a JSON file:
def scrape_post_info(url):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)

    job_dict = {}
    job_dict['title'] = title
    job_dict['Description'] = description
    job_dict['url'] = post_url

    # JSON mechanism goes here
    json_job = json.dumps(job_dict)
    with open('data.json', 'r+') as f:
        f.write("[")
        f.seek(0)
        f.write(json_job)
        txt = f.readline()
        if txt.endswith("}"):
            f.write(",")
def crawl_web(url):
    while True:
        post_url = get_post_url(url)
        for urls in post_url:
            scrape_post_info(urls)
# Execute the main function 'crawl_web'
if __name__ == '__main__':
    crawl_web('www.examp....com')
The data is exported to JSON, but not in valid JSON format. I want the data to look like this:
[
    {
        "title": "this is title",
        "Description": " Fendi is an Italian luxury labelarin. ",
        "url": "https:/~"
    },
    {
        "title": " - Furrocious Elegant Style",
        "Description": " the Italian luxare vast. ",
        "url": "https://www.s"
    },
    {
        "title": "Rome, Fountains and Fendi Sunglasses",
        "Description": " Fendi started off as a store. ",
        "url": "https://www.~"
    },
    {
        "title": "Tipsnglasses",
        "Description": "Whether irregular orn season.",
        "url": "https://www.sooic"
    }
]
How can I achieve this?

How about:
def scrape_post_info(url):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    return {"title": title, "Description": description, "url": post_url}

def crawl_web(url):
    while True:
        jobs = []
        post_urls = get_post_url(url)
        for url in post_urls:
            jobs.append(scrape_post_info(url))
        with open("data.json", "w") as f:
            json.dump(jobs, f)

# Execute the main function 'crawl_web'
if __name__ == "__main__":
    crawl_web("www.examp....com")
Note that this rewrites the entire file on every iteration over "post_url", so it can become quite slow for large files and slow I/O.

Depending on the job's runtime and memory footprint, you may want to move the file write out of the for loop and write only once.
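A minimal sketch of "write only once" under the assumption that the crawl eventually finishes (the scraper is stubbed out here, since `get_page_content`/`get_post_details` are not shown in the question; the `post_urls` parameter and the stub's return values are illustrative):

```python
import json

def scrape_post_info(url):
    # Stub standing in for the question's scraper; in the real script this
    # would fetch the page and extract title/Description/url.
    return {"title": "title for " + url, "Description": "...", "url": url}

def crawl_web(post_urls, path="data.json"):
    # Collect every job dict first, then serialize the whole list in a
    # single write. json.dump writes to the file object; json.dumps only
    # returns a string, which was the bug in the original answer code.
    jobs = [scrape_post_info(u) for u in post_urls]
    with open(path, "w") as f:
        json.dump(jobs, f, indent=4)  # indent=4 gives the pretty format shown above
    return jobs
```

After a call like `crawl_web(["url-a", "url-b"])`, `data.json` holds one valid JSON array containing every scraped post.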
Note: if you really do want to write a stream of JSON, you may want to look at this package: , but I would suggest picking another format, such as CSV, which is much better suited to streaming writes.

Your "JSON mechanism" doesn't make any sense. I'd suggest: 1. read the existing file contents into a list (creating a new file if it's empty); 2. append the new content to the list; 3. write the whole list back over the existing contents.

What does the actual JSON look like? It seems each entry opens a new array. Also, using f.seek(0) followed by f.write(json_job) overwrites the previous entry.

How can I implement this? I've tried a lot. Can you help me solve it?
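The three-step read/append/write approach from the comment can be sketched like this (a minimal sketch; the file path and the `append_job` helper name are illustrative, not from the original script):

```python
import json
import os

def append_job(job, path="data.json"):
    # 1. Read the existing file contents into a list
    #    (start with an empty list if the file is missing or empty).
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path) as f:
            jobs = json.load(f)
    else:
        jobs = []
    # 2. Append the new entry to the list.
    jobs.append(job)
    # 3. Write the whole list back over the existing contents.
    with open(path, "w") as f:
        json.dump(jobs, f, indent=4)
```

This keeps the file valid JSON after every call, at the cost of re-reading and rewriting the file each time, which is exactly the slowdown the answer warns about.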