Python 2.7: How do I save Python web scraper output in JSON file format?

python-2.7 web-scraping scrapy

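I recently started learning and writing Python, and I am currently developing a web scraper. I want to scrape data from multiple websites and save it in JSON file format. At the moment the script only prints the scraped results; I want the scraped data saved to a JSON file instead. The code below fails with an "is not JSON serializable" error and writes nothing to the file named filename. I am using Python 2.7.14 in the Mac terminal. Here is the Scraper.py file: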

from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip
import json

urls = ['http://www.ctex.cn', 'http://www.ss-gate.org/']
#scrape elements
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    #open the file "filename" in write ("w") mode
    file = open("filename", "w")
    json_data = json.dumps(my_list,file)
    #json.dump(soup, file)
    file.close()

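I also tried different code, but it still does not write to the file named filename, and it fails with the same "not JSON serializable" error. Here is the Scraper2.py file: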

from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip

urls = ['http://www.ctex.cn', 'http://www.ss-gate.org/']
#scrape elements
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    #print(soup)

import json
# open the file "filename" in write ("w") mode
file = open("filename", "w")
#output = soup
# dumps "output" encoded in the JSON format into "filename"
json.dump(soup, file)
file.close()

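Your question is a little ambiguous, because I'm not sure whether you are asking about making the request or about parsing the response. It is best not to mix the two up.

Technically, HTML does not map cleanly onto JSON. I suggest two ways to solve this problem.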

Save each page as an HTML file. You can write response.text (rather than response.content) to an HTML file, like this:

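import io
import requests

urls = ['http://www.ctex.cn', 'http://www.ss-gate.org/']
for i, url in enumerate(urls):
    res = requests.get(url)
    # one file per URL; io.open encodes the unicode res.text as UTF-8
    html_file = io.open('FILENAME_%d.html' % i, 'w', encoding='utf-8')
    html_file.write(res.text)
    html_file.close()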

or

Save multiple results to a single JSON file, then write another program to parse them.

Good luck.
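As a side note on the error in the question: the json module only serializes basic types (dict, list, string, number, boolean, None), so handing it a parsed-document object raises a TypeError. The sketch below illustrates this with a hypothetical stand-in class, FakeSoup, so that it runs without BeautifulSoup installed; the helper name to_record is also made up for illustration. With a real BeautifulSoup object, the same idea would presumably apply: convert it to a string first (for example with str(soup) or soup.get_text()).

```python
import json


class FakeSoup(object):
    """Hypothetical stand-in for a parsed-document object such as BeautifulSoup."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


def to_record(url, soup):
    # Keep only plain strings, which the json module can serialize.
    return {"url": url, "html": str(soup)}


soup = FakeSoup("<html><title>Example</title></html>")

try:
    json.dumps(soup)  # arbitrary objects are rejected by the json module
    error = None
except TypeError as exc:
    error = exc  # this is the "not JSON serializable" failure

# Converting to a dict of strings first makes the data serializable.
record = to_record("http://www.ctex.cn", soup)
encoded = json.dumps(record, sort_keys=True)
```

Here error ends up holding the TypeError, while encoded is an ordinary JSON string that could be written to a file.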

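Your requests.get(url).content returns an HTML document. What exactly do you want to save as JSON? If you want to do web scraping in Python, you might want to look at …

@Andersson I want to save the content and information from the websites; that content is useful for search and recommendation engines.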

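import json
import requests

urls = ['http://www.ctex.cn', 'http://www.ss-gate.org/']
out_list = []
for url in urls:
    res = requests.get(url)
    out_list.append(res.text)  # raw HTML of each page, as a string
# write all pages to one file as a JSON array of strings
json_file = open('out.json', 'w')
json.dump(out_list, json_file)
json_file.close()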
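The "other program" that parses the saved results might look roughly like the sketch below. It assumes the file holds a JSON array of HTML strings, as written by the snippet above, and it pulls out only the page titles as an example. To keep the sketch free of third-party packages it uses the standard-library HTMLParser rather than BeautifulSoup; the names TitleParser, extract_title and titles_from_json are made up for illustration.

```python
import json

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    # Feed one HTML string through the parser and return its title text.
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def titles_from_json(path):
    # The file is assumed to hold a JSON array of HTML strings.
    with open(path) as json_file:
        pages = json.load(json_file)
    return [extract_title(page) for page in pages]
```

Calling titles_from_json('out.json') would then return one title per saved page. Other fields (links, headings, body text) could be extracted the same way and written back out as a list of dicts with json.dump.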