Converting a large CSV file into multiple JSON files with Python

python json csv

I am currently using the following code to convert a large CSV file to a JSON file:

import csv 
import json 

def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []
      
    with open(csvFilePath, encoding='utf-8') as csvf: 
        csvReader = csv.DictReader(csvf) 

        for row in csvReader: 
            jsonArray.append(row)
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf: 
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)
          
csvFilePath = r'test_data.csv'
jsonFilePath = r'test_data.json'
csv_to_json(csvFilePath, jsonFilePath)

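This code works well, and I can convert the CSV to JSON without any problems. However, because the CSV file contains more than 600,000 rows, the resulting JSON contains just as many items, which makes the JSON file very difficult to manage.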

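I would like to modify the code above so that every 5,000 rows of the CSV are written to a new JSON file. Ideally, in this case, I would end up with 120 (600,000 / 5,000) JSON files.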

How can I do this?

Split your reading and writing into separate functions and add a simple threshold:

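import csv
import json

JSON_ENTRIES_THRESHOLD = 5000  # modify to whatever you see fit

def write_json(json_array, filename):
    with open(filename, 'w', encoding='utf-8') as jsonf:
        json.dump(json_array, jsonf)  # note: .dump writes directly to the file object

def csv_to_json(csvFilePath, jsonFilePath):
    # jsonFilePath is used as a prefix for the numbered output files
    jsonArray = []
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        filename_index = 0
        for row in csvReader:
            jsonArray.append(row)
            if len(jsonArray) >= JSON_ENTRIES_THRESHOLD:
                # if we reached the threshold, write the batch out
                write_json(jsonArray, f"{jsonFilePath}-{filename_index}.json")
                filename_index += 1
                jsonArray = []
        # finally, write out the remainder (skipped when nothing is left)
        if jsonArray:
            write_json(jsonArray, f"{jsonFilePath}-{filename_index}.json")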
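
As an alternative to counting rows by hand, the same batching idea can be sketched with itertools.islice, which pulls a fixed number of rows from the reader at a time. This is only a minimal sketch, not the answer's original code: the names iter_chunks, csv_to_json_chunks, out_prefix and chunk_size are made up for illustration, and it assumes the CSV has a header row that csv.DictReader can use.

```python
import csv
import json
from itertools import islice


def iter_chunks(rows, size):
    """Yield successive lists of at most `size` items from an iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return  # input exhausted, so no empty trailing chunk is produced
        yield chunk


def csv_to_json_chunks(csv_path, out_prefix, chunk_size=5000):
    """Write every `chunk_size` CSV rows to its own JSON file.

    Returns the list of file names that were written.
    """
    written = []
    # newline='' is what the csv module documentation recommends for file objects
    with open(csv_path, newline='', encoding='utf-8') as csvf:
        reader = csv.DictReader(csvf)
        for index, chunk in enumerate(iter_chunks(reader, chunk_size)):
            filename = f"{out_prefix}-{index}.json"  # hypothetical naming scheme
            with open(filename, 'w', encoding='utf-8') as jsonf:
                json.dump(chunk, jsonf, indent=4)
            written.append(filename)
    return written
```

Calling csv_to_json_chunks('test_data.csv', 'test_data') would, under these assumptions, produce test_data-0.json, test_data-1.json and so on. Only one batch of rows is held in memory at a time, and no empty file is written when the row count is an exact multiple of the chunk size.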

I like this answer, but for readability I would put the write_json call inside the if condition instead of using continue.

Yup, that looks much better @gionni. I also fixed an overwrite issue: all files were being written with the same name.