Python - write x rows of a CSV file to JSON files
I have a CSV file that I need to write out as JSON files of 1000 rows each. The CSV file has roughly 9000 rows, so ideally I would end up with 9 separate JSON files of consecutive data. I know how to write a CSV file to JSON — this is what I have been doing:
import csv
import json

csvfile = open("C:\\Users\\Me\\Desktop\\data\\data.csv", 'r', encoding="utf8")
reader = csv.DictReader(csvfile, delimiter=",")
out = json.dumps([row for row in reader])
with open("C:\\Users\\Me\\Desktop\\data\\data.json", 'w') as f:
    f.write(out)
This works. But I need the JSON output split into 9 files. Right now I assume I would either:

1) try to count rows and stop when I reach 1000, or
2) write the CSV to a single JSON file, then open the JSON and try to split it somehow.

I'm quite lost on how to do this — any help is appreciated.

This reads the file data.csv once and creates separate JSON files named data_1.json through data_9.json, since there are 9000 rows in total. Moreover, as long as the number of rows in data.csv is a multiple of 1000, it will create number_of_rows/1000 files without any code changes:
import csv
import json

csvfile = open("C:\\Users\\Me\\Desktop\\data\\data.csv", 'r', encoding="utf8")
reader = csv.DictReader(csvfile, delimiter=",")
r = []
counter = 0
fileid = 1
for row in reader:
    r.append(row)
    counter += 1
    if counter == 1000:
        out = json.dumps(r)
        fname = "C:\\Users\\Me\\Desktop\\data\\data_" + str(fileid) + ".json"
        with open(fname, 'w') as f:
            f.write(out)
        # resetting & updating variables
        fileid += 1
        counter = 0
        r = []
Read the entire CSV file into a list of rows, then write slices of length 1000 to JSON files:
import csv
import json

input_file = 'C:\\Users\\Me\\Desktop\\data\\data.csv'
output_file_template = 'C:\\Users\\Me\\Desktop\\data\\data_{}.json'

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    rows = list(reader)

for i in range(len(rows) // 1000):
    out = json.dumps(rows[1000*i:1000*(i+1)])
    with open(output_file_template.format(i), 'w') as f:
        f.write(out)
Instead of reading the entire CSV file at once, you can iterate over it (which reduces memory usage). For example, here is a simple iteration over the rows:
with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:
        print(row)
During the iteration, you can enumerate the rows and use that value to compute groups of 1000 rows:
group_size = 1000
with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for index, row in enumerate(reader):
        group_idx = index // group_size
        print(group_idx, row)
You should see something like:
0 [row 0...]
0 [row 1...]
0 [row 2...]
...
0 [row 999...]
1 [row 1000...]
1 [row 1001...]
etc.
You can then use itertools.groupby to group the rows by 1000. Combining this with Alberto Garcia Raboso's solution:
from __future__ import division  # for Python 2 compatibility
import csv
import json
import itertools

input_file = 'C:\\Users\\Me\\Desktop\\data\\data.csv'
output_file_template = 'C:\\Users\\Me\\Desktop\\data\\data_{}.json'
group_size = 1000

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for key, group in itertools.groupby(enumerate(reader),
                                        key=lambda item: item[0] // group_size):
        grp_rows = [item[1] for item in group]
        content = json.dumps(grp_rows)
        with open(output_file_template.format(key), 'w') as jsonfile:
            jsonfile.write(content)
For example, with some fake data:
from __future__ import division
import itertools

rows = [[1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8]]
group_size = 4

for key, group in itertools.groupby(enumerate(rows),
                                    key=lambda item: item[0] // group_size):
    g_rows = [item[1] for item in group]
    print(key, g_rows)
You get:
0 [[1, 2], [3, 4], [5, 6], [7, 8]]
1 [[1, 2], [3, 4], [5, 6], [7, 8]]
2 [[1, 2], [3, 4], [5, 6], [7, 8]]
3 [[1, 2], [3, 4], [5, 6], [7, 8]]
4 [[1, 2], [3, 4], [5, 6], [7, 8]]
There is no reason to use DictReader here; a plain csv.reader will do fine. You can also use itertools.islice on the reader object to slice the data into groups of n rows and dump each set to a new file:
from itertools import islice, count
import csv
import json

with open("C:\\Users\\Me\\Desktop\\data\\data.csv") as f:
    reader, cnt = csv.reader(f), count(1)
    for rows in iter(lambda: list(islice(reader, 1000)), []):
        with open("C:\\Users\\Me\\Desktop\\data\\data{}.json".format(next(cnt)), 'w') as out:
            json.dump(rows, out)
Nicely exploits the fact that groupby is lazy: groupby returns an iterator, and each group is itself an iterator (which is why a list comprehension is used to turn grp_rows into a list).
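That laziness is worth demonstrating with a small sketch (illustrative data only): each group must be materialized before the next one is requested, because advancing groupby invalidates the previous group's iterator.

```python
import itertools

rows = list(range(10))
group_size = 4

# Each group yielded by groupby is a lazy iterator over the source;
# materialize it (e.g. with a list comprehension) before moving on,
# because advancing groupby consumes and invalidates the previous group.
chunks = []
for key, group in itertools.groupby(enumerate(rows),
                                    key=lambda item: item[0] // group_size):
    chunks.append([item[1] for item in group])

print(chunks)  # three chunks: sizes 4, 4, 2
```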