Python - writing x rows of a csv file to json files


I have a csv file that I need to write out as json files of 1000 rows each. The csv file is about 9000 rows, so ideally I'd like to end up with 9 separate json files of consecutive data.

I know how to write a csv file to json -- this is what I've been doing:

import csv
import json

csvfile = open("C:\\Users\Me\Desktop\data\data.csv", 'r', encoding="utf8")

reader = csv.DictReader(csvfile, delimiter = ",")
out = json.dumps( [ row for row in reader ] )

with open("C:\\Users\Me\Desktop\data\data.json", 'w') as f:
    f.write(out)
That works fine. But I need the json output to be 9 separate files. Right now, I assume I would either:

1) try counting rows and stop when I reach 1000

2) write the csv file to a single json file, then open the json and try to split it up somehow


I'm pretty lost on how to do this -- any help is appreciated

This will read the file data.csv once and create separate json files named data_1.json through data_9.json, since there are 9000 rows.

Also, as long as the number of rows in data.csv is a multiple of 1000, it will create number_of_rows/1000 files without any change to the code.

import csv
import json

csvfile = open("C:\\Users\Me\Desktop\data\data.csv", 'r', encoding="utf8")

reader = csv.DictReader(csvfile, delimiter = ",")

r = []
counter = 0
fileid = 1

for row in reader:
    r.append( row )
    counter += 1
    if counter == 1000:
        # write the current chunk of 1000 rows to its own json file
        out = json.dumps( r )
        fname = "C:\\Users\Me\Desktop\data\data_" + str(fileid) + ".json"
        with open( fname, 'w' ) as f:
            f.write( out )

        # resetting & updating variables
        fileid += 1
        counter = 0
        r = []
        out = None
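
If the number of rows is not a multiple of 1000, the rows left in r when the loop ends are never written. A minimal addition after the loop (a sketch, reusing the variables above) would flush that last partial chunk:

# after the for loop: write any remaining rows (fewer than 1000) to one last file
if r:
    out = json.dumps(r)
    fname = "C:\\Users\Me\Desktop\data\data_" + str(fileid) + ".json"
    with open(fname, 'w') as f:
        f.write(out)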

Read the whole CSV file into a list of rows, then write slices of length 1000 to the JSON files:

import csv
import json

input_file = 'C:\\Users\\Me\\Desktop\\data\\data.csv'
output_file_template = 'C:\\Users\\Me\\Desktop\\data\\data_{}.json'

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    rows = list(reader)

for i in range(len(rows) // 1000):
    out = json.dumps(rows[1000*i:1000*(i+1)])
    with open(output_file_template.format(i), 'w') as f:
        f.write(out)
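
As written, the integer division drops any trailing rows when the total is not an exact multiple of 1000. If that matters, rounding the number of chunks up (a sketch reusing the names above) keeps the remainder in one final, smaller file:

chunk_size = 1000
num_chunks = (len(rows) + chunk_size - 1) // chunk_size  # ceiling division

for i in range(num_chunks):
    # slicing past the end of the list is safe; the last chunk may be shorter
    out = json.dumps(rows[chunk_size*i:chunk_size*(i+1)])
    with open(output_file_template.format(i), 'w') as f:
        f.write(out)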

Instead of reading the whole CSV file into memory, you can iterate over it (which reduces memory usage).

For instance, here is a simple iteration over the rows:

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:
        print(row)
While iterating, you can enumerate the rows and use that value to compute groups of 1000 rows:

group_size = 1000

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for index, row in enumerate(reader):
        group_idx = index // group_size
        print(group_idx, row)
You should get something like this:

0 [row 0...]
0 [row 1...]
0 [row 2...]
...
0 [row 999...]
1 [row 1000...]
1 [row 1001...]
etc.
You can use itertools.groupby to group the rows by 1000.

Building on Alberto Garcia Raboso's solution, you can use:

from __future__ import division

import csv
import json
import itertools

input_file = 'C:\\Users\\Me\\Desktop\\data\\data.csv'
output_file_template = 'C:\\Users\\Me\\Desktop\\data\\data_{}.json'

group_size = 1000

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for key, group in itertools.groupby(enumerate(reader),
                                        key=lambda item: item[0] // group_size):
        grp_rows = [item[1] for item in group]
        content = json.dumps(grp_rows)
        with open(output_file_template.format(key), 'w') as jsonfile:
            jsonfile.write(content)
For instance, with some fake data:

from __future__ import division
import itertools

rows = [[1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8]]

group_size = 4
for key, group in itertools.groupby(enumerate(rows),
                                    key=lambda item: item[0] // group_size):
    g_rows = [item[1] for item in group]
    print(key, g_rows)
You will get:

0 [[1, 2], [3, 4], [5, 6], [7, 8]]
1 [[1, 2], [3, 4], [5, 6], [7, 8]]
2 [[1, 2], [3, 4], [5, 6], [7, 8]]
3 [[1, 2], [3, 4], [5, 6], [7, 8]]
4 [[1, 2], [3, 4], [5, 6], [7, 8]]

There is no reason to use DictReader here, the plain csv.reader will do. You can also use itertools.islice on the reader object to slice the data into chunks of n rows and dump each set to a new file:

from itertools import islice, count
import csv
import json    

with open("C:\\Users\Me\Desktop\data\data.csv") as f:
    reader, cnt = csv.reader(f), count(1)
    for rows in iter(lambda: list(islice(reader, 1000)), []):
        with open("C:\\Users\Me\Desktop\data\data{}.json".format(next(cnt)), 'w') as out:
            json.dump(rows, out)
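
The iter(callable, sentinel) form may look cryptic: it keeps calling list(islice(reader, 1000)) and stops as soon as the result equals the sentinel [], i.e. when the reader is exhausted. A small standalone illustration of the same idiom (hypothetical data, not part of the answer above):

from itertools import islice

numbers = iter(range(10))

# repeatedly pull chunks of 4 items until islice returns an empty list
for chunk in iter(lambda: list(islice(numbers, 4)), []):
    print(chunk)

# prints:
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]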

Nice use of the fact that groupby is lazy. groupby is an iterator, and each group is itself an iterator (which is why I used a list comprehension to turn grp_rows into a list).
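
A small standalone example (not from the answer above) makes that laziness visible: each group is only valid until the groupby object is advanced, so it has to be materialized into a list before moving on:

import itertools

data = ['a', 'a', 'b', 'b']
groups = itertools.groupby(data)

key1, group1 = next(groups)   # first group, not consumed yet
key2, group2 = next(groups)   # advancing invalidates the previous group

print(key1, list(group1))     # a []  -- the first group is already exhausted
print(key2, list(group2))     # b ['b', 'b']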