Python CSV to nested JSON?
I am trying to convert a flat CSV into a nested JSON structure. The CSV is generated by a SQL query, which produces multiple rows for each primary id. The CSV is structured as follows:
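PrimaryId,FirstName,LastName,City,CarName,DogName
100,John,Smith,NewYork,Toyota,Spike
100,John,Smith,NewYork,BMW,Spike
100,John,Smith,NewYork,Toyota,Rusty
100,John,Smith,NewYork,BMW,Rusty
101,Ben,Swan,Sydney,Volkswagen,Buddy
101,Ben,Swan,Sydney,Ford,Buddy
101,Ben,Swan,Sydney,Audi,Buddy
101,Ben,Swan,Sydney,Volkswagen,Max
101,Ben,Swan,Sydney,Ford,Max
101,Ben,Swan,Sydney,Audi,Max
102,Julia,Brown,London,Mini,Lucy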
The desired JSON output is:
{
    "data": [
        {
            "City": "NewYork",
            "FirstName": "John",
            "PrimaryId": 100,
            "LastName": "Smith",
            "CarName": [
                "Toyota",
                "BMW"
            ],
            "DogName": [
                "Spike",
                "Rusty"
            ]
        },
        {
            "City": "Sydney",
            "FirstName": "Ben",
            "PrimaryId": 101,
            "LastName": "Swan",
            "CarName": [
                "Volkswagen",
                "Ford",
                "Audi"
            ],
            "DogName": [
                "Buddy",
                "Max"
            ]
        },
        {
            "City": "London",
            "FirstName": "Julia",
            "PrimaryId": 102,
            "LastName": "Brown",
            "CarName": [
                "Mini"
            ],
            "DogName": [
                "Lucy"
            ]
        }
    ]
}
Both were helpful, but I have not managed to produce the correct structure yet.
Your data, converted to a valid CSV and saved in data.csv:
PrimaryId,FirstName,LastName,City,CarName,DogName
100,John,Smith,NewYork,Toyota,Spike
100,John,Smith,NewYork,BMW,Spike
100,John,Smith,NewYork,Toyota,Rusty
100,John,Smith,NewYork,BMW,Rusty
101,Ben,Swan,Sydney,Volkswagen,Buddy
101,Ben,Swan,Sydney,Ford,Buddy
101,Ben,Swan,Sydney,Audi,Buddy
101,Ben,Swan,Sydney,Volkswagen,Max
101,Ben,Swan,Sydney,Ford,Max
101,Ben,Swan,Sydney,Audi,Max
102,Julia,Brown,London,Mini,Lucy
Using pandas to do the heavy lifting, and assuming this CSV file is valid, here is one way to achieve what you want:
import json
import pandas as pd

df = pd.read_csv('data.csv')

def get_nested_rec(key, grp):
    # key holds the group's values for the four grouping columns, in order
    rec = {}
    rec['PrimaryId'] = int(key[0])  # plain int, so json.dumps can serialize it
    rec['FirstName'] = key[1]
    rec['LastName'] = key[2]
    rec['City'] = key[3]
    # collect the distinct values of each repeated column, in order of appearance
    for field in ['CarName', 'DogName']:
        rec[field] = list(grp[field].unique())
    return rec

records = []
for key, grp in df.groupby(['PrimaryId', 'FirstName', 'LastName', 'City']):
    rec = get_nested_rec(key, grp)
    records.append(rec)

records = dict(data=records)
print(json.dumps(records, indent=4))
The result (the order of keys within each object may differ depending on your Python version):
{
    "data": [
        {
            "City": "NewYork",
            "FirstName": "John",
            "PrimaryId": 100,
            "LastName": "Smith",
            "CarName": [
                "Toyota",
                "BMW"
            ],
            "DogName": [
                "Spike",
                "Rusty"
            ]
        },
        {
            "City": "Sydney",
            "FirstName": "Ben",
            "PrimaryId": 101,
            "LastName": "Swan",
            "CarName": [
                "Volkswagen",
                "Ford",
                "Audi"
            ],
            "DogName": [
                "Buddy",
                "Max"
            ]
        },
        {
            "City": "London",
            "FirstName": "Julia",
            "PrimaryId": 102,
            "LastName": "Brown",
            "CarName": [
                "Mini"
            ],
            "DogName": [
                "Lucy"
            ]
        }
    ]
}
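Here is a general way to do this using csv.DictReader.
Start by loading the data:
import csv
import itertools

# newline='' is what the csv module expects in Python 3;
# skipinitialspace=True drops the space after each comma, so keys have no leading space
with open('stuff.csv', newline='') as csvfile:
    all_ = list(csv.DictReader(csvfile, skipinitialspace=True))
Now you can use itertools.groupby to group the rows and process each group. It only groups consecutive rows, so the rows must already be ordered by the key, as they are here. For example:
d = []
for k, g in itertools.groupby(
        all_,
        key=lambda r: (r['PrimaryId'], r['LastName'])):
    d.append({
        'PrimaryId': k[0],
        'LastName': k[1],
        'CarName': [e['CarName'] for e in g]
    })
This groups by primary id and last name and builds the list of cars for each group.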
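The grouping example above collects only CarName and keeps repeated values. As a minimal sketch of how the same idea could be extended to the full structure asked for in the question, the version below groups on all four identifying columns and removes duplicates from both list columns. The names nest_rows, ID_FIELDS, LIST_FIELDS and the inline CSV_TEXT sample are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io
import itertools
import json

# Inline sample so the sketch runs on its own; in practice, read the real file.
CSV_TEXT = """PrimaryId,FirstName,LastName,City,CarName,DogName
100,John,Smith,NewYork,Toyota,Spike
100,John,Smith,NewYork,BMW,Spike
100,John,Smith,NewYork,Toyota,Rusty
100,John,Smith,NewYork,BMW,Rusty
102,Julia,Brown,London,Mini,Lucy
"""

ID_FIELDS = ['PrimaryId', 'FirstName', 'LastName', 'City']  # one value per person
LIST_FIELDS = ['CarName', 'DogName']                        # many values per person


def row_key(row):
    """Return the tuple of identifying values used to group rows."""
    return tuple(row[f] for f in ID_FIELDS)


def nest_rows(rows):
    """Collapse consecutive rows that share ID_FIELDS into one record each."""
    records = []
    for key, group in itertools.groupby(rows, key=row_key):
        group = list(group)  # the group iterator can only be consumed once
        rec = dict(zip(ID_FIELDS, key))
        rec['PrimaryId'] = int(rec['PrimaryId'])  # DictReader yields strings
        for field in LIST_FIELDS:
            # dict.fromkeys drops duplicates while keeping first-seen order
            rec[field] = list(dict.fromkeys(r[field] for r in group))
        records.append(rec)
    return {'data': records}


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
result = nest_rows(rows)
print(json.dumps(result, indent=4))
```

Because itertools.groupby only merges adjacent rows, this relies on the rows arriving ordered by primary id, which an ORDER BY in the generating SQL would guarantee; otherwise sort the rows with the same key function first.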
Once you have something like this, you can use it directly.
Please post your code here, i.e. what you have tried. Also, your CSV seems to have extra spaces, and your JSON is definitely not JSON.
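On the extra spaces mentioned in the comment: if the file really does have a space after each comma, the csv module keeps that space as part of every header name and value unless told otherwise. The sketch below shows the difference using the standard skipinitialspace option; the RAW sample string is a made-up illustration, not the asker's actual file.

```python
import csv
import io

# Hypothetical sample with a space after every comma.
RAW = "PrimaryId, FirstName, LastName\n100, John, Smith\n"

# Default behaviour: the leading space stays in keys and values.
plain = next(csv.DictReader(io.StringIO(RAW)))

# skipinitialspace=True ignores whitespace immediately after the delimiter.
cleaned = next(csv.DictReader(io.StringIO(RAW), skipinitialspace=True))

print(list(plain))
print(list(cleaned))
```

With the default settings a lookup such as row['FirstName'] raises KeyError, because the key is actually ' FirstName'.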