Convert CSV to nested JSON using Python/CSV


I'm trying to convert a flat CSV file into a nested JSON format. Here is my data:

# data.csv
company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000
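If you read this file with the standard-library csv module (as the title suggests), the quoted fields are unquoted for you, but every value comes back as a string, so company_id and income_amt need an explicit int() before they end up in the JSON. A minimal sketch, assuming the CSV content is inlined as a string (the name CSV_TEXT is just for illustration) instead of being read from data.csv:

```python
import csv
import io

# Inline copy of data.csv so the sketch is self-contained (assumed identical to the file)
CSV_TEXT = """company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000
"""

# DictReader uses the header row as keys; the surrounding quotes are stripped
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

for row in rows:
    # All values are str, e.g. row["income_amt"] == "5000000", not 5000000
    print(row["company_id"], row["company_name"], row["income_type"], row["income_amt"])
```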
and I need to convert it to the following JSON structure:

{"data": [{
            "company_id": 1,
            "company_name": "Foobar Inc",
            "income": {"royalties": 5000000}
        }, 
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": {
                "sales": 3000000,
                "rent": 1000000
            }
        }]
}
But my current code, based on … and using Python with the pandas library:

# script.py
import json
import pandas as pd

df = pd.read_csv('data.csv')

def get_nested_rec(key, grp):
    rec = {}

    rec['company_id'] = key[0]
    rec['company_name'] = key[1]

    for field in ['income_type']:
        income_types = list(grp[field].unique())
        rec['income'] = income_types

    return rec

records = []

for key, grp in df.groupby(['company_id','company_name','income_type','income_amt']):
    rec = get_nested_rec(key, grp)
    records.append(rec)

records = dict(data = records)

print(json.dumps(records, indent=4))
produces this output:

{"data": [
        {
            "company_id": 1,
            "company_name": "Foobar Inc", 
            "income": [
                "royalties"
            ]
        }, 
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": [
                "sales"
            ]
        }, 
        {
            "company_id": 2,
            "company_name": "ACME Corp",
            "income": [
                "rent"
            ]
        }
    ]}
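The three separate objects come from the groupby key: grouping on all four columns makes every distinct row its own group, so the two ACME Corp rows never meet. A small sketch comparing the group counts, assuming the same data inlined as a string (CSV_TEXT is an illustrative name) and using GroupBy.ngroups:

```python
import io

import pandas as pd

# Inline copy of data.csv (assumed identical to the file in the question)
CSV_TEXT = """company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Grouping on every column: each distinct row becomes its own group
all_cols = df.groupby(['company_id', 'company_name', 'income_type', 'income_amt'])
# Grouping on the id alone: rows of the same company land in one group
by_id = df.groupby('company_id')

print(all_cols.ngroups)  # 3
print(by_id.ngroups)     # 2
```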

I'm having trouble working out how to merge rows that share the same company_id into a single object and include their income amounts.

You can group by company_id alone and build the income dict from each group's rows:

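records = []  # df is the DataFrame read from data.csv in script.py

# Group by company_id only, so all rows of one company end up in the same group
for key, grp in df.groupby('company_id'):
    records.append({
        "company_id": key,
        # company_name is identical within a group; take the first row's value
        "company_name": grp.company_name.iloc[0],
        # One income_type -> income_amt entry per row of this company
        "income": {
            row.income_type: row.income_amt for row in grp.itertuples()
        }})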
This gives you:

[{'company_id': 1,
  'company_name': 'Foobar Inc',
  'income': {'royalties': 5000000}},
 {'company_id': 2,
  'company_name': 'ACME Corp',
  'income': {'rent': 1000000, 'sales': 3000000}}]
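If you'd rather avoid pandas, the same merge can be done with the csv module and a plain dict keyed by company_id, then wrapped in the top-level "data" key from the question and serialized with json.dumps. A minimal sketch, assuming the CSV is inlined as a string; csv_to_nested and CSV_TEXT are hypothetical names:

```python
import csv
import io
import json

# Inline copy of data.csv (assumed identical to the file in the question)
CSV_TEXT = """company_id,company_name,income_type,income_amt
1,"Foobar Inc","royalties",5000000
2,"ACME Corp","sales",3000000
2,"ACME Corp","rent",1000000
"""


def csv_to_nested(text):
    """Hypothetical helper: merge rows by company_id into {"data": [...]}."""
    companies = {}  # company_id -> record; dicts keep insertion order (3.7+)
    for row in csv.DictReader(io.StringIO(text)):
        cid = int(row["company_id"])  # csv yields strings, convert explicitly
        # First row for a company creates its record; later rows reuse it
        rec = companies.setdefault(cid, {
            "company_id": cid,
            "company_name": row["company_name"],
            "income": {},
        })
        rec["income"][row["income_type"]] = int(row["income_amt"])
    return {"data": list(companies.values())}


print(json.dumps(csv_to_nested(CSV_TEXT), indent=4))
```

If the same income_type can appear more than once for a company, the last row wins here; use `rec["income"].get(t, 0) + amount` instead if those amounts should be summed.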