Python CSV to JSON, with column data that needs to be grouped

python pandas

I have a CSV file in a format similar to this:

order_id, customer_name, item_1_id, item_1_quantity, Item_2_id, Item_2_quantity, Item_3_id, Item_3_quantity
1,        John,          4,         1,               24,        4,               16,        1
2,        Paul,          8,         3,               41,        1,               33,        1
3,        Andrew,        1,         1,               34,        4,               8,          2

I want to export it to JSON. Currently I am doing this:

import pandas as pd

df = pd.read_csv('simple.csv')
print(df.to_json(orient='records'))

The output is:

[
    {
        "Item_2_id": 24,
        "Item_2_quantity": 4,
        "Item_3_id": 16,
        "Item_3_quantity": 1,
        "customer_name": "John",
        "item_1_id": 4,
        "item_1_quantity": 1,
        "order_id": 1
    },
......

However, I want the output to be:

[
    {
        "customer_name": "John",
        "order_id": 1,
        "items": [
            { "id": 4, "quantity": 1 },
            { "id": 24, "quantity": 4 },
            { "id": 16, "quantity": 1 }
        ]
    },
......

Any good suggestions?

In this particular project, no order will have more than 5 items.

Solution:

In [168]: df
Out[168]:
   order_id customer_name  item_1_id  item_1_quantity  Item_2_id  Item_2_quantity  Item_3_id  Item_3_quantity
0         1          John          4                1         24                4         16                1
1         2          Paul          8                3         41                1         33                1
2         3        Andrew          1                1         34                4          8                2

In [169]: %paste
import re

x = df[['order_id','customer_name']].copy()
x['id'] = \
    pd.Series(df.loc[:, df.columns.str.contains(r'item_.*?_id',
                                                flags=re.I)].values.tolist(),
              index=df.index)
x['quantity'] = \
    pd.Series(df.loc[:, df.columns.str.contains(r'item_.*?_quantity',
                                                flags=re.I)].values.tolist(),
              index=df.index)
x.to_json(orient='records')
## -- End pasted text --
Out[169]: '[{"order_id":1,"customer_name":"John","id":[4,24,16],"quantity":[1,4,1]},{"order_id":2,"customer_name":"Paul","id":[8,41,33],"quantity":[3,1,1]},{"order_id":3,"customer_name":"Andrew","id":[1,34,8],"quantity":[1,4,2]}]'

Intermediate helper DF:

In [82]: x
Out[82]:
   order_id customer_name           id   quantity
0         1          John  [4, 24, 16]  [1, 4, 1]
1         2          Paul  [8, 41, 33]  [3, 1, 1]
2         3        Andrew   [1, 34, 8]  [1, 4, 2]
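The answer above yields parallel id and quantity lists per order rather than the "items" list of objects the question asks for. One way to close that gap is sketched below: select the same columns with the answer's regex and zip the two lists together row by row. The sample frame is typed in by hand from the question's data, and it assumes the CSV headers carry no leading spaces.

```python
import json
import re

import pandas as pd

# Hand-typed copy of the question's data (assumption: headers have no leading spaces).
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_name": ["John", "Paul", "Andrew"],
    "item_1_id": [4, 8, 1], "item_1_quantity": [1, 3, 1],
    "Item_2_id": [24, 41, 34], "Item_2_quantity": [4, 1, 4],
    "Item_3_id": [16, 33, 8], "Item_3_quantity": [1, 1, 2],
})

# Same case-insensitive column selection as the answer above.
id_cols = df.columns[df.columns.str.contains(r"item_.*?_id", flags=re.I)]
qty_cols = df.columns[df.columns.str.contains(r"item_.*?_quantity", flags=re.I)]

records = []
for order_id, name, ids, qtys in zip(df["order_id"], df["customer_name"],
                                      df[id_cols].values.tolist(),
                                      df[qty_cols].values.tolist()):
    records.append({
        "customer_name": name,
        "order_id": int(order_id),
        # Pair the i-th id with the i-th quantity of the same row.
        "items": [{"id": i, "quantity": q} for i, q in zip(ids, qtys)],
    })

print(json.dumps(records, indent=2))
```

Because .values.tolist() returns plain Python ints, json.dumps can serialize the result without a default= hook. The pairing relies on the id and quantity columns appearing in the same item order, which holds for the question's layout.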

Try the following:

import json

import pandas as pd

output_lst = []

## specify the first row as header; skipinitialspace drops the blank after each comma
df = pd.read_csv('simple.csv', header=0, skipinitialspace=True)
## column_list is a list of column headers
column_list = df.columns.values
## iterate through all the rows
for index, row in df.iterrows():
    order = {}
    items_lst = []
    for i, col_name in enumerate(column_list):
        ## for the first 2 columns simply copy the value into the dictionary
        if i < 2:
            element = row[col_name]
            if isinstance(element, str):
                ## strip if it is a string type value
                element = element.strip()
            order[col_name] = element

        elif "_id" in col_name:
            ## i+1 is used assuming that the item_quantity comes right after the corresponding item_id for each item
            item_dict = {"id": row[col_name], "quantity": row[column_list[i + 1]]}
            items_lst.append(item_dict)

    order["items"] = items_lst
    output_lst.append(order)

## default=int converts NumPy integers, which the json module cannot serialize on its own
print(json.dumps(output_lst, default=int))
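The loop above pairs each _id column with the column right after it (i+1), and since an order can hold up to five items, shorter orders will leave empty cells in the CSV. A minimal sketch that instead pairs columns by the item number parsed from the header and skips empty slots follows. The input is hypothetical (the question's layout, with Paul's third item left blank), and treating an empty id cell as "no item" is an assumption.

```python
import io
import json
import re

import pandas as pd

# Hypothetical input: the question's layout, but Paul's order has only two
# items, so his item 3 cells are empty and pandas reads them as NaN.
csv_text = """order_id, customer_name, item_1_id, item_1_quantity, Item_2_id, Item_2_quantity, Item_3_id, Item_3_quantity
1, John, 4, 1, 24, 4, 16, 1
2, Paul, 8, 3, 41, 1,,
"""

# skipinitialspace=True drops the space after each comma, header included.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Map item number -> {"id": column, "quantity": column}; matching is
# case-insensitive because the headers mix "item_" and "Item_".
pattern = re.compile(r"item_(\d+)_(id|quantity)", re.IGNORECASE)
slots = {}
for col in df.columns:
    m = pattern.fullmatch(col)
    if m:
        slots.setdefault(int(m.group(1)), {})[m.group(2).lower()] = col

output = []
for rec in df.to_dict("records"):
    items = []
    for n in sorted(slots):
        id_col, qty_col = slots[n]["id"], slots[n]["quantity"]
        if pd.isna(rec[id_col]):
            continue  # empty slot: this order has fewer items
        # int() also turns float columns (forced by NaN) back into whole numbers
        items.append({"id": int(rec[id_col]), "quantity": int(rec[qty_col])})
    output.append({
        "order_id": int(rec["order_id"]),
        "customer_name": rec["customer_name"],
        "items": items,
    })

print(json.dumps(output, indent=2))
```

Matching on the item number means the columns no longer need to sit in id/quantity order, at the cost of assuming every header follows the item_N_id / item_N_quantity naming.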

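Pretty-printed result of the first (pandas) answer, where j is presumably the string returned by x.to_json(orient='records'):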

In [122]: print(json.dumps(json.loads(j), indent=2))
[
  {
    "order_id": 1,
    "customer_name": "John",
    "id": [
      4,
      24,
      16
    ],
    "quantity": [
      1,
      4,
      1
    ]
  },
  {
    "order_id": 2,
    "customer_name": "Paul",
    "id": [
      8,
      41,
      33
    ],
    "quantity": [
      3,
      1,
      1
    ]
  },
  {
    "order_id": 3,
    "customer_name": "Andrew",
    "id": [
      1,
      34,
      8
    ],
    "quantity": [
      1,
      4,
      2
    ]
  }
]
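The pretty-printed listing above goes through json.loads and json.dumps only to add indentation. Since pandas 1.0, DataFrame.to_json accepts an indent argument itself. A small sketch follows, using a hypothetical one-row frame shaped like the helper DF x.

```python
import json

import pandas as pd

# Hypothetical one-row frame shaped like the helper DF `x` above.
x = pd.DataFrame({
    "order_id": [1],
    "customer_name": ["John"],
    "id": [[4, 24, 16]],
    "quantity": [[1, 4, 1]],
})

# Round trip used above: serialize, parse, re-serialize with indentation.
roundtrip = json.dumps(json.loads(x.to_json(orient="records")), indent=2)

# pandas >= 1.0: indent directly in to_json, no round trip needed.
pretty = x.to_json(orient="records", indent=2)
print(pretty)
```

Whitespace details may differ slightly between the two strings, but they parse to the same data.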

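I marked this as the correct answer because it generates the JSON exactly as I expected. Thanks for the other answers too; they look more compact and efficient and are good examples of how to use pandas.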