Converting a Python DataFrame to JSON with conditions


python arrays json pandas dataframe


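Previously I followed this solution, but then I realized my case is different: in the display_rows part of the JSON file I want to show some values from the rows that share the same date and ID. I have a DataFrame like this: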
     as_of_date create_date   ID  value_1   count   value_3
0    02/03/2021 02/03/2021  12345   5         2      55
1    02/03/2021 01/03/2021  12345   8         2      55
2    02/03/2021 01/03/2021  34567   9         1      66
3    02/03/2021 02/03/2021  78945   9         1      77
4    03/03/2021 02/03/2021  78945   9         1      22
5    03/03/2021 02/03/2021  12345   5         1      33
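To make the example reproducible, the frame above can be rebuilt directly. This is a minimal sketch that copies the values from the printed table and keeps both date columns as dd/mm/yyyy strings, exactly as shown:

```python
import pandas as pd

# Sample data copied from the table above; dates stay as dd/mm/yyyy strings
df = pd.DataFrame({
    'as_of_date': ['02/03/2021', '02/03/2021', '02/03/2021',
                   '02/03/2021', '03/03/2021', '03/03/2021'],
    'create_date': ['02/03/2021', '01/03/2021', '01/03/2021',
                    '02/03/2021', '02/03/2021', '02/03/2021'],
    'ID': [12345, 12345, 34567, 78945, 78945, 12345],
    'value_1': [5, 8, 9, 9, 9, 5],
    'count': [2, 2, 1, 1, 1, 1],
    'value_3': [55, 55, 66, 77, 22, 33],
})
print(df)
```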


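Here, count is the number of rows with the same ID and as_of_date. For example, for as_of_date=02/03/2021 and ID=12345 there are two rows (each with a different create_date, but I don't care about create_date), so count is 2 for both of the first two rows.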
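If count is not already present in the data, it can be derived with a grouped transform. A sketch, assuming count should be the number of rows per (as_of_date, ID) pair and that create_date is ignored, as described above:

```python
import pandas as pd

df = pd.DataFrame({
    'as_of_date': ['02/03/2021'] * 4 + ['03/03/2021'] * 2,
    'ID': [12345, 12345, 34567, 78945, 78945, 12345],
})

# size of each (as_of_date, ID) group, broadcast back onto every row of that group
df['count'] = df.groupby(['as_of_date', 'ID'])['ID'].transform('size')
print(df['count'].tolist())  # [2, 2, 1, 1, 1, 1]
```

transform (rather than size alone) keeps the original row index, so the result can be assigned straight back as a column.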
The expected JSON is:
{
    "examples": [
        {
            "Id": 12345,
            "as_of_date": "2021-03-02 00:00:00", # this field is datetime format
            "value_3": 55, 
            "count": 2,    # for the same 'ID=12345'&'as_of_date=02/03/2021'
            "display_rows": [
                {
                    "value_1": 5,
                    "type": "int" # 'type' field will always be 'int'
                },
                {
                    "value_1": 8,
                    "type": "int"
                }
            ]
        },
        {
            "Id": 34567,
            "as_of_date": "2021-03-02 00:00:00",
            "value_3": 66,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int"
                }
            ]
        },
        {
            "Id": 78945,
            "as_of_date": "2021-03-02 00:00:00",
            "value_3": 77,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int" 
                }
            ]
        },
        {
            "Id": 78945,
            "as_of_date": "2021-03-03 00:00:00",
            "value_3": 22,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int" 
                }
            ]
        },
        {
            "Id": 12345,
            "as_of_date": "2021-03-03 00:00:00",
            "value_3": 33,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 5,
                    "type": "int" 
                }
            ]
        }
    ]
}

I have spent almost a whole day trying to figure this out, but nothing seems to work... Can anyone help? Thanks.
Use a lambda function to process the value_1 column, for example:
import json
import pandas as pd

df['as_of_date'] = pd.to_datetime(df['as_of_date'], dayfirst=True, errors='coerce')


f = lambda x: [ {"value_1": y, "type": "int" } for y in x]
df = (df.groupby(['as_of_date','ID','value_3','count'])['value_1']
        .apply(f)
        .reset_index(name='display_rows'))
print (df)
  as_of_date     ID  value_3  count  \
0 2021-03-02  12345       55      2   
1 2021-03-02  34567       66      1   
2 2021-03-02  78945       77      1   
3 2021-03-03  12345       33      1   
4 2021-03-03  78945       22      1   

                                        display_rows  
0  [{'value_1': 5, 'type': 'int'}, {'value_1': 8,...  
1                    [{'value_1': 9, 'type': 'int'}]  
2                    [{'value_1': 9, 'type': 'int'}]  
3                    [{'value_1': 5, 'type': 'int'}]  
4                    [{'value_1': 9, 'type': 'int'}]  

j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)
print (j)
{"examples": [{"as_of_date": "2021-03-02 00:00:00", "ID": 12345, "value_3": 55, "count": 2, "display_rows": [{"value_1": 5, "type": "int"}, {"value_1": 8, "type": "int"}]}, {"as_of_date": "2021-03-02 00:00:00", "ID": 34567, "value_3": 66, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}, {"as_of_date": "2021-03-02 00:00:00", "ID": 78945, "value_3": 77, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}, {"as_of_date": "2021-03-03 00:00:00", "ID": 12345, "value_3": 33, "count": 1, "display_rows": [{"value_1": 5, "type": "int"}]}, {"as_of_date": "2021-03-03 00:00:00", "ID": 78945, "value_3": 22, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}]}

Hi, thanks! Just a quick follow-up: what if I have several columns like value_1 that need to be added to display_rows? I tried: df = (df.groupby(['as_of_date','ID','value_3','count'])[['value_1','value_7','value_8']].apply(lambda x: [{"value_1": a, "value_7": b, "value_8": c, "type": "int"} for a, b, c in x]).reset_index(name='display_rows'))
This gives me an error at .apply(lambda x: [{ ...: ValueError: too many values to unpack
@Cecilia - answer edited.
Thanks. If I change type='int' to another string, e.g. .assign(example_placeholder='xyz'), it doesn't seem to work.
I found the problem: I should have used records instead of record, a silly mistake.
@Cecilia - I have never done that, so I don't know. Maybe try searching for existing solutions or post a new question about it.

EDIT:

# add another column
df['value_7'] = 52
print (df)
   as_of_date create_date     ID  value_1  count  value_3  value_7
0  02/03/2021  02/03/2021  12345        5      2       55       52
1  02/03/2021  01/03/2021  12345        8      2       55       52
2  02/03/2021  01/03/2021  34567        9      1       66       52
3  02/03/2021  02/03/2021  78945        9      1       77       52
4  03/03/2021  02/03/2021  78945        9      1       22       52
5  03/03/2021  02/03/2021  12345        5      1       33       52

# add a constant 'type' column so it becomes the last key in each dict
df = (df.assign(type='int')
        .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','type']]
        .apply(lambda x:  x.to_dict('records'))
        .reset_index(name='display_rows'))
print (df)
   as_of_date     ID  value_3  count  \
0  02/03/2021  12345       55      2   
1  02/03/2021  34567       66      1   
2  02/03/2021  78945       77      1   
3  03/03/2021  12345       33      1   
4  03/03/2021  78945       22      1   

                                        display_rows  
0  [{'value_1': 5, 'value_7': 52, 'type': 'int'},...  
1     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  
2     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  
3     [{'value_1': 5, 'value_7': 52, 'type': 'int'}]  
4     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  

j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)
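About the attempt in the comments (for a, b, c in x): iterating over a DataFrame yields its column labels, not its rows, so each label string such as 'value_1' is unpacked into three names, which raises ValueError. A sketch of the row-wise version using itertuples; value_8 and its values are hypothetical, added only to mirror the comment:

```python
import pandas as pd

df = pd.DataFrame({
    'as_of_date': ['02/03/2021', '02/03/2021', '02/03/2021'],
    'ID': [12345, 12345, 34567],
    'value_1': [5, 8, 9],
    'value_7': [52, 52, 52],
    'value_8': [1, 2, 3],  # hypothetical column, values made up for illustration
})

grp = df.groupby(['as_of_date', 'ID'])[['value_1', 'value_7', 'value_8']]

# itertuples(index=False) yields one plain tuple of values per row,
# so a, b, c unpack row values instead of characters of a column label
f = lambda x: [{'value_1': a, 'value_7': b, 'value_8': c, 'type': 'int'}
               for a, b, c in x.itertuples(index=False)]

out = grp.apply(f).reset_index(name='display_rows')
print(out.loc[0, 'display_rows'])
```

The to_dict('records') approach in the answer does the same job without listing the columns twice; the explicit unpacking is mainly useful when keys must be renamed.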

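# same approach with a different constant column, run on the original df (not on the grouped result above)
df = (df.assign(example_placeholder='xyz')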
        .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','example_placeholder']]
        .apply(lambda x:  x.to_dict('records'))
        .reset_index(name='display_rows'))
print (df)
   as_of_date     ID  value_3  count  \
0  02/03/2021  12345       55      2   
1  02/03/2021  34567       66      1   
2  02/03/2021  78945       77      1   
3  03/03/2021  12345       33      1   
4  03/03/2021  78945       22      1   

                                        display_rows  
0  [{'value_1': 5, 'value_7': 52, 'example_placeh...  
1  [{'value_1': 9, 'value_7': 52, 'example_placeh...  
2  [{'value_1': 9, 'value_7': 52, 'example_placeh...  
3  [{'value_1': 5, 'value_7': 52, 'example_placeh...  
4  [{'value_1': 9, 'value_7': 52, 'example_placeh...  
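Instead of adding constant columns with assign, the constant keys can also be merged into each row dict inside the lambda, so the placeholder names and values live in one dict. A sketch on a trimmed copy of the sample data; the extra dict and its keys are illustrative placeholders:

```python
import pandas as pd

df = pd.DataFrame({
    'as_of_date': ['02/03/2021', '02/03/2021', '02/03/2021'],
    'ID': [12345, 12345, 34567],
    'value_3': [55, 55, 66],
    'count': [2, 2, 1],
    'value_1': [5, 8, 9],
    'value_7': [52, 52, 52],
})

# hypothetical constant keys appended to every row dict
extra = {'example_placeholder': 'xyz', 'type': 'int'}

out = (df.groupby(['as_of_date', 'ID', 'value_3', 'count'])[['value_1', 'value_7']]
         # {**row, **extra} copies the row dict and adds the constant keys at the end
         .apply(lambda x: [{**row, **extra} for row in x.to_dict('records')])
         .reset_index(name='display_rows'))
print(out.loc[1, 'display_rows'])
```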

# two constant columns at once, again starting from the original df
df = (df.assign(aa='xyz', type='int')
        .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','aa', 'type']]
        .apply(lambda x:  x.to_dict('records'))
        .reset_index(name='display_rows'))
print (df)

   as_of_date     ID  value_3  count  \
0  02/03/2021  12345       55      2   
1  02/03/2021  34567       66      1   
2  02/03/2021  78945       77      1   
3  03/03/2021  12345       33      1   
4  03/03/2021  78945       22      1   

                                        display_rows  
0  [{'value_1': 5, 'value_7': 52, 'aa': 'xyz', 't...  
1  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...  
2  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...  
3  [{'value_1': 5, 'value_7': 52, 'aa': 'xyz', 't...  
4  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...
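The groupby calls above sort the groups by their keys and keep the column name ID, while the expected JSON in the question lists the groups in order of first appearance and uses the key Id. A sketch that matches that layout, assuming the same sample data: sort=False keeps first-appearance order, the order of the groupby keys sets the key order of each record, and rename produces Id.

```python
import json
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'as_of_date': ['02/03/2021', '02/03/2021', '02/03/2021',
                   '02/03/2021', '03/03/2021', '03/03/2021'],
    'ID': [12345, 12345, 34567, 78945, 78945, 12345],
    'value_1': [5, 8, 9, 9, 9, 5],
    'count': [2, 2, 1, 1, 1, 1],
    'value_3': [55, 55, 66, 77, 22, 33],
})
df['as_of_date'] = pd.to_datetime(df['as_of_date'], format='%d/%m/%Y')

out = (df.assign(type='int')
         # key order here becomes the key order of each JSON record;
         # sort=False keeps groups in order of first appearance
         .groupby(['ID', 'as_of_date', 'value_3', 'count'], sort=False)[['value_1', 'type']]
         .apply(lambda x: x.to_dict('records'))
         .reset_index(name='display_rows')
         .rename(columns={'ID': 'Id'}))

# default=str turns pandas Timestamps into 'YYYY-MM-DD HH:MM:SS' strings
j = json.dumps({'examples': out.to_dict(orient='records')}, default=str, indent=4)
print(j)
```

To write a file instead of a string, json.dump(obj, fh, default=str, indent=4) takes the same arguments plus an open file handle.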