Convert a Python DataFrame object to JSON, with conditions

Tags: python, arrays, json, pandas, dataframe

I followed an earlier solution, but then realized it doesn't match my case: I want the display_rows section of the JSON file to hold the values that share the same date and ID. I have a dataframe like this:
as_of_date create_date ID value_1 count value_3
0 02/03/2021 02/03/2021 12345 5 2 55
1 02/03/2021 01/03/2021 12345 8 2 55
2 02/03/2021 01/03/2021 34567 9 1 66
3 02/03/2021 02/03/2021 78945 9 1 77
4 03/03/2021 02/03/2021 78945 9 1 22
5 03/03/2021 02/03/2021 12345 5 1 33
where the count column is the number of rows with the same ID and as_of_date. For example, for as_of_date=02/03/2021 and ID=12345 there are two rows (each with a different create_date, but I don't care about create_date), so the count of the first two rows is the same: 2.

The expected JSON is:
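In case the count column needs to be derived rather than already supplied, a minimal sketch (reconstructing the sample data above) using groupby(...).transform('size'):

```python
import pandas as pd

# sample data reconstructed from the question (create_date omitted)
df = pd.DataFrame({
    "as_of_date": ["02/03/2021", "02/03/2021", "02/03/2021",
                   "02/03/2021", "03/03/2021", "03/03/2021"],
    "ID": [12345, 12345, 34567, 78945, 78945, 12345],
    "value_1": [5, 8, 9, 9, 9, 5],
    "value_3": [55, 55, 66, 77, 22, 33],
})

# rows sharing the same ('as_of_date', 'ID') pair all get the group size
df["count"] = df.groupby(["as_of_date", "ID"])["ID"].transform("size")
print(df["count"].tolist())  # [2, 2, 1, 1, 1, 1]
```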
{
"examples": [
{
"Id": 12345,
"as_of_date": "2021-03-02 00:00:00", # this field is datetime format
"value_3": 55,
"count": 2, # for the same 'ID=12345'&'as_of_date=02/03/2021'
"display_rows": [
{
"value_1": 5,
"type": "int" # 'type' field will always be 'int'
},
{
"value_1": 8,
"type": "int"
}
]
},
{
"Id": 34567,
"as_of_date": "2021-03-02 00:00:00",
"value_3": 66,
"count": 1,
"display_rows": [
{
"value_1": 9,
"type": "int"
}
]
},
{
"Id": 78945,
"as_of_date": "2021-03-02 00:00:00",
"value_3": 77,
"count": 1,
"display_rows": [
{
"value_1": 9,
"type": "int"
}
]
},
{
"Id": 78945,
"as_of_date": "2021-03-03 00:00:00",
"value_3": 22,
"count": 1,
"display_rows": [
{
"value_1": 9,
"type": "int"
}
]
},
{
"Id": 12345,
"as_of_date": "2021-03-03 00:00:00",
"value_3": 33,
"count": 1,
"display_rows": [
{
"value_1": 5,
"type": "int"
}
]
}
]
}
I spent almost a whole day trying to figure this out, but it doesn't seem to work... Can anyone help? Thanks.

Use a lambda function to process the value_1 column, like:
import json
import pandas as pd

df['as_of_date'] = pd.to_datetime(df['as_of_date'], dayfirst=True, errors='coerce')
f = lambda x: [ {"value_1": y, "type": "int" } for y in x]
df = (df.groupby(['as_of_date','ID','value_3','count'])['value_1']
.apply(f)
.reset_index(name='display_rows'))
print (df)
as_of_date ID value_3 count \
0 2021-03-02 12345 55 2
1 2021-03-02 34567 66 1
2 2021-03-02 78945 77 1
3 2021-03-03 12345 33 1
4 2021-03-03 78945 22 1
display_rows
0 [{'value_1': 5, 'type': 'int'}, {'value_1': 8,...
1 [{'value_1': 9, 'type': 'int'}]
2 [{'value_1': 9, 'type': 'int'}]
3 [{'value_1': 5, 'type': 'int'}]
4 [{'value_1': 9, 'type': 'int'}]
j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)
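The default=str argument matters here: after pd.to_datetime, the as_of_date column holds pandas Timestamp objects, which the stdlib json module cannot serialize on its own. A small standalone illustration:

```python
import json
import pandas as pd

ts = pd.Timestamp("2021-03-02")
# json.dumps raises TypeError on a Timestamp without a fallback;
# default=str converts any non-JSON-native value via str()
j = json.dumps({"as_of_date": ts}, default=str)
print(j)  # {"as_of_date": "2021-03-02 00:00:00"}
```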
print (j)
{"examples": [{"as_of_date": "2021-03-02 00:00:00", "ID": 12345, "value_3": 55, "count": 2, "display_rows": [{"value_1": 5, "type": "int"}, {"value_1": 8, "type": "int"}]}, {"as_of_date": "2021-03-02 00:00:00", "ID": 34567, "value_3": 66, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}, {"as_of_date": "2021-03-02 00:00:00", "ID": 78945, "value_3": 77, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}, {"as_of_date": "2021-03-03 00:00:00", "ID": 12345, "value_3": 33, "count": 1, "display_rows": [{"value_1": 5, "type": "int"}]}, {"as_of_date": "2021-03-03 00:00:00", "ID": 78945, "value_3": 22, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}]}

Hi, thanks. Just a quick follow-up question: what if I have multiple columns like value_1 that need to go into display_rows? I tried: df = (df.groupby(['as_of_date','ID','value_3','count'])[['value_1','value_7','value_8']].apply(lambda x: [{"value_1": a, "value_7": b, "value_8": c, "type": "int"} for a, b, c in x]).reset_index(name='display_rows')), which gives me an error: ValueError: too many values to unpack.

@Cecilia - the answer has been edited.

Thanks. It doesn't seem to work if I change type='int' to another string, e.g. .assign(example_placeholder='xyz'). I found the problem, it was a silly typo on my part.

@Cecilia - I have never done that, so I don't know. Maybe try looking for an existing solution, or post it as a new question.

EDIT:
# add another column
df['value_7'] = 52
print (df)
as_of_date create_date ID value_1 count value_3 value_7
0 02/03/2021 02/03/2021 12345 5 2 55 52
1 02/03/2021 01/03/2021 12345 8 2 55 52
2 02/03/2021 01/03/2021 34567 9 1 66 52
3 02/03/2021 02/03/2021 78945 9 1 77 52
4 03/03/2021 02/03/2021 78945 9 1 22 52
5 03/03/2021 02/03/2021 12345 5 1 33 52
# add a 'type' column so it becomes the last key of each dict
df = (df.assign(type='int')
.groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','type']]
.apply(lambda x: x.to_dict('records'))
.reset_index(name='display_rows'))
print (df)
as_of_date ID value_3 count \
0 02/03/2021 12345 55 2
1 02/03/2021 34567 66 1
2 02/03/2021 78945 77 1
3 03/03/2021 12345 33 1
4 03/03/2021 78945 22 1
display_rows
0 [{'value_1': 5, 'value_7': 52, 'type': 'int'},...
1 [{'value_1': 9, 'value_7': 52, 'type': 'int'}]
2 [{'value_1': 9, 'value_7': 52, 'type': 'int'}]
3 [{'value_1': 5, 'value_7': 52, 'type': 'int'}]
4 [{'value_1': 9, 'value_7': 52, 'type': 'int'}]
j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)
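The key step in the edited answer is .apply(lambda x: x.to_dict('records')): within each group, every row becomes one dict, so any number of value columns works. A standalone illustration with a hypothetical single group:

```python
import pandas as pd

# one group's rows, as the grouped apply would receive them
g = pd.DataFrame({"value_1": [5, 8],
                  "value_7": [52, 52],
                  "type": ["int", "int"]})

# to_dict('records') turns each row into one dict, keeping column order
records = g.to_dict("records")
print(records)
```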
df = (df.assign(example_placeholder='xyz')
.groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','example_placeholder']]
.apply(lambda x: x.to_dict('records'))
.reset_index(name='display_rows'))
print (df)
as_of_date ID value_3 count \
0 02/03/2021 12345 55 2
1 02/03/2021 34567 66 1
2 02/03/2021 78945 77 1
3 03/03/2021 12345 33 1
4 03/03/2021 78945 22 1
display_rows
0 [{'value_1': 5, 'value_7': 52, 'example_placeh...
1 [{'value_1': 9, 'value_7': 52, 'example_placeh...
2 [{'value_1': 9, 'value_7': 52, 'example_placeh...
3 [{'value_1': 5, 'value_7': 52, 'example_placeh...
4 [{'value_1': 9, 'value_7': 52, 'example_placeh...
df = (df.assign(aa='xyz', type='int')
.groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','aa', 'type']]
.apply(lambda x: x.to_dict('records'))
.reset_index(name='display_rows'))
print (df)
as_of_date ID value_3 count \
0 02/03/2021 12345 55 2
1 02/03/2021 34567 66 1
2 02/03/2021 78945 77 1
3 03/03/2021 12345 33 1
4 03/03/2021 78945 22 1
display_rows
0 [{'value_1': 5, 'value_7': 52, 'aa': 'xyz', 't...
1 [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...
2 [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...
3 [{'value_1': 5, 'value_7': 52, 'aa': 'xyz', 't...
4 [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...