Python: grouping a DataFrame at two different levels before converting it to JSON
I have a DataFrame with the structure and data shown below. I want to group it at two different levels: level 1 by doc_id and doc_name, and level 2 by pgf_id and pgf_data. After the groupby, the result needs to be converted to JSON.
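# Groups on pgf_id only, so the doc_id/doc_name level is not produced.
# 'records' is spelled out because the short alias 'r' was removed in pandas 2.0.
df_final = (df.groupby(['pgf_id'], as_index=True)
              .apply(lambda x: x[['sent_id', 'sent_data', 'label']].to_dict('records'))
              .reset_index().to_json(orient='records'))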
doc_id doc_name pgf_id pgf_data sent_id sent_data label
001abz simple_doc 0001567a This is for understanding purpose. There are more 2 important sentences in the para.
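I'm not sure whether you intended to post valid JSON or some variant of it, so I made a few assumptions, mainly that the bare objects with keys such as sent_id should live in an array under the key r. If you are fine with using a few loops, here is what I did: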
from json import dumps

# `df` is a pandas.DataFrame with your data
output = []
# Level 1: one "document" object per (doc_id, doc_name) pair.
for (doc_id, doc_name), pgf_dataframe in df.groupby(['doc_id', 'doc_name']):
    document = {'doc_id': doc_id, 'doc_name': doc_name}
    paragraphs = []
    # Level 2: one "paragraph" object per (pgf_id, pgf_data) pair within the document.
    for (pgf_id, pgf_data), r_dataframe in pgf_dataframe.groupby(['pgf_id', 'pgf_data']):
        paragraph = {'pgf_id': pgf_id, 'pgf_text': pgf_data}
        events = []
        for i, row in r_dataframe.iterrows():
            events.append({'sent_id': row['sent_id'], 'sent_data': row['sent_data'], 'label': row['label']})
        paragraph['r'] = events
        paragraphs.append(paragraph)
    document['paragraphs'] = paragraphs
    output.append(document)

# `output` is a list of "document" objects.
print(dumps(output))
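For anyone who wants to try the two-level grouping end to end, here is a minimal self-contained sketch. The sample rows are invented purely for illustration (the question's real data is not reproduced here), and the helper name build_documents is hypothetical. It assumes the same output shape as above, with sentences under the key r, and replaces the innermost iterrows loop with to_dict('records'), which should give the same list of dicts for these columns.

```python
import json

import pandas as pd

# Hypothetical sample rows; every value here is made up for illustration.
df = pd.DataFrame({
    "doc_id": ["d1", "d1", "d1", "d2"],
    "doc_name": ["doc_one", "doc_one", "doc_one", "doc_two"],
    "pgf_id": ["p1", "p1", "p2", "p3"],
    "pgf_data": ["First paragraph.", "First paragraph.",
                 "Second paragraph.", "Third paragraph."],
    "sent_id": ["s1", "s2", "s3", "s4"],
    "sent_data": ["Sentence one.", "Sentence two.",
                  "Sentence three.", "Sentence four."],
    "label": ["a", "b", "a", "c"],
})


def build_documents(frame):
    """Nest rows as documents -> paragraphs -> sentences (key 'r')."""
    documents = []
    # Level 1: group by document; sort=False keeps first-seen order.
    for (doc_id, doc_name), doc_rows in frame.groupby(["doc_id", "doc_name"], sort=False):
        paragraphs = []
        # Level 2: group the document's rows by paragraph.
        for (pgf_id, pgf_data), pgf_rows in doc_rows.groupby(["pgf_id", "pgf_data"], sort=False):
            # One dict per sentence row, restricted to the sentence columns.
            sentences = pgf_rows[["sent_id", "sent_data", "label"]].to_dict("records")
            paragraphs.append({"pgf_id": pgf_id, "pgf_text": pgf_data, "r": sentences})
        documents.append({"doc_id": doc_id, "doc_name": doc_name, "paragraphs": paragraphs})
    return documents


nested = build_documents(df)
print(json.dumps(nested, indent=2))
```

One caveat: the sample uses string columns only. If your real columns hold NumPy integers or timestamps, json.dumps may reject them, so you might need to cast those columns first (for example with astype(str)).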
Comment: Your post is completely broken. Please fix it.
Reply: I've fixed it.