Python: grouping a DataFrame at two different levels before converting it to JSON

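I have a DataFrame with the following structure and data. I want to group it at two different levels: level 1 by `doc_id` and `doc_name`, and level 2 by `pgf_id` and `pgf_data`. After the groupby, the result needs to be converted to nested JSON. This is what I have tried: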

# 'records' is the full name of the 'r' orient abbreviation
df_final = (df.groupby(['pgf_id'], as_index=True)
    .apply(lambda x: x[['sent_id','sent_data','label']].to_dict('records'))
    .reset_index().to_json(orient='records'))
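The attempt above produces only one level of nesting because it groups by `pgf_id` alone. Here is a minimal sketch of the same `apply`/`to_dict('records')` idea extended to two levels. It first collapses each paragraph's sentences into a list, then collapses each document's paragraphs into a list. It makes a few assumptions. The sample values are made up, and only the column names come from the question. The output keys `r` and `paragraphs` are hypothetical choices, because the desired JSON format is not shown.

```python
import json
import pandas as pd

# Hypothetical sample data: column names from the question, placeholder values.
df = pd.DataFrame({
    'doc_id': ['001abz'] * 3,
    'doc_name': ['simple_doc'] * 3,
    'pgf_id': ['0001567a', '0001567a', '0001568b'],
    'pgf_data': ['para one', 'para one', 'para two'],
    'sent_id': [1, 2, 3],
    'sent_data': ['sentence 1', 'sentence 2', 'sentence 3'],
    'label': ['A', 'B', 'A'],
})

# Level 2: one row per paragraph; its sentence rows become a list of dicts under 'r'.
paragraphs = (
    df.groupby(['doc_id', 'doc_name', 'pgf_id', 'pgf_data'])[['sent_id', 'sent_data', 'label']]
      .apply(lambda g: g.to_dict('records'))
      .reset_index(name='r')
)

# Level 1: one row per document; its paragraph rows become a list of dicts under 'paragraphs'.
documents = (
    paragraphs.groupby(['doc_id', 'doc_name'])[['pgf_id', 'pgf_data', 'r']]
              .apply(lambda g: g.to_dict('records'))
              .reset_index(name='paragraphs')
)

result = documents.to_json(orient='records')
print(result)
```

Selecting the value columns right after `groupby` keeps the grouping keys out of each group passed to the lambda. This also avoids the deprecation warning that recent pandas versions emit when `apply` operates on grouping columns.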

doc_id  doc_name    pgf_id  pgf_data    sent_id sent_data   label
001abz  simple_doc  0001567a This is for understanding purpose. There are more 2 important sentences in the para.


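I'm not sure whether you meant to post valid JSON or some variant of it, so I made a few assumptions. The main one is that the bare objects with keys such as `sent_id` should sit in an array under the key `r`. If you're fine with using a few loops, here is what I did: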

from json import dumps

# `df` is a pandas.DataFrame with your data

output = []
# Level 1: one "document" object per (doc_id, doc_name) pair
for (doc_id, doc_name), pgf_dataframe in df.groupby(['doc_id', 'doc_name']):
    document = {'doc_id': doc_id, 'doc_name': doc_name}
    paragraphs = []
    # Level 2: one "paragraph" object per (pgf_id, pgf_data) pair within this document
    for (pgf_id, pgf_data), r_dataframe in pgf_dataframe.groupby(['pgf_id', 'pgf_data']):
        paragraph = {'pgf_id': pgf_id, 'pgf_text': pgf_data}
        events = []
        # One object per sentence row, collected under the key 'r'
        for _, row in r_dataframe.iterrows():
            events.append({'sent_id': row['sent_id'], 'sent_data': row['sent_data'], 'label': row['label']})
        paragraph['r'] = events
        paragraphs.append(paragraph)
    document['paragraphs'] = paragraphs
    output.append(document)

# `output` is a list of "document" objects.
print(dumps(output))
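The inner `iterrows()` loop can also be replaced by `DataFrame.to_dict('records')`, which builds the same list of per-sentence dicts directly. The sketch below keeps the loop answer's key names (including `pgf_data` exported as `pgf_text`) and uses made-up sample values. The `to_builtin` helper is hypothetical. It only guards `json.dumps` against NumPy scalars such as `numpy.int64`, which the standard `json` module cannot encode.

```python
import json
import pandas as pd

# Hypothetical sample data: column names from the question, placeholder values.
df = pd.DataFrame({
    'doc_id': ['001abz'] * 3,
    'doc_name': ['simple_doc'] * 3,
    'pgf_id': ['0001567a', '0001567a', '0001568b'],
    'pgf_data': ['para one', 'para one', 'para two'],
    'sent_id': [1, 2, 3],
    'sent_data': ['sentence 1', 'sentence 2', 'sentence 3'],
    'label': ['A', 'B', 'A'],
})

SENT_COLS = ['sent_id', 'sent_data', 'label']


def to_builtin(value):
    # Hypothetical helper: NumPy scalars expose .item(), which returns the Python equivalent.
    if hasattr(value, 'item'):
        return value.item()
    raise TypeError(f'Object of type {type(value).__name__} is not JSON serializable')


output = [
    {
        'doc_id': doc_id,
        'doc_name': doc_name,
        'paragraphs': [
            {
                'pgf_id': pgf_id,
                'pgf_text': pgf_data,
                # One dict per sentence row, replacing the iterrows() loop
                'r': sent_df[SENT_COLS].to_dict('records'),
            }
            for (pgf_id, pgf_data), sent_df in doc_df.groupby(['pgf_id', 'pgf_data'])
        ],
    }
    for (doc_id, doc_name), doc_df in df.groupby(['doc_id', 'doc_name'])
]

result = json.dumps(output, default=to_builtin)
print(result)
```

The nested comprehensions mirror the two loops one-to-one. The outer comprehension handles the document level and the inner one handles the paragraph level, so the result has the same shape as `output` above.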