Python: Converting a DataFrame to 2-level nested JSON using groupby

python json pandas dataframe object

Suppose I have a DataFrame named df that looks like this:

source      tables
src1        table1       
src1        table2          
src1        table3       
src2        table1        
src2        table2

I'm currently able to output a JSON file by iterating over the sources and creating an object for each one, using the following code:

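import json

all_data = []

# One object per distinct source; iterating df['source'] directly
# would repeat a source once per row.
for src in df['source'].unique():
    source_data = {
        src: {
        }
    }
    all_data.append(source_data)

with open('data.json', 'w') as f:
    json.dump(all_data, f, indent=2)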

This produces the following output:

[
  {
    "src1": {}
  },
  {
    "src2": {}
  }
]

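Essentially, what I want to do is also iterate over the tables for each source and add a table object for each one under its source. My desired output looks something like this: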

[
  {
    "src1": {
      "table1": {},
      "table2": {},
      "table3": {}
    }
  },
  {
    "src2": {
      "table1": {},
      "table2": {}
    }
  }
]

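I'd appreciate any help modifying my code so that it also iterates over the tables column and attaches each table to its corresponding source value. Thanks in advance.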

Is this what you're looking for?

import json

# Group the tables by source; the lambda turns each group into {table: {}}
data = [
    {k: v}
    for k, v in df.groupby('source')['tables'].agg(
        lambda x: {t: {} for t in x}).items()
]

with open('data.json', 'w') as f:
    json.dump(data, f, indent=2)

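The answer has two layers. To group the tables by source, first use groupby together with an inner dict comprehension. You can then use a list comprehension to assemble the data in this specific format: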

[
  {
    "src1": {
      "table1": {},
      "table2": {},
      "table3": {}
    }
  },
  {
    "src2": {
      "table1": {},
      "table2": {}
    }
  }
]
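
To make the two layers easier to see, here is a rough, self-contained sketch that separates them into named steps. It rebuilds the sample frame from the table in the question and iterates over the groups directly instead of calling agg; the name by_source is only for illustration and is not from the original answer.

```python
import json

import pandas as pd

# Sample frame reconstructed from the table in the question.
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table2", "table3", "table1", "table2"],
})

# Layer 1: group the tables by source. The inner dict comprehension
# turns each group's table names into {table: {}}.
# sort=False keeps the sources in order of first appearance.
by_source = {
    src: {table: {} for table in tables}
    for src, tables in df.groupby("source", sort=False)["tables"]
}

# Layer 2: a list comprehension wraps each source in its own object.
data = [{src: tables} for src, tables in by_source.items()]

print(json.dumps(data, indent=2))
```

Keeping the intermediate dict around is a design choice for readability; the two comprehensions can be merged into one once the shape is clear.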

An example using apply on arbitrary data:

df['tables2'] = 'abc'

def func(g): 
    return {x: y for x, y in zip(g['tables'], g['tables2'])}

data = [{k: v} for k, v in df.groupby('source').apply(func).items()]
data
# [{'src1': {'table1': 'abc', 'table2': 'abc', 'table3': 'abc'}},
#  {'src2': {'table1': 'abc', 'table2': 'abc'}}]

Note that this does not work on pandas 1.0 (probably because of a bug).
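
If apply misbehaves on your pandas version, one possible workaround is to iterate over the groups yourself and call the same function on each one. This is only a sketch under that assumption, not a confirmed fix for the 1.0 issue; the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question.
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table2", "table3", "table1", "table2"],
})
df["tables2"] = "abc"  # arbitrary payload, as in the example above


def func(g):
    # Map each table name to its value in the 'tables2' column.
    return dict(zip(g["tables"], g["tables2"]))


# Iterating a groupby yields (key, sub-frame) pairs, so no call to
# .apply is needed.
data = [{src: func(g)} for src, g in df.groupby("source")]
print(data)
```
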
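
Yes, this works great, thank you! Suppose I need to go one step further and add a list of columns to each table (similar to how the list of tables was added to each source). How would I do that?

@weovibewoivwoiv Change agg to apply on the groupby, and then you can do arbitrary things with your data, like I showed you.

I'm still not quite sure how that works. If you don't mind, could you append this extra step to your original answer? It would be very helpful, thanks.

@weovibewvoibweoivwoiv There is a bug in pandas 1.0 that prevents expressions like this. What version are you on? I added an example. Hope it helps.

I'm not on pandas 1.0, so the code runs fine. However, the output isn't quite what I asked for. I'm looking for something more like [{'src1': {'table1': {'col1': {}, 'col2': {}}, 'table2': {'col1': {}, 'col2': {}, 'col3': {}}}}]. Essentially the same as before with just the sources and tables, but now with another layer for columns.

Can we take this to private messages?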
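
For the three-level structure asked about in the comments, a rough sketch follows. The thread never shows how the column names are stored, so this assumes a hypothetical third column called column_name with one row per (source, table, column) combination; the sample rows are made up to match the shape in the comment.

```python
import json

import pandas as pd

# Hypothetical data: one row per (source, table, column) combination.
# The 'column_name' column is an assumption, not part of the question.
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src1", "src1", "src2"],
    "tables": ["table1", "table1", "table2", "table2", "table2", "table1"],
    "column_name": ["col1", "col2", "col1", "col2", "col3", "col1"],
})

# Outer groupby: one object per source.
# Inner groupby: within that source, one key per table, whose value
# maps each of the table's columns to an empty object.
data = [
    {
        src: {
            table: {col: {} for col in cols}
            for table, cols in src_rows.groupby("tables", sort=False)["column_name"]
        }
    }
    for src, src_rows in df.groupby("source", sort=False)
]

print(json.dumps(data, indent=2))
```

The same pattern extends to further levels by nesting another groupby inside the innermost comprehension.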